
Microchips

a simple introduction
Second Edition
Sitaramarao S. Yechuri, Ph.D.
ISBN 0-9741037-1-3
Library of Congress Control Number: 2004093110
Printed June, 2004
Copyright 2004 by Yechuri Software, Arlington, TX.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording,
or otherwise, without prior written permission of the publisher. Printed in the United States of
America.
Introduction
Integrated circuit chips made of Silicon have undeniably transformed our world and the biggest
change happened in just 35 years. The first Germanium (Ge) transistor was built in Bell Laboratories in 1947 by Walter Brattain and John Bardeen. The first integrated circuit was built at Texas
Instruments in 1958 by Jack Kilby. Nowadays all micro-chips are made of Silicon (Si).
No one can predict the future and mankind has developed many technologies that fizzled
out or just never became very popular. Some technologies developed slower than others. Two
hugely important industries of our times, namely the automobile industry and the semiconductor
industry have displayed different behavior.
Automobile technology started in 1889 and is still evolving slowly. It is an industry with a
huge inertia that limits how quickly it can evolve. It is capital intensive and labor intensive and
nowadays the profit margin is not that high due to robust competition. And in some ways it has
not changed that much.
The efficiency of today's cars is not more than double that of the cars of the 1930s, and our cars today still use gasoline and still use an overhead cam-shaft to regulate the valves. Cars have become lighter, but people still drive a 2000 lb car to transport a single 150 lb person. Top speeds for the cars of the 1940s were easily 100 mph, and even today cars are built to run at no more than 100 mph and practical speeds on the roads do not exceed 70 mph.
Semiconductors on the other hand grew very rapidly in the 1990s and matured very quickly indeed. The metric used is the gate length of the transistors. In 1965 Gordon Moore (who later co-founded Intel) predicted that the transistor density would basically double every year and so far it has been quite accurate. In fact it is almost a business prediction as much as a technology prediction in that consumers have grown to expect new computers to become faster every few months and they still expect to pay only as much as they did before. In fact it is common for consumers to put off purchasing electronics until just before they need it because they believe that tomorrow everything will be a little cheaper and faster.
Another important feature of the micro-chip industry is that in a sense it did accelerate its own
growth. What I mean is this. Up to 1970 most technology was developed on pen and paper. It was
analytical. It is a very, very sad fact that analytical techniques are virtually unused today except by
very few technical people. The pocket calculator was the first step in increasing the speed of chip
design because it allowed chip designers to calculate transistor sizes and bias points to several
decimal places of accuracy instantly. The early TI programmable calculators had a slot through
which you passed a magnetic strip of paper containing instructions you had coded previously and
they were read in and the calculator was ready to perform a sequence of calculations rather than
just one.
Chip designers were initially circuit designers and did all their design work with pencil and
paper and a calculator. At that time most chips were analog in function. But by the 1990s com-
puters started to take on the weight of chip design and chip designers became little more than
programmers. By then most chips were digital in nature. And this process started to feed on itself
i.e., the improvement in computer speed allowed better and faster chip design software which in
turn allowed better and faster chips to be designed and so on.
In a sense the chip shrink process was on a glide-path because the minimum feature size of the
micro-chips was dictated by the wavelength of light used to define them and chip manufacturing equipment manufacturers just used lower and lower wavelengths to define the features and it
seemed the juggernaut would never stop.
But the juggernaut is slowing down because the wavelength of the light needed to define the
features has become so small that the energy of the photons (which is inversely proportional to
the wavelength) is now that of an X-ray. At such a high energy there are few photon sensitive
materials which can respond to it.
Another factor which is causing a problem is the gate oxide thickness which has already been
reduced to no more than five layers of atoms. Besides these two factors the FETs made at very
small dimensions are not delivering the behavior needed to properly design integrated circuits.
At the time of this writing 0.09 µm is the cutting edge of the semiconductor processes worldwide and it is this author's opinion that the 0.06 µm generation which we will attain by 2006, or a subsequent 0.05 µm generation, may well be a stable point at which the industry starts to become commoditized and prices are driven to the minimum and when applications become the main focus. This happened with the automobile industry and it will probably happen with the chip industry.
Keep in mind that even well known industry experts don't agree on how much further silicon
based chips can be shrunk and many of these experts have a vested interest in persuading the
public that newer, faster and cheaper technologies are just around the corner and that you should
invest your money in the leading semiconductor companies even at high P/E ratios. When you
read about newer technologies, the key question you should ask is not whether they are feasible
but whether they can be made cheaper than existing technology.
My belief is that chips with many layers of circuitry stacked one on top of the other offer the key to higher density. To make this a reality I believe that techniques that are additive, like epitaxy or chemical vapor deposition, need to become much more cost effective, which could happen if the volume of usage were increased. They also can be done at lower temperatures, which is necessary to keep the lowest circuit levels functional, and finally there needs to be a way to sandwich passive heat sinks between the layers to suck the heat out because otherwise the middle levels will burn up.
Contents
1 Passive circuits 1
1.1 The three passive lumped elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Resistance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Capacitance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.3 Inductance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.4 Impedance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Basic circuit laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Ohm's law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Kirchhoff's laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.3 Δ-Y transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.4 Mesh equations and node equations . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.5 Thevenin and Norton equivalents . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.6 Maximum power transfer theorem . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.7 Transient analysis using Laplace transforms . . . . . . . . . . . . . . . . . . . 6
2 Active devices - historical 7
2.1 Vacuum technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Diode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Triode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Klystron tube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.5 Read diode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.6 Gunn diode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Semiconductor theory 13
3.1 Wave particle duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Schroedinger's time-independent wave equation . . . . . . . . . . . . . . . . . . . . 14
3.3 Quantum well . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4 Free electron theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.5 Bloch theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.6 Kronig-Penney model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.7 Effective mass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.8 Fermi-Dirac distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.9 Poisson's equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.10 Drift and diffusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.11 Haynes-Shockley experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.12 Continuity equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.13 Band diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.14 Impurities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4 Active devices 23
4.1 P-N Junction diode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2 Bipolar junction transistor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3 Heterojunction Bipolar Transistor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.4 Field-Effect transistor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.5 FET small signal equivalent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.6 Other transistors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.7 Shrink problems at 0.06 µm and below . . . . . . . . . . . . . . . . . . . . . . . . 31
4.7.1 The premise of the shrink . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.7.2 Vth (Threshold voltage) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.7.3 St (Sub-threshold swing) . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.7.4 Tox (Gate oxide thickness) . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.7.5 Ldiff (Sub-diffusion) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.7.6 Gate loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.7.7 Xj (Junction depth) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.7.8 ND (Drain doping level) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.7.9 Tj (Junction temperature) . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.7.10 Thermal budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.7.11 Heat generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5 Process characterization 41
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2 Test equipment used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.3 Test circuit layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.4 Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.4.1 Drain characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.4.2 Gate characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.4.3 Back bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.4.4 Collector characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4.5 Diode characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.4.6 Reverse characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.4.7 S parameter measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.4.8 C-V measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.4.9 Thermal behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.5 Production monitors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.6 Scanning Electron Microscopy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.7 Striped wafers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.8 Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.9 Process skew . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.10 Burn in testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.11 Ion implant to create connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.12 Thermal imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6 Chip fabrication 56
6.1 Wafer preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.2 Lithography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.3 Mask generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.4 Oxide growth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.5 Doping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.6 Implantation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.7 Etching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.7.1 Wet etch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.7.2 Reactive ion etch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.7.3 Reactive ion beam etch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.8 Sputtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.9 Polysilicon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.10 Sintering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.11 Thermal budget constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7 Logic circuits 65
7.1 Boolean logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.2 Flip-flops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.3 The pass gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.4 Karnaugh maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.5 Finite state machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
7.6 Domino logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8 Analog circuits 77
8.1 Current mirror . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
8.2 Current sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
8.3 Active load . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
8.4 Level shifting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8.5 Common emitter/source amplifier . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8.6 DC gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
8.7 Emitter/Source follower input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
8.8 Bootstrapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8.9 Miller's theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8.10 Gain bandwidth product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.11 Voltage reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.12 Differential circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8.13 Transistor matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.14 Bode plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.15 Routh's stability criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.16 Nyquist path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
8.17 Sample and Hold circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
8.18 Analog to digital conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
8.19 Digital to analog conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
8.20 Low power circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
8.21 Laser trimming and other techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
9 Microprocessors 99
9.1 Binary number system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
9.1.1 Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
9.1.2 Floating point numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
9.2 µP block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
9.3 Arithmetic logic unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
9.3.1 Addition and subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
9.3.2 Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
9.3.3 Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
9.4 Shift register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
9.5 Instructions and operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
9.6 CISC and RISC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
9.7 The critical path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
9.8 Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.9 Intentional clock skewing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
9.10 Clock trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
10 Phase-Locked Loops 112
10.1 Ring oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
10.2 Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
10.2.1 Voltage controlled oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
10.2.2 Divider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
10.2.3 Phase comparator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
10.2.4 Loop filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
10.3 Loop operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
10.4 Delay-locked loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
10.5 Tracking and re-sync PLLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
11 Digital Signal Processors 123
11.1 Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
11.2 Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
11.3 Digital Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
11.4 Pattern recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
11.5 Error correcting codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
11.5.1 Reed-Solomon code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
11.5.2 Convolutional coding and Viterbi decoding . . . . . . . . . . . . . . . . . . . 132
11.6 Motor control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
12 I/O circuits and pcb interactions 136
12.1 Design consideration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
12.1.1 Capacitive loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
12.1.2 Transit time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
12.1.3 Line impedance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
12.1.4 Electro-static discharge or ESD . . . . . . . . . . . . . . . . . . . . . . . . . . 141
12.1.5 Line drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
12.1.6 Line terminations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
12.1.7 Impedance variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
12.1.8 Cross coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
12.1.9 Antenna effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
12.1.10 Ground bounce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
12.1.11 Ringing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
12.2 Spread spectrum technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
12.3 Input/Output or IO circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
13 Automatic Test Equipment 150
13.1 DUT board . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
13.2 Main computer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
13.3 Tester boards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
13.4 Pin driver boards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
13.4.1 Timing generation chip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
13.4.2 Pin driver chip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
14 MMICs 157
14.1 Lumped and distributed elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
14.2 Maxwell's equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
14.3 Transmission lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
14.4 N-port circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
14.4.1 h-Parameters, y-Parameters & z-Parameters . . . . . . . . . . . . . . . . . . . 162
14.4.2 S-Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
14.5 Balun . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
14.6 Circulators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
14.7 Impedance transformers and filters . . . . . . . . . . . . . . . . . . . . . . . . . . 165
15 Transducers 168
15.1 Direct gap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
15.2 Semiconductor lasers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
15.2.1 Edge emitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
15.2.2 Surface emitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
15.2.3 Bulk vs. distributed gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
15.3 Junction detectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
15.4 Accelerometers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
16 Technology CAD 174
16.1 Basic numerical techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
16.1.1 Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
16.1.2 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
16.1.3 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
16.2 Grid selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
16.3 Device simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
16.4 Fabrication process simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
16.5 Monte Carlo analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
17 Power electronics 183
17.1 Alternating current . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
17.2 Transformers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
17.3 Rectification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
17.4 DC to AC conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
17.5 DC to DC conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
17.6 Silicon Controlled Rectifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
17.7 Power BJTs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
Chapter 1
Passive circuits
Electricity is the flow of electric charge, i.e., the movement of electrons.
1.1 The three passive lumped elements
There are only three basic elements namely resistance, capacitance and inductance. They are
linear.
1.1.1 Resistance
Any material that conducts electricity exhibits a resistance. Resistance essentially inhibits the flow of electrons and the energy dissipated is converted into heat. Resistors obey Ohm's law

V = IR    (1.1)

So if you apply a voltage with an arbitrary amplitude variation versus time, the current flow has the same relative variation versus time. Two resistances a and b in series result in a resistance of a + b. Two resistances a and b in parallel result in a resistance of ab/(a+b). The unit of resistance is the Ohm (Ω).
1.1.2 Capacitance
Capacitance is a way to store energy in an electric flux. Any two conductors of any shape which are not touching each other form a capacitor. For the case of two parallel plates of the same area A spaced d apart in a medium of dielectric coefficient ε, the capacitance C is given by C = εA/d. The way that the energy is stored is that electrons accumulate on one plate and an equal number of electrons are missing on the other plate.
Two capacitances a and b in series result in a capacitance of ab/(a+b). Two capacitances a and b in parallel result in a capacitance of a + b. The unit of capacitance is the Farad. If needed, Coulomb's law can be used to obtain the force that acts on the two plates of a capacitor. The electrical relationship we need to use is that the voltage across the plates of the capacitor is given by

V(t) = (1/C) ∫ I dt    (1.2)

Because the voltage across a capacitor is the integral of the current, a sinusoidal input current results in a sinusoidal voltage which lags the input current by a quarter cycle or 90°.
1.1.3 Inductance
Inductance is a way to store energy in a magnetic flux. Any conductor is an inductor. The Biot-Savart law gives the magnetic field strength due to current flow through a conductor. Now if the conductor is surrounded by material of a high permeability, a magnetic flux flows through that material and energy is stored.
Two inductances a and b in series result in an inductance of a + b. Two inductances a and b in parallel result in an inductance of ab/(a+b). The unit of inductance is the Henry. The electrical relationship we need to use is that the current flowing through the inductor is given by

I(t) = (1/L) ∫ V dt    (1.3)

Because the current through an inductor is the integral of the voltage across it, a sinusoidal voltage results in a sinusoidal current which lags the applied voltage by a quarter cycle or 90°.
1.1.4 Impedance
By the use of the Fourier transform any section of a time domain waveform can be decomposed into its frequency domain constituents. Because inductors and capacitors perform an integral over time of voltage and current respectively, they will respond differently to sinusoidal waveforms of the same amplitude but different frequencies.
For this reason passive circuits containing inductors and capacitors are analyzed as a function of the angular frequency ω = 2πf where f is the frequency in Hz of the constituents of the time domain waveforms.
Impedance is the complex measurement of any combination of the three basic elements. Impedance is denoted as Z(ω). The impedance of a resistance is simply the resistance and is independent of frequency. The impedance offered by a capacitor is −j/(ωC). The impedance of an inductor is jωL. So if you have a series combination of resistance, capacitance and inductance the impedance is

Z(ω) = R − j/(ωC) + jωL    (1.4)

You can manipulate impedances similarly to resistances. So two impedances in series just result in a new impedance of

Zser(ω) = Z1(ω) + Z2(ω)    (1.5)

Two impedances in parallel result in a new impedance of

Zpar(ω) = Z1(ω) Z2(ω) / [Z1(ω) + Z2(ω)]    (1.6)
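As a quick numerical check of equations 1.4 to 1.6, the short Python sketch below builds the impedance of a series R-L-C branch at one frequency and combines it in parallel with a second impedance. The component values are arbitrary illustrations, not taken from the text.

import math

def z_series_rlc(r, l, c, f):
    # Impedance of a series R-L-C branch at frequency f in Hz, equation 1.4.
    w = 2 * math.pi * f
    return complex(r, w * l - 1.0 / (w * c))

def z_parallel(z1, z2):
    # Parallel combination of two impedances, equation 1.6.
    return (z1 * z2) / (z1 + z2)

# Illustrative values: 50 ohm, 1 uH, 100 pF evaluated at 10 MHz.
z1 = z_series_rlc(50.0, 1e-6, 100e-12, 10e6)
z2 = complex(75.0, 0.0)                      # a purely resistive 75 ohm branch
print("series branch:", z1)
print("in parallel with 75 ohm:", z_parallel(z1, z2))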
1.2 Basic circuit laws
1.2.1 Ohm's law
Ohm's law applies to impedances just as it applies to resistances. Since the impedance is a function of frequency, so is the current flow and the voltage dropped across the load impedance. So as before

V(ω) = I(ω) Z(ω)    (1.7)
1.2.2 Kirchhoff's laws
Kirchhoff proposed a current law and a voltage law. Kirchhoff's current law states that the sum of all currents into each node in a circuit must be zero as shown in the figure 1.1. Kirchhoff's voltage law states that the voltages summed around a loop must equal zero as shown in the figure 1.2.

Figure 1.1: Kirchhoff's current law: I1 + I2 + I3 + I4 = 0.

Figure 1.2: Using Kirchhoff's voltage law: V1 + V2 + V3 + V4 = 0.
1.2.3 Δ-Y transformation

Figure 1.3: Δ-Y transformation.

In the figure 1.3 you can convert from Y to Δ and from Δ to Y using the equations 1.8, 1.9, 1.10, 1.11, 1.12 and 1.13.

R1 = Ra Rb / (Ra + Rb + Rc)    (1.8)

R2 = Ra Rc / (Ra + Rb + Rc)    (1.9)

R3 = Rb Rc / (Ra + Rb + Rc)    (1.10)

Ra = (R1 R2 + R2 R3 + R1 R3) / R3    (1.11)

Rb = (R1 R2 + R2 R3 + R1 R3) / R2    (1.12)

Rc = (R1 R2 + R2 R3 + R1 R3) / R1    (1.13)
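Equations 1.8 to 1.13 translate directly into two small helper functions. The Python sketch below uses arbitrary example values and also shows that converting one way and then back recovers the original resistances.

def delta_to_y(ra, rb, rc):
    # Equations 1.8-1.10: delta resistances Ra, Rb, Rc to the equivalent Y (R1, R2, R3).
    s = ra + rb + rc
    return (ra * rb / s, ra * rc / s, rb * rc / s)

def y_to_delta(r1, r2, r3):
    # Equations 1.11-1.13: Y resistances R1, R2, R3 to the equivalent delta (Ra, Rb, Rc).
    n = r1 * r2 + r2 * r3 + r1 * r3
    return (n / r3, n / r2, n / r1)

# Round-trip check with arbitrary values: converting back recovers the originals.
print(y_to_delta(*delta_to_y(10.0, 20.0, 30.0)))   # -> (10.0, 20.0, 30.0)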
1.2.4 Mesh equations and node equations
Figure 1.4: Mesh circuit to analyze.
If we want to solve the circuit in the figure 1.4, the mesh equations are obtained by implementing KVL in each loop. So we obtain four equations for the four loops as shown below. There are four unknowns I1, I2, I3 and I4 and they can be obtained by solving the four equations using Cramer's rule.

I1 R7 + (I1 − I3) R6 + (I1 − I2) R2 = 0    (1.14)

I3 R1 + (I3 − I1) R6 + (I3 − I4) R3 = 0    (1.15)

I2 R4 + (I2 − I1) R2 + (I2 − I4) R5 = 0    (1.16)

−Va + (I4 − I2) R5 + (I4 − I3) R3 = 0    (1.17)
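Cramer's rule works by hand for a 4x4 system, but the same four equations can also be handed to a linear solver. The Python sketch below (the resistor values and Va are arbitrary illustrations, not taken from the figure) collects the coefficients of I1 through I4 from equations 1.14 to 1.17 and solves for the mesh currents.

import numpy as np

# Illustrative component values, not from the text.
R1, R2, R3, R4, R5, R6, R7 = 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0
Va = 5.0

# Coefficient matrix for the unknowns [I1, I2, I3, I4],
# obtained by collecting terms in equations 1.14-1.17.
A = np.array([
    [R7 + R6 + R2, -R2,           -R6,            0.0    ],
    [-R6,           0.0,           R1 + R6 + R3, -R3     ],
    [-R2,           R4 + R2 + R5,  0.0,          -R5     ],
    [0.0,          -R5,           -R3,            R5 + R3],
])
b = np.array([0.0, 0.0, 0.0, Va])

# Cramer's rule gives the same answer; numpy's solver is just more convenient.
I = np.linalg.solve(A, b)
print("mesh currents:", I)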
The node equations on the other hand are obtained by using KCL and Ohm's law at the different nodes of the circuit. In the figure 1.4 the nodes are identified by small circles.

(V1 − V2)/R6 + (V3 − V2)/R2 + (V4 − V2)/R5 − V2/R3 = 0    (1.18)

(V3 − V1)/R7 + (V2 − V1)/R6 − V1/R1 = 0    (1.19)

(V1 − V3)/R7 + (V4 − V3)/R4 + (V2 − V3)/R2 = 0    (1.20)

V4 = Va    (1.21)
1.2.5 Thevenin and Norton equivalents
Thevenin's theorem and Norton's theorem are explained in the same context. Thevenin's theorem states that after selecting two nodes of a linear circuit the whole circuit can be simplified into a single voltage source and a series impedance. This is explained as shown in the figure 1.5.

Figure 1.5: The circuit to reduce.

In the circuit in the figure 1.5 the circuit behavior as seen by the element with the circles on either end can be simplified into either of the two circuits shown in the figure 1.6. The one on the left is called the Thevenin equivalent while the one on the right is called the Norton equivalent.

Figure 1.6: Thevenin's and Norton's equivalent circuits.

Vth is obtained by measuring the voltage between the two circles with the element removed. Zth is then obtained by further removing the voltage source and replacing it with a short, so that the impedance measured between the circles is the Zth. To obtain the Norton equivalent the circles are connected with a short and the current through the short is measured. This current is the value of the current source in the Norton equivalent. The shunt impedance in the Norton equivalent is the same as the Zth.
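As a minimal numerical sketch of this procedure (for a plain resistive divider chosen for illustration, not the circuit of figure 1.5), the Python function below computes the open-circuit voltage, the impedance with the source shorted, and the short-circuit current, i.e. the Thevenin and Norton parameters described above.

def thevenin_of_divider(vs, r1, r2):
    # Thevenin/Norton equivalent seen across R2 of a source Vs feeding R1 in series with R2.
    vth = vs * r2 / (r1 + r2)        # open-circuit voltage between the two nodes
    zth = (r1 * r2) / (r1 + r2)      # source replaced by a short: R1 in parallel with R2
    i_norton = vth / zth             # short-circuit current for the Norton form
    return vth, zth, i_norton

# Illustrative values: a 10 V source, R1 = 1 kohm, R2 = 2 kohm.
print(thevenin_of_divider(10.0, 1000.0, 2000.0))   # -> (6.67 V, 666.7 ohm, 0.01 A)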
1.2.6 Maximum power transfer theorem
If you take any two nodes in a circuit and you designate one side of the circuit as the source
and the other side as the load, then the maximum power you can transfer from the source to the
load occurs when the impedance looking into the load is the complex conjugate of the impedance
looking into the source. You can get the source impedance and the load impedance by the use of
the Norton or Thevenin equivalent circuit.
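When both sides are purely resistive, the conjugate-match condition reduces to equal resistances. A short numerical sweep in Python (the 10 V, 50 Ω source is an assumed illustration) shows the delivered power peaking where the load resistance equals the source resistance.

import numpy as np

vth, rs = 10.0, 50.0                       # assumed Thevenin source: 10 V behind 50 ohm
rl = np.linspace(1.0, 200.0, 400)          # candidate load resistances
p_load = (vth / (rs + rl))**2 * rl         # power delivered to each load
print("best load resistance ~", rl[np.argmax(p_load)], "ohm (expect about 50)")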
1.2.7 Transient analysis using Laplace transforms
The Laplace transform is somewhat similar to the Fourier transform and is given by

f(s) = ∫_0^∞ e^(−st) f(t) dt    (1.22)

The Laplace transform is really useful in transient analysis because of the way that it handles integrals and differentials.

f′(t)  →  s f(s) − f(t)|t=0    (1.23)

∫_0^t f(t) dt  →  f(s)/s    (1.24)

Figure 1.7: Circuit to analyze.

To illustrate let us analyze the circuit in the figure 1.7. In this circuit both the inductor and the capacitor have initial states and then the circuit is allowed to settle. The equation that we need is

VC(0) + (1/C) ∫_0^t I dt + L dI/dt + I R = 0    (1.25)

Applying the Laplace transform with I(t) as the variable then gives

I(s) = [L s I(t)|t=0 − VC(t)|t=0] / [L s² + R s + 1/C]    (1.26)

Now you find the roots of the denominator and rewrite the expression in the following form

I(s) = (s + c) / [(s − a)(s − b)]    (1.27)

The inverse Laplace transform then gives the actual current flow as a function of time

I(t) = [(a + c) e^(at) − (b + c) e^(bt)] / (a − b)    (1.28)

Even though the solution seems to have only exponentials remember that due to Euler's equation, if either a or b is complex, the current will have sinusoidal variations.
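To illustrate the mechanics, here is a small symbolic check in Python using sympy: equation 1.26 is written down for one set of assumed component values and initial conditions (R = 2 Ω, L = 1 H, C = 0.25 F, I(0) = 1 A, VC(0) = 0 V, all chosen only for illustration) and inverted back to the time domain.

import sympy as sp

t, s = sp.symbols("t s", positive=True)

# Assumed values for illustration only.
R, L, C, I0, Vc0 = 2, 1, sp.Rational(1, 4), 1, 0

# Equation 1.26: I(s) = (L*s*I(0) - Vc(0)) / (L*s^2 + R*s + 1/C)
I_s = (L * s * I0 - Vc0) / (L * s**2 + R * s + 1 / C)

# Inverse transform gives the transient current I(t), the general form of eq. 1.28.
I_t = sp.inverse_laplace_transform(I_s, s, t)
print(sp.simplify(I_t))

Because the denominator roots are complex for these values, the printed result is a decaying sinusoid, which is exactly the point made above about Euler's equation.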
Chapter 2
Active devices - historical
2.1 Vacuum technology
In 1947 the first Germanium transistor was built in Bell Laboratories by Walter Brattain and John Bardeen. The first integrated circuit was built at Texas Instruments in 1958 by Jack Kilby. But for well over the first half of the twentieth century all commercially available electronics was based on vacuum tubes. Rectification was achieved with the two terminal diode, amplification was achieved with the three terminal triode. At very high microwave frequencies, the triode is not fast enough, so the Klystron tube was used for amplification. The Read diode was proposed in 1958 and it too is used at microwave frequencies and in 1964 the Gunn diode was proposed, also for microwave amplification. All these devices played an important role in the growth of electronics.
2.2 Diode
Figure 2.1: A vacuum diode.

A vacuum diode is an evacuated glass tube as shown in the figure 2.1. The heater at the bottom heats the cathode red hot or hotter. The cathode is made of a very good conductor with a low work function such as copper. Due to the heat electrons are boiled off the cathode. The anode is placed above the cathode and due to the electric field, the electrons from the cathode drift to the anode where they combine with the anode, transferring a charge of q from the cathode to the anode. The movement of the electrons in the field is governed by the Lorentz equation, where the magnetic field term is zero.
Rectification occurs because the anode is not heated, so is unable to emit electrons, so if the polarity of the diode is reversed, current will not flow. The speed of the device depends on the transit time from the cathode to the anode.
2.3 Triode
Figure 2.2: A vacuum triode.

Amplification is achieved when a small current modulation controls a much larger current perhaps at the same voltage so that the modulation of the larger current is identical to the modulating signal, perhaps with a constant time delay.
The vacuum diode is converted into an amplifying device by the introduction of the grid between the anode and the cathode as shown in the figure 2.2. To achieve the most control the grid must be placed much closer to the cathode than to the anode, because the electric field induced by the grid to cathode voltage is competing with the electric field due to the anode to cathode voltage.
The grid has to be designed so that it has a high porosity on the one hand and can exert a controlling field on the other. The high porosity is very important because you want to run the grid in a high impedance low current circuit and if the grid intercepts electrons, the current will dampen the modulating signal. Remember that the grid in order to modulate the current successfully must have a voltage more positive than the cathode, so current can flow from the grid to the cathode.
Another aspect of the grid is that it should have a very low capacitance to the anode and for this as well you want the grid to have a very small area while being able to control the field. The other side of it is that if it is too fine, then over time it will be damaged or warped causing distortion. Photomultiplier tubes [1] are also based on vacuum tubes.
2.4 Klystron tube
The klystron is a device where the amplification is not achieved as in the triode or any of the devices of today. The way the triode, the bipolar junction transistor or the Field Effect Transistor achieve amplification is that there is a high impedance control circuit that gates a much larger current. The klystron is completely different in that the amplification is not achieved at the time of modulation. The klystron configuration is shown in the figure 2.3.

Figure 2.3: A klystron tube.
There are five distinct sections of this setup. The electrons are boiled off a cathode on the left. Next they are accelerated in a strong electric field. As they pass through the next section, they are modulated in a microwave frequency field. The next section is a long passive section where the amplification actually happens.
When the electrons were modulated some electrons were accelerated and some were decelerated causing a modulation of velocity. In the figure 2.4 the electrons in the section A are decelerated, the electrons in section C are accelerated while the electrons in section B are left unmodulated. In the long passive section this velocity modulation is converted into position modulation because the velocity modulation causes the electrons to bunch as shown on the right in the figure 2.4.

Figure 2.4: Amplification due to bunching.

The next and last section is the section that extracts the microwave energy from the electron stream. As the electrons pass through the parallel plates, they cause displacement current to flow and since the bunched electrons are tightly packed together, they will cause a sudden bump in the displacement current and this is the amplified output.
So in order to have a lot of amplification, you need a large number of uniformly spaced electrons moving at a large velocity and a large distance over which to allow them to bunch. In addition you need to extract the microwave energy at exactly the right distance because after the bunching reaches a maximum, they will overshoot and start to disperse at which point any energy extracted will be distorted.
2.5 Read diode
The Read diode [2] is based on an n+ p i p+ structure as shown in the figure 2.5. It uses the impact ionization effect and is used to generate microwave oscillation output up to 50 GHz or so.

Figure 2.5: The Read diode.

There is no input, the diode just needs to be biased at the correct voltage and if the output frequency needs to be adjusted the time T in the figure 2.5 has to be varied. If the conditions are correct oscillations will build up and the author [2] expects an efficiency of 30 %. More experiments [3] may have given better efficiency.

Figure 2.6: The electric field.

The figure 2.6 gives the electric field and shows the peak field occurring at the n+ p junction. As the oscillations build up, the oscillation voltage causes the depletion region at this junction to expand and contract. The charge thus moving in a high field causes impact ionization to occur generating hole-electron pairs. The electrons simply move into the n+ region and then into the supply, whereas the holes move across the space charge region to the p+ region.
The biasing of the diode needs to be such that during the negative part of the oscillation signal, the sum of the DC bias and the oscillation is lower than the voltage required to cause impact ionization, but during the positive part of the oscillation the sum of the voltages causes impact ionization to occur at the n+ p junction. So the current increases during the positive portion of the oscillation voltage and decreases during the negative portion of the oscillation voltage.
The time that they take to reach the p+ region is what determines the frequency of oscillation. If the time taken to traverse the space charge region is half a cycle, then the current is 180° out of phase with the voltage and the oscillation is self-sustaining. The output frequency is given by the equation 2.1 where W is the width of the space charge region and v is the velocity of the carriers in the space charge region. A nice computer simulation method to analyze the Read diode is given in [4].
f = v / W    (2.1)
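As a rough numerical illustration of equation 2.1 (with the left-hand symbol read here as the oscillation frequency f), a saturated carrier velocity of about 1e7 cm/s and a space charge region a couple of micrometres wide land in the tens of GHz range quoted above. The values in this Python sketch are assumptions for illustration only.

v = 1.0e7          # carrier saturation velocity in cm/s (typical order of magnitude)
W = 2.0e-4         # space charge region width: 2 micrometres expressed in cm
f = v / W          # equation 2.1
print(f / 1e9, "GHz")   # -> 50.0 GHz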
2.6 Gunn diode
The Gunn diode [5] was based on an effect explained by [6] and [7]. The Gunn diode has a structure as shown in the figure 2.7.

Figure 2.7: The Gunn diode.

The oscillation of the Gunn diode is similar to the Read diode, but the mechanism is different. It is based on the fact that GaAs has a small direct gap and a large indirect gap as shown in the figure 2.8. The increase in energy from A to B is 0.34 eV. At B the mobility is lower than at A.

Figure 2.8: The e-k diagram for GaAs [8], [9].

So in the Gunn diode if the field is high enough, the acceleration of conduction band electrons in the field will give them enough energy to move from A to B. But at the same time, the 0.34 eV difference is larger than the thermal energy so in the absence of the field electrons are not usually at B. So if the Gunn diode is biased just at the field required to cause a transition, then during one half of the oscillation cycle there will be a large number of transitions from A to B, but during the other half there will be none.
The electrons at B will move in the field and cause a current flow, but they will move more slowly. So just as in the case of the Read diode, you will have an optimum frequency at which the electrons will arrive exactly half a cycle out of phase with the voltage and this will cause a self-sustaining oscillation. Like the Read diode, the Gunn diode will also operate at 40-50 GHz or higher.
Chapter 3
Semiconductor theory
[10] and [11] are good references for quantum mechanics and [9] is a good book for semiconductor physics. The semiconductor properties of Silicon are due to its crystal structure. Like Carbon, Silicon has a valence of four. In pure crystalline silicon, each silicon atom makes a covalent bond with its four neighbors. The structure of pure silicon is as shown in the figure 3.1.
Figure 3.1: The silicon lattice.
In insulators the electrons are firmly bound to their atom or are part of a bond between atoms that requires a lot of energy to break. In metals, the atoms are arranged in a periodic structure and the valence electrons are free to move about, and although the metal as a whole is charge neutral the valence electrons are not tied to a specific atom. The electrons that are firmly bound to an atom or as part of a bond are said to be in the valence band. The electrons that are free to move about are said to be in the conduction band.
So the figure 3.2 shows the valence and conduction bands for insulators, semiconductors and metals. In the case of insulators the two bands are far apart in energy, in the case of semiconductors the energy gap between the two bands is in the same ballpark as thermal energy and in the case of metals the two bands overlap.
Figure 3.2: The valence and conduction bands.
3.1 Wave particle duality
Every particle has a wavelength associated with it given by the de Broglie wavelength which is:

λ = h / (mv) = h / p    (3.1)

The Davisson-Germer experiment confirmed this wave property for electrons by impinging a collimated beam of electrons onto a crystal and observing the diffracted electrons using a counter. So smaller objects or slower objects have larger wavelengths. The h is Planck's constant.
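For a feel for the numbers, the short Python sketch below evaluates equation 3.1 for an electron with an assumed kinetic energy of 1 eV (an arbitrary illustration); the resulting wavelength is on the order of a nanometre, which is why diffraction off a crystal lattice is observable.

import math

h = 6.626e-34        # Planck's constant in J*s
m = 9.109e-31        # electron rest mass in kg
E = 1.0 * 1.602e-19  # kinetic energy: 1 eV expressed in joules (illustrative)
v = math.sqrt(2 * E / m)    # non-relativistic velocity
lam = h / (m * v)           # de Broglie wavelength, equation 3.1
print(v, "m/s;", lam * 1e9, "nm")   # roughly 5.9e5 m/s and 1.2 nm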
3.2 Schroedinger's time-independent wave equation
The Schroedinger wave function is given by the solution to:

(−ħ²/(2m) ∇² + V) ψ = E ψ    (3.2)
The first term on the LHS represents the kinetic energy and the second term represents the potential energy and E is the total energy. In semiconductors we most often use the wave equation to calculate the occupancy of energy levels and also to calculate the transition probability from one state to another. The quantity |ψ|² represents the probability of finding the particle at a specific location and summed over all space it will integrate to 1. The sum of two solutions to the wave equation is also a valid solution of the wave equation.
If a particle is represented by a wave function then you can obtain its momentum as p = −iħ∇ = ħk, where p is the momentum and k is the wave number. If you need an actual number you can integrate and average k over all space.
Figure 3.3: A wave packet.
A particle can be represented by a wave packet which may look something like the figure 3.3. It has a beginning and an ending and is the sum of many waves like a Fourier transform representation, and it has two velocities associated with it. The group velocity is the velocity of propagation of the packet itself which is the velocity of the particle. The phase velocity is the velocity at which a point on the sum of the waves would have to move in order that the phase at that point remains a constant.
3.3 Quantum well
In the figure 3.4 is a rectangular potential well on the left side with the walls at Vm and the well at zero. In one dimension the wave equation is equation 3.3.

Figure 3.4: Solutions in one dimension.

d²ψ/dx² + (2m[E − V(x)]/ħ²) ψ = 0    (3.3)

From the standard solution [12] you get the equation 3.4. Now if you set ψ(0) = ψ(a) = 0 you get the solutions on the right of the figure 3.4.

ψ = c1 e^(idx) + c2 e^(−idx)    (3.4)

d = √(2m[E − V(x)]) / ħ    (3.5)

On either side of the well, E − V(x) is negative and the ψ(x) decays exponentially to zero.
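In the idealized limit where the walls Vm are taken to be infinite, the condition ψ(0) = ψ(a) = 0 forces d·a = nπ, so the allowed energies are En = n²π²ħ²/(2ma²). That closed form is the standard infinite-well result rather than something stated explicitly in the text; the Python sketch below evaluates it for an assumed 5 nm well.

import math

hbar = 1.055e-34     # reduced Planck constant in J*s
m = 9.109e-31        # electron mass in kg
a = 5e-9             # well width: 5 nm (illustrative)
for n in range(1, 5):
    E = (n * math.pi * hbar)**2 / (2 * m * a**2)   # energy of level n in joules
    print(n, E / 1.602e-19, "eV")                  # E1 is about 0.015 eV, scaling as n^2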
3.4 Free electron theory
The free electron theory of metals is obtained by solving the wave equation in a three dimensional potential box with infinite potential walls and no potential inside the box. So the boundary condition is that the wave function is zero at the walls and the equation becomes:

−(ħ²/(2m)) ∇²ψ = E ψ    (3.6)
Using separation of variables you can make

ψ(x, y, z) = ψx(x) ψy(y) ψz(z)    (3.7)

−(ħ²/(2m)) [ (1/ψx) d²ψx/dx² + (1/ψy) d²ψy/dy² + (1/ψz) d²ψz/dz² ] = E    (3.8)

−(ħ²/(2m)) d²ψx/dx² = Ex ψx    (3.9)

So you get the same solution as in equation 3.4, with

d = √(2m Ex) / ħ    (3.10)
The first term in equation 3.4 has to be zero. From Euler's equation and applying boundary conditions i.e. ψ = 0 at the walls, the imaginary terms go to zero, and you get a sine wave solution. The wave number k = √(kx² + ky² + kz²). If the box of the potential walls is very large the possible solutions will be finely distributed in k and the plot of the energy E vs. k will appear as shown in the figure 3.5.

Figure 3.5: The E-k diagram.

Because of Pauli's exclusion principle two electrons cannot occupy a given state, so the number of electrons is limited to the density of states function which is a count of the allowed states. For a semiconductor if you apply the Kronig-Penney model and count the allowed states you find that the density of states is proportional to √E so the density of states is a parabolic function of energy. Nowadays the doping level is so high that the semiconductor is said to be degenerate meaning that it does not obey the exclusion principle, so the actual carriers exceed the density of states.
3.5 Bloch theorem
The Bloch theorem [13] for the wave function in a periodic potential such as shown in the figure 3.6 for a displacement dx gives

ψ(k, x + dx) = ψ(k, x) e^(ik dx)    (3.11)
3.6 Kronig-Penney model
Kronig and Penney [14] used a potential function as shown in the figure 3.6 to solve for the wave function of an electron in a crystal. The rectangular barriers are located between the lattice sites.

Figure 3.6: The Kronig-Penney potential.

The solutions in regions 1, 2 and 3 are the same as equation 3.4. Like the quantum well, region 1 has a solution which is sinusoidal, but in regions 2 and 3 the solution is an exponential decay. The Bloch theorem allows the solution in region 2 to be related to the solution in region 3 by a multiplier of e^(ikc).
Now the boundary conditions are applied at the interface of regions 1 and 2 and at the interface of regions 1 and 3. Even so, to make the analysis possible, the regions 2 and 3 are shrunk and the barrier potential raised simultaneously so that the net decay stays the same.
This results in a solution as shown in the figure 3.7. As you increase the energy the wave number alternates between real and imaginary and where it is imaginary you have the forbidden gaps. Then you can jump to the next higher energy at the same k.

Figure 3.7: The Kronig-Penney E-k plot.
3.7 Effective mass
In a semiconductor an electron can appear to have many different effective masses depending upon what you are measuring. This mass is usually smaller than its rest mass m0. There are two important effective masses namely the density of states effective mass and the mobility effective mass.
The way that you use the different effective masses is when you calculate different quantities based on Schroedinger's wave equation. In the denominator is the mass m. So if you are using the wave equation to calculate the mobility, then you would use the effective mass for mobility for those conditions. Similarly if you are using the wave equation to calculate the density of states, then you need to use the density of states effective mass for those conditions.
Cyclotron resonance as described in [15] is a way to measure effective mass. Cyclotron resonance is used for lots of things, in fact electron cyclotron resonance based plasma etch equipment is sold by several vendors. The idea of cyclotron resonance is fairly straightforward. In a simple RF system with an RF voltage applied between two plates, the carriers move back and forth between the plates.

Figure 3.8: Cyclotron movement of charged particles.

But in cyclotron resonance the aim is to make the charged particles describe a circle or oval. So it isn't just an RF field but also a perpendicular magnetic field as well. From the Lorentz equation, as it moves either up or down, it also moves sideways. So it describes an oval as in the figure 3.8.
When the charged particle moves between the plates it causes displacement current and that can be detected and similarly when it absorbs energy from the magnetic field, that too can be detected. Since silicon is anisotropic if you use specimens with different crystal orientations the ovals and the resonance frequencies will be different. From the resonance frequencies and the shape of the ovals you can obtain the effective masses along the different crystal orientations.
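The number extracted from such a measurement is the cyclotron effective mass. The resonance condition ωc = qB/m* is a standard result that is not written out in the text, but it shows how a measured resonance frequency and a known magnetic field give the mass directly; the values in this Python sketch are illustrative assumptions.

import math

q = 1.602e-19        # electron charge in C
B = 0.1              # applied magnetic flux density in tesla (illustrative)
f_res = 1.0e10       # measured resonance frequency in Hz (illustrative)
m_eff = q * B / (2 * math.pi * f_res)   # m* = qB / omega_c with omega_c = 2*pi*f_res
print(m_eff / 9.109e-31, "electron masses")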
3.8 Fermi-Dirac distribution
The Fermi-Dirac distribution function of equation 3.12 gives the probability of occupancy of an electron state at energy E. If E = Ef, the probability is half. To get the actual number of electrons at that energy, you multiply it by the density of states function as shown in the figure 3.9.

f(E) = 1 / (1 + e^((E − Ef)/kT))    (3.12)
Figure 3.9: The Fermi-Dirac distribution.
To use this equation in practical calculations, the approximation proposed by [16] is easiest. Just the first two terms are probably enough as in equations 3.13 and 3.14.

(Ef − Ec)/kT = ln(n/Nc) + n/(2√2 Nc)    (3.13)

(Ev − Ef)/kT = ln(p/Nv) + p/(2√2 Nv)    (3.14)
One important feature of the Fermi-Dirac statistics is that in a junction of any type, either a homojunction or a heterojunction, of p-n or p-i or n-i, the Fermi level at equilibrium is flat.
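Equations 3.12 and 3.13 are straightforward to evaluate numerically. In the Python sketch below the electron concentration and the value of Nc (a commonly quoted number for silicon) are illustrative assumptions, not values from the text.

import math

kT = 0.0259          # thermal energy at room temperature in eV

def fermi(E, Ef):
    # Occupancy probability of a state at energy E (eV), equation 3.12.
    return 1.0 / (1.0 + math.exp((E - Ef) / kT))

def ef_minus_ec(n, Nc):
    # Two-term approximation for (Ef - Ec) in eV, equation 3.13.
    r = n / Nc
    return kT * (math.log(r) + r / (2.0 * math.sqrt(2.0)))

print(fermi(0.1, 0.0))                # a state 0.1 eV above Ef is about 2% occupied
print(ef_minus_ec(1e19, 2.8e19))      # n = 1e19 cm^-3 against an assumed Nc for silicon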
3.9 Poisson's equation
In the equation shown below ρ is the charge and ε is the permittivity. If there is no charge, the right hand side becomes zero and this is called the Laplace equation.

∇²V = −ρ/ε    (3.15)
3.10 Drift and diffusion
Drift is the movement of charge in an electric field and the current density J due to this movement is given by the equations 3.16 for electrons and 3.17 for holes. Keep in mind that the flow of electrons is opposite to the direction of J. Typically µp is only half that of µn but in the high field channel region under an FET gate it can be less than that.

Jn = q n µn E    (3.16)

Jp = q p µp E    (3.17)

Diffusion is independent of charge and applies to all particles. Due to thermal energy all particles move about randomly (first shown by Brownian motion). If you have a collection of particles in one location they will disperse with time. In probability theory this is called the random walk. In a container they cannot disperse beyond the walls of the container where they are reflected.
In a semiconductor the current density at any given point due to a variation in the density of either holes or electrons is given by the equations 3.18 for electrons and 3.19 for holes. Note that Jp has a negative sign because a positive gradient will give a negative current whereas for Jn the negative charge will reverse the sign again.

Jn = q Dn ∇n    (3.18)

Jp = −q Dp ∇p    (3.19)

The Einstein relationship relates the diffusion coefficient D to the mobility µ by the equation 3.20.

D/µ = kT/q    (3.20)
The mobility reduces as the temperature increases due to lattice vibration as shown in [17].
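For example, equation 3.20 converts a measured mobility directly into a diffusion coefficient; the electron mobility used in this short Python sketch is a typical bulk-silicon figure chosen for illustration, not a value from the text.

kT_over_q = 0.0259       # thermal voltage at room temperature in volts
mu_n = 1400.0            # electron mobility in cm^2/(V*s), illustrative bulk-silicon value
D_n = mu_n * kT_over_q   # Einstein relationship, equation 3.20
print(D_n, "cm^2/s")     # about 36 cm^2/s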
3.11 Haynes-Shockley experiment
The Haynes-Shockley experiment [18] can be used to measure the mobility indirectly. A slab of the semiconductor such as Silicon is biased as shown in the figure 3.10. A pulse of laser light of suitable frequency is applied to the semiconductor. The light causes hole-electron pairs to be formed.

Figure 3.10: The Haynes-Shockley experimental setup.

If the slab is n type, the holes will drift to the right in the electric field. Because the hole concentration is a pulse it will spread due to diffusion. So the narrow pulse of holes will reduce in height and increase in width as it moves to the right. However, due to recombination the area under the pulse will reduce with time.
Now if you measure the current flowing out of the slab on the right you will see a pulse of current. The shape of the pulse is the most important. The Einstein relationship relates the diffusion coefficient D to the mobility µ. The recombination changes the area under the pulse and also the shape of the pulse because the holes diffusing to the left spend a longer time among the majority carriers. In any case from the shape of the pulse you can obtain D and µ.
3.12 Continuity equations
The continuity equations for holes and electrons are shown below. The left hand side shows the increase in number of electrons (or holes) in the control volume over time. The right hand side is the number of particles left behind in the control volume due to the differential of the current flow, less the recombination rate R and plus the generation rate G.

∂n/∂t = (1/q) ∇·J_n - R + G    (3.21)

∂p/∂t = -(1/q) ∇·J_p - R + G    (3.22)
The continuity equations are actually common to many fields of science. For example in incompressible fluid flow, the conservation of mass leads to continuity equations very similar to the ones above except the generation and recombination terms are zero. Shockley-Read-Hall recombination [19], [20] is the biggest recombination term in indirect gap semiconductors such as silicon and is modeled by a lifetime as in equations 3.23 and 3.24.

R_SRH = n_excess / τ_0    (3.23)

τ_0 = [τ_n0 (p_0 + p_1) + τ_p0 (n_0 + n_1)] / (p_0 + n_0)    (3.24)
But in direct gap semiconductors you can also have optical recombination which is modeled as R_opt = C(np - n_i²) where n_i is the intrinsic concentration. The value of C can be obtained from experimental studies such as [21]. If the number of excess carriers is very high then Auger recombination can occur, which requires a three particle interaction and is modeled as in equation 3.25.

R_Aug = B_n (n²p - n n_i²) + B_p (np² - p n_i²)    (3.25)
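The three recombination terms of this section can be evaluated side by side as in the sketch below; the lifetime and the C, B_n, B_p coefficients are assumed illustrative values, not fitted to any particular material.

    def recombination_rates(n, p, n_i, delta_n, tau_0=1e-6, C=1e-10, B_n=2.8e-31, B_p=9.9e-32):
        """SRH (equation 3.23), radiative and Auger (equation 3.25) recombination rates.
        Densities in cm^-3, lifetime in s; the coefficients are assumptions for illustration."""
        r_srh = delta_n / tau_0                                   # equation 3.23
        r_opt = C * (n * p - n_i ** 2)                            # direct-gap radiative term
        r_aug = B_n * (n ** 2 * p - n * n_i ** 2) + B_p * (n * p ** 2 - p * n_i ** 2)  # 3.25
        return r_srh, r_opt, r_aug

    # A hypothetical n-type sample with 1e15 cm^-3 of optically generated excess carriers
    print(recombination_rates(n=1e17, p=1e15, n_i=1e10, delta_n=1e15))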
3.13 Band diagrams
A band diagram is a spatial plot of the different energies in the semiconductor, specifically the valence and conduction bands and the Fermi level. Drawing the band diagram starts with the Fermi level. At equilibrium with no applied voltage the Fermi level is flat. The figure 3.11 shows the band diagram for a p-n junction.
Figure 3.11: A p-n junction at equilibrium.
With a voltage applied to a homogeneous semiconductor the Fermi level is not flat; it represents the potential and its derivative is the negative of the electric field. Within a junction the Fermi level splits into two quasi-Fermi levels, one for the p type side and the other for the n type side, and the separation is equal to the applied voltage.
n = ∫_{E_c}^{E_vacuum} D(E) f(E) dE    (3.26)

p = ∫_{-∞}^{E_v} D(E) [1 - f(E)] dE    (3.27)
Having drawn the Fermi level you extract the conduction band and valence band from it. The known electron concentration is equal to the integral of the product of the density of states and the Fermi probability from the conduction band to the vacuum level. Similarly the known hole concentration is the integral of the product of the density of states and the probability of a state being empty, 1 - f(E), from minus infinity to the valence band.
Since you have the majority carrier concentration (either n or p) you can obtain the minority carrier concentration by the relationship np = n_i², where n_i is given by the relationship

n_i = √(N_c N_v) e^(-E_g/2kT)    (3.28)
Here N_c and N_v are constants and are the effective density of states for electrons and holes.
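Equation 3.28 is easy to evaluate numerically, as in the sketch below; the N_c, N_v and E_g values are commonly quoted numbers for Si at 300 K and are given here only as assumptions to make the example concrete.

    import math

    K_B = 8.617e-5  # eV/K

    def intrinsic_concentration(Nc, Nv, Eg, T=300.0):
        """Equation 3.28: n_i = sqrt(Nc * Nv) * exp(-Eg / (2 k T)), densities in cm^-3."""
        return math.sqrt(Nc * Nv) * math.exp(-Eg / (2.0 * K_B * T))

    print(intrinsic_concentration(2.8e19, 1.04e19, 1.12))  # on the order of 1e10 cm^-3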
3.14 Impurities
Impurities or dopants in Si require some energy to be ionized and occupy energy levels [22], [23] immediately below the conduction band for donors and immediately above the valence band for acceptors. The gap is small: 0.044 eV for Boron, 0.049 eV for Arsenic, 0.044 eV for Phosphorus.
At room temperature all of these dopants are ionized. P and As lose an electron to the con-
duction band and become positive ions, whereas B accepts an electron from the valence band and
leaves behind a hole, and becomes a negative ion.
Copper has an acceptor level 0.49 eV from the valence band and Silver has an acceptor level 0.54 eV from the valence band. These are deep level traps, and if the Si is contaminated with these impurities they can cause significant leakage: even if there are only a few of these states, the probability of a transition to or from them from the valence or conduction band is exponentially higher than for a direct band-to-band transition, because the gap to be crossed is only about half of the energy gap.
Excessive doping results in band-tailing [24] where the impurities cause either the conduction
or valence band to extend into the energy gap. This effectively reduces the band gap and increases
leakage current.
Chapter 4
Active devices
Figure 4.1: The semiconductor active devices.
4.1 P-N Junction diode
The junction diode is the first device in the figure 4.1. It is formed by first implanting one species, either n or p, then implanting the other to form a junction. As you move toward the junction the depletion region begins gradually. In the bulk n type region the electron concentration is N_D, which is the donor concentration, and in the bulk p type region the hole concentration is N_A, which is the acceptor concentration.
Figure 4.2: A diode junction in equilibrium.
The electrons cross the junction from the n type to the p type region and occupy the holes, thereby leaving behind ionized donors on the n type side and causing the acceptors to become negatively charged ions. This is called depletion and the depletion region is characterized by a lack of carriers. In order to get the actual extent of depletion you have to solve the continuity equations and Poisson's equation.
The depletion approximation is a simple way of solving the diode as shown in the figure 4.2. You assume that the depletion region begins abruptly as you approach the junction. Then the charge on the n side is simply n x_n if you assume unity cross-sectional area. Similarly the charge on the p side is simply p x_p. From charge neutrality, these two have to be equal to each other.
Figure 4.3: Obtaining the built-in voltage.
E(x) = (1/ε) ∫ ρ(x) dx    (4.1)

V(x) = -∫ E(x) dx    (4.2)
Figure 4.4: Diode under bias.
Since you know the doping concentrations N_D and N_A, the band gap E_g and the densities of states N_c and N_v, you can use equation 3.12 to obtain the Fermi levels on either side of the junction. Then obtain the electric field by equation 4.1 and then the potential by equation 4.2. Then you increase the lengths x_n and x_p until V_bi is the difference as in the figure 4.3. The resulting band diagram is shown in the figure 3.11. As you can see the conduction and valence bands are the same shape as the potential V(x) except they are flipped and scaled by q.
Figure 4.5: Diode current.
As you apply a positive bias to a band diagram you will push it down relative to the fixed end as shown in the figure 4.4. One way to remember this is to remember that electrons are attracted to a positive potential, and so the conduction band has to bend downward w.r.t. the fixed reference so that the electrons can slide down the conduction band toward the positive potential.
I = I_s (e^(qV_a/kT) - 1)    (4.3)

The ideal diode equation is 4.3. For V_a > 0 the diode current is exponential, so if you plot the natural log of it vs. voltage, as in the figure 4.5, the slope should be q/(nkT), i.e. the current rises by a factor of e for every nkT/q of applied voltage, where n usually lies between 1 and 2.
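The slope fit described above is easy to automate; the sketch below fits ln(I) against V on a synthetic forward characteristic generated with an assumed ideality factor of 1.5, so the numbers are illustrative rather than measured.

    import numpy as np

    def ideality_factor(V, I, T=300.0):
        """Fit ln(I) vs V over the exponential region; the slope is q/(n k T)."""
        kT_over_q = 8.617e-5 * T
        slope, _ = np.polyfit(V, np.log(I), 1)
        return 1.0 / (slope * kT_over_q)

    V = np.linspace(0.3, 0.6, 30)
    I = 1e-12 * np.exp(V / (1.5 * 0.02585))   # synthetic diode with n = 1.5
    print(ideality_factor(V, I))               # recovers a value close to 1.5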
4.2 Bipolar junction transistor
If you widen the diode and add a third implant to create either an n-p-n structure or a p-n-p structure you get the BJT as shown in the figure 4.1. The most important feature in a BJT is how thin the base region is. If the base is too wide then the BJT will not function at all.
Figure 4.6: Carrier flow in a BJT.
In the figure 4.6 there are three currents shown, numbered 1, 2 and 3. The two currents 1 and 2 add up to form the base to emitter current. 1 is the hole current flowing from the p type base to the n type emitter. 2 is the electron current flowing from the n type emitter to the p type base and actually being captured by the contact to the p type base.
However, as you can see in the figure 4.7, the collector is heavily reverse biased with respect to the base and so any electrons in the base see a long slide down the conduction band to the collector contact. Due to the narrowness of the base, as the electrons move across the base toward the base contact, a large number of them fall down the potential into the collector causing the current 3. If the current 3 is much larger than the sum of the currents 1 and 2, then you have a large gain.
The figure 4.8 illustrates the problem of punchthrough. As you know the depletion width depends on the doping concentration on either side of the junction. If the base and emitter are at the same potential, then the region marked as depletion 1 in the figure 4.8 is the equilibrium depletion width. Since the base collector junction is reverse biased to the supply potential, the region marked depletion 2 will be much larger than depletion 1.
Figure 4.7: A BJT under bias.
When the sum of these two widths is equal to the physical width of the base, the base contact essentially loses control of the potential in the base, and all that the electrons in the emitter see is a long depletion region with a conduction band falling continuously toward the collector. So appreciable current will flow.
Figure 4.8: Punchthrough in a BJT.
So you want to design your device so that even including process variation and a supply voltage surge, all BJTs in the circuit avoid punchthrough. But you can't make the base too thick because the current gain is given by equation 4.4, and if you make the base thicker the electric field from the collector is reduced, and at the same time the space charge voltage drop looking into the base also drops because I_2 and I_1 are moving in a wider channel; the net result is that the gain reduces because I_2 increases and I_3 decreases.

G_I = I_3 / (I_2 + I_1)    (4.4)
The collector current is given by the equations 4.5 and 4.6. So I_s is similar to the reverse saturation current of the base diode except amplified by the maximum value of the gain β.

I_c = I_s exp(qV_be/kT)    (4.5)

I_c = β I_b    (4.6)
Figure 4.9: The Early voltage.
The Early voltage is shown in the figure 4.9. The equation 4.5 assumes that the collector current depends only on the V_be and not the V_ce. But in fact as the V_ce rises the I_c will rise as well. This is due to a reduction in the width of the base due to an increase in the V_cb, and hence the depletion width, called the Early effect. The Early voltage is a way to factor this in as in equation 4.7. The inherent assumption is that V_A ≫ V_ce.

I_c = I_s exp(qV_be/kT) (1 + V_ce/V_A)    (4.7)
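Equation 4.7 can be sketched in a few lines; the saturation current and Early voltage below are assumed round numbers, not values from any real device.

    import math

    def collector_current(Vbe, Vce, Is=1e-15, VA=50.0, T=300.0):
        """Collector current with the Early-effect correction of equation 4.7."""
        kT_over_q = 8.617e-5 * T
        return Is * math.exp(Vbe / kT_over_q) * (1.0 + Vce / VA)

    # The same Vbe gives a slightly larger Ic at a larger Vce: the finite output slope of figure 4.9
    print(collector_current(0.65, 1.0))
    print(collector_current(0.65, 3.0))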
4.3 Heterojunction Bipolar Transistor
Figure 4.10: The band diagram for a HBT.
As we saw in the last section the current I_1 reduces the current gain and you want to make it as small as possible. This is achieved in the HBT by using an emitter of a larger band gap than the base, as shown in the figure 4.10. Here the dotted line is the homojunction. In this way I_1 is reduced because the holes in the base see a larger potential barrier to entering the emitter.
4.4 Field-Effect transistor
The FET is controlled by the gate. The gate has a capacitance to the body of the FET and when
this capacitance is charged a sheet of charge forms below the oxide and this is the charge sheet [25]
model. The body of the NFETs is tied to ground and the body of the PFETs is tied to supply. The
source and drain are of the opposite type to the body, and so in the absence of a gate voltage the
source to body region is either at equilibrium or is reverse biased while the drain to body diode is
definitely reverse biased, so current does not flow from the drain to the source.
Figure 4.11: Saturation of an FET.
The gate voltages have to lie between supply and ground. For an NFET, if the gate is at supply the charge sheet that forms under the gate is negative, i.e. it is made of electrons. The source is n type and is at ground potential. If the drain is somewhat lower than the gate, the charge sheet is connected to both the source and drain and the electrons from the source see an electric field from the drain to source in which they travel. If the drain is also at supply just like the gate, then the charge sheet cannot reach up to the drain because you cannot have a charge sheet unless there is a capacitor voltage to support it. This situation is called saturation and is shown in the figure 4.11 [26].
If you hold the gate at supply and slowly step the drain up from ground to supply, the steadily increasing drain to source electric field causes a steadily increasing drain to source current flow. As the drain approaches supply and the charge sheet at the drain starts to disappear, an increasingly larger portion of the drain to source voltage is dropped across the gap between the drain and the charge sheet due to its higher resistance.
Figure 4.12: Linear and saturation operation of the FET.
Even in the saturation region the current does increase with increasing drain voltage; however, the rate of current increase with drain voltage starts to fall, and for low gate voltages the gap between the charge sheet and the drain becomes large enough that the drain current barely rises with drain voltage.
The figure 4.12 shows the two regions of operation of the FET other than the sub-threshold, namely the linear and the saturation. The figure 5.6 shows the sub-threshold behavior. The equation 4.8 represents the linear region, the equation 4.9 represents the saturation region and the equation 4.10 represents the sub-threshold region.
I_d = μ (ε_ox/T_ox) (W/L) [(V_gs - V_th)V_ds - (1/2)V_ds²]    (4.8)

I_d = μ (ε_ox/T_ox) (W/L) (V_gs - V_th)²    (4.9)

I_d = k_x (W/L) e^(qV_gs/nkT) (1 - e^(-qV_ds/nkT))    (4.10)
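Taken together, equations 4.8 to 4.10 define a simple piecewise drain-current model. The sketch below evaluates it directly; every parameter value (oxide thickness, mobility, k_x, n and so on) is an assumption chosen only to make the example run, and no smoothing is applied at the region boundaries.

    import math

    KT_Q = 0.02585  # thermal voltage at 300 K, in volts

    def fet_drain_current(Vgs, Vds, Vth=0.4, W=1e-4, L=0.06e-4,
                          eps_ox=3.45e-13, Tox=14e-8, mu=300.0, kx=1e-12, n=1.5):
        """Piecewise NFET current following equations 4.8 (linear), 4.9 (saturation)
        and 4.10 (sub-threshold); cgs-style units, current in amperes."""
        k = mu * (eps_ox / Tox) * (W / L)
        if Vgs < Vth:                                  # sub-threshold, equation 4.10
            return kx * (W / L) * math.exp(Vgs / (n * KT_Q)) \
                   * (1.0 - math.exp(-Vds / (n * KT_Q)))
        if Vds < Vgs - Vth:                            # linear region, equation 4.8
            return k * ((Vgs - Vth) * Vds - 0.5 * Vds * Vds)
        return k * (Vgs - Vth) ** 2                    # saturation, equation 4.9

    for Vds in (0.1, 0.5, 1.1):
        print(Vds, fet_drain_current(1.1, Vds))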
The effect of the high electric field on mobility is given in [27], [28]. Several theories on computing the threshold voltage are given in [29], [30]. Other models for conduction are given in [31], [32]. The effect of moving from metal gates to polysilicon gates is given in [33]. The narrow width effect is described in [34], [35].
4.5 FET small signal equivalent
Figure 4.13: Parasitics of a FET.
There are two kinds of circuits that we use namely digital and analog. When you simulate a
digital circuit you are mostly interested in its temporal behavior. On the other hand when you
design an analog circuit you are also interested in its harmonic behavior. So essentially when you
design a digital circuit you only need to make sure that as you vary the signals in the time domain,
the outputs of the circuits change rapidly enough and that they can drive enough current to charge
and discharge the load capacitances.
When an analog circuit is designed there are really two parts to the process which you need to iterate through. The first part is to set the bias point. The bias point for a FET is the combination of the three voltages V_gs, V_bs and V_ds. For a bipolar transistor it is the combination of V_be and V_ce.
Figure 4.14: Small signal equivalent of a FET.
The figure 4.13 shows the parasitics of a FET. Using these parasitics we get the small signal equivalent of a FET as shown in the figure 4.14. The only other component is a current source which is given by I = g_m V_gs. g_m is called the transconductance and is given by g_m = dI_ds/dV_gs. The value of g_m is dependent on the bias point. The resistance is the channel resistance and it too is dependent on the bias point.
The small signal equivalent is useful in order to obtain the behavior of a circuit when AC
signals are applied to the gates. But keep in mind that this AC behavior is only valid so long
as the signal is very small compared to the supply voltage. If the prediction is for large swings
in the drain voltages, then the prediction is incorrect because under such conditions you are too
far from the bias point at which the equivalent circuit was extracted and so the capacitances, the
transconductance and the channel resistance will actually be different. So a small signal equivalent
is used as a design tool but a time domain simulation using a sufficiently small time step will give
you the actual response of the circuit.
4.6 Other transistors
The most visible place that MESFETs are used is in circuits used in routers. They are made using GaAs MESFETs because GaAs has a much higher mobility than Si and thus you can get a much higher speed out of the same size chip than using Si MOSFETs. The competition for GaAs MESFETs is provided by Si BJTs, which are fast by virtue of being junction devices.
The MESFET does not use a gate oxide. Instead the metal forming the gate is deposited directly
onto the GaAs between the source and the drain. When metal is in contact with the GaAs it forms
a Schottky contact and there is a barrier which needs to be overcome in order for current to flow.
This barrier is caused by the difference in work function between the metal and the GaAs.
So when a MESFET is turned on you actually have a diode to ground and so gate current flows continuously. This is different from the MOSFET where the gate current stops flowing once the gate is charged. Just like a MOSFET the current flow starts when the body to source is forward biased. The way this happens is similar to the MOSFET in that the gate voltage is initially shared between the Schottky diode and the source to body diode, and the region under the gate does get an increased number of electrons which then see the drain to source electric field.
So even here you get a gain because for every electron which travels from the source into the gate, you have a large number of electrons traveling from the source to the drain, but the gate still controls this current flow by controlling the potential of the channel immediately adjacent to the source.
The High Electron Mobility Transistor received a lot of attention because it uses a channel which is a two dimensional electron gas, giving the carriers an extremely high mobility. One interesting thing about the development of the HEMT was that modeling engineers would predict a theoretical maximum mobility and then the experimental researchers would promptly obtain a higher mobility, and this happened several times.
Figure 4.15: The well of a HEMT.
The channel of the HEMT is created by growing a low band gap layer of semiconductor over a layer of high band gap semiconductor of the same lattice periodicity. The easiest is Al_x Ga_(1-x) As. As you increase the fraction x of Aluminum, the band gap increases but the lattice spacing remains the same. The best reference to start with for information about Al_x Ga_(1-x) As properties is [36]. For a simple example of usage you could look at [37].
For the case of an abrupt i-n+ heterojunction as shown in the figure 4.15, a large number of electrons on the high band gap side on the right will fall down into the low band gap side. This creates a charge dipole which will raise the bulk potential on the left side of the figure up until the Fermi levels align. Immediately to the left of the junction there is a deep valley in the conduction band toward the Fermi level to account for the excess electrons.
This potential valley now requires you to solve for allowed states using the wave equation, giving you the density of states function. Poisson's equation needs to be satisfied as well. Due to the well the energy along the vertical axis is constrained, so there is one less degree of freedom and hence the mobility is higher along the lateral axis.
The strained layer is another version of growing different types of semiconductor on top of each other, but here the lattice spacing of the two layers is different, so the layer with a smaller lattice spacing is pulled apart whereas the other is pushed together.
If the layers are too thick, the shear force will weaken the junction, but by alternating two or three mono-layers of these two materials you can get a new semiconductor with intermediate properties such as band gap [38]. So a strained layer in the channel of a FET using Ge will create a channel region with a lower bandgap, allowing carriers to fall into it and a larger current to flow.
4.7 Shrink problems at 0.06 μm and below
4.7.1 The premise of the shrink
From the 1 μm to the 0.18 μm generations, shrinks were very straightforward. They were based on the saturation equation for MOSFET current given by:
I = μ (ε_ox/T_ox) (W/L) (V_gs - V_th)²    (4.11)
Device parameter        Value after shrink
Gate oxide thickness    0.707
Length                  0.707
Supply voltage          0.707
Width                   0.707
Threshold voltage       0.707
Current                 same
Gate loading            0.707
Operating frequency     (1/0.707)² = 2.0
Table 4.1: Effect of shrink.
If you shrink the MOSFET by 0.707 as shown in the Table 4.1, then the ratio of the new saturation current compared to the old saturation current and the ratio of the new gate capacitance to the old gate capacitance are given by:
I_new / I_old = [W_new L_old T_ox_old / (W_old L_new T_ox_new)] × [(V_gs_new - V_th_new)² / (V_gs_old - V_th_old)²] = 1    (4.12)
New load / Old load = W_new L_new T_ox_old / (W_old L_old T_ox_new) = 0.707    (4.13)
So if you have the same current charging a load that is 0.707 times what it was previously to a voltage that is 0.707 times what it was previously, then you can operate it at approximately double the clock rate that it was operating at previously.
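Taking the premise above at face value, the speedup follows from the fact that gate delay is proportional to C·V/I; the short sketch below just carries that bookkeeping through.

    def clock_speedup(load_ratio=0.707, voltage_ratio=0.707, current_ratio=1.0):
        """Frequency improvement when delay scales as C*V/I and the shrink gives the
        ratios stated in the text (same current, 0.707x load, 0.707x voltage)."""
        delay_ratio = load_ratio * voltage_ratio / current_ratio
        return 1.0 / delay_ratio

    print(clock_speedup())  # about 2, i.e. approximately double the clock rate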
Today's reality is a little different from the past. Let us compare an imaginary 0.13 μm L_eff process to an imaginary 0.06 μm L_eff process. This is shown in table 4.2. Because I am making up these numbers to show a point, they are not very real, but at least they are good enough to understand the basic ideas.
Device parameter                 0.13 μm L_eff     0.06 μm L_eff
V_dd (Supply voltage)            1.8 V             1.1 V
V_th (Threshold voltage)         0.5 V             0.4 V
S_t (Sub-threshold swing)        125 mV/dec        100 mV/dec
T_ox (Gate oxide thickness)      25 Å              14 Å
L_diff (Sub-diffusion)           0.022 μm          0.015 μm
Gate loading                     z                 0.6 z
X_j (Junction depth)             0.15 μm           0.10 μm
N_D (Drain doping level)         10^20 /cm³        3 x 10^20 /cm³
T_j (Junction temperature)       100 °C            70 °C
Thermal budget                   x                 0.6 x
Operating frequency              3 GHz             12 GHz
Heat generation                  y/mm²             higher
Table 4.2: Comparison of device parameters.
Let us look at each parameter of the Table 4.2, what (if anything) is limiting it, and what effect that has on the MOSFET's operation and on the circuit behavior as well. Let us use the NFET to illustrate our comparison.
4.7.2 V_th (Threshold voltage)
As a general rule supply voltage should scale directly with gate length. So the expected supply voltage for a 0.06 μm L_eff process should be 0.06/0.13 of the 0.13 μm L_eff supply voltage, but that is not really feasible because of the V_th.
In order for the FET to be useful, we need to be able to turn it on and off just like a switch. From chapter 2 we know that when the V_gs is lower than the V_th of an NFET, the FET is meant to be off. But is it really off? Also we know that when the V_gs is larger than the V_th the FET is meant to be on. But is it really on? Just like everything in life, there is no black and white; there is a whole gray area when it is neither one nor the other.
In the Figure 4.16 you see the gate characteristic at V_ds = V_supply and it is divided into 2 portions, essentially the portion on the left which is the sub-threshold region and the portion on the right which is the saturation region. Keep in mind that the y axis here is in the logarithm scale, i.e. 1 actually means 10, 2 means 100 etc.
Figure 4.16: The gate characteristic showing V_th.
When the V_gs is zero, the current is as low as it will go and this current is called I_off. The current at V_gs = V_th is I_th. You want the I_off to be about 4 decades lower than I_th and you want I_on to be about 2 decades higher than I_th. For example if the I_th is 1 μA, then you want I_off to be 100 pA and you want I_on to be about 100 μA.
Now since we just reduced the supply voltage to 1.1 V, then we have to also reduce V_th. But if we reduce V_th, the curve in the Figure 4.16 moves to the left and the I_off will increase significantly. But if that happens then the chips will consume a lot of power all the time and they will get very hot and basically melt. So we will only reduce the V_th to 0.4 V. But even so we have to increase the sub-threshold slope so that the I_off stays the same. Also, when we reduced the supply voltage, our headroom also reduced from 1.3 V to 0.7 V. This is probably a problem, but the real extent of the problem won't be clear until we set all the other parameters and test the new transistor. The problem is that:
I_on = μ_n (ε_ox/T_ox) (W/L_eff) (V_gs - V_th)²    (4.14)
In this equation W and L_eff scale together, so we get no advantage there. T_ox reduces so we get some advantage there, but (V_gs - V_th)² just reduced from 1.3² to 0.7². So the net result is that I_on is lower, but that is still all right because the load capacitance also reduces, as we will see later. So let's hold off on judging the transistor until later.
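Using the table 4.2 numbers, the loss in I_on can be estimated from equation 4.14 with W/L_eff held fixed, as in the short sketch below.

    def ion_ratio(overdrive_new=1.1 - 0.4, overdrive_old=1.8 - 0.5,
                  tox_new=14.0, tox_old=25.0):
        """Relative I_on from equation 4.14 when W/L_eff and the mobility cancel:
        the thinner oxide helps, but the overdrive drops from 1.3 V to 0.7 V."""
        return (overdrive_new ** 2 / tox_new) / (overdrive_old ** 2 / tox_old)

    print(ion_ratio())  # roughly 0.5, so the scaled device delivers about half the on current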
4.7.3 S_t (Sub-threshold swing)
The dependency of the FET's current on V_gs when V_gs is less than V_th is basically similar to that of a diode, i.e. it is exponential in nature. But unlike a diode, in this case the V_gs is not really the voltage across the body-source diode because V_gs is applied across the gate-source contacts and it is related to the body-source diode voltage through the gate capacitor. Now we know that a reverse biased diode acts as a capacitor, so the sharing of V_gs between the gate-body capacitance and the body-source diode capacitance is proportional to the ratio of the depletion width of the body-source diode and the oxide thickness T_ox. In other words, if the ratio of the T_ox to the body-source diode depletion width is smaller, you will get a steeper rise in the sub-threshold current vs. V_gs.
Figure 4.17: The gate characteristic showing swing.
The sub-threshold swing is the inverse of the slope of the sub-threshold drain current vs. V_gs and is expressed in mV per decade of current rise. At 0.13 μm L_eff the S_t was 125 mV/dec, and now we need a steeper slope so we use 100 mV/dec, so that works out to 0.4 V over 4 decades of current. So then the off current density will stay the same.
4.7.4 T_ox (Gate oxide thickness)
The gate oxide as well is normally scaled linearly with gate length. The issue here is that the gate oxide must not break down when you apply the maximum supply voltage across it. At least to the first order each mono-layer can withstand about 200 mV. The catch is that as the thickness gets lower the probability of electrons tunneling across the gate oxide, from the gate to the channel or equally probably from the channel to the gate, starts to get larger. Many physicists and experimentalists have come to the conclusion that 5 mono-layers is the physical limit. The other aspect to this is that as the gate lengths reduce there are larger concentrations of electrons in the channel and they are moving very fast, so due to the Boltzmann distribution of energies there will always be a highly energetic tail in the distribution where the particles have a lot of energy and want to try to tunnel through the gate oxide. 5 mono-layers is 14 Å and that is what we chose for the 0.06 μm L_eff, which makes some of us believe that when we go to 0.03 μm L_eff the gate is going to be very leaky indeed. There are other materials such as Silicon-Nitride that have a slightly higher dielectric constant than Silicon di-oxide, but they have different problems of their own.
Another interesting problem with making a 14 Å gate oxide is the way that gate oxide is deposited. Gate oxide needs to be very pure so it is normally grown, not deposited, by heating the wafer at 800 °C for about a half-hour in pure oxygen and nitrogen without water vapor. But when you put the batch of wafers into the oven, it takes a few minutes for the temperatures to stabilize and so the timing gets to be a problem as the thickness of the gate oxide is reduced. The oxide grows fastest in the first few minutes after placing the wafer in the oven, because at that time all you have is pure silicon reacting with the hot oxygen gas and forming silicon dioxide, but as time goes on the reaction slows down because the silicon under the oxide is separated from the oxygen it wants to react with by the growing oxide layer. So because of this, even if you put the batch of wafers into the oven and take them out in two minutes you will have significant oxide on them and the variation is very large. This was not a problem when you were growing 60 Å of gate oxide because such a thick oxide growth buffers itself, but it is a very big problem when you only want to grow 14 Å worth of gate oxide: due to the normal variation in the temperature of the oven, and depending on how long you keep the oven door open and other such things that you normally don't consider, the oven temperature during the first few minutes after you close the oven door may vary between 700 °C and 800 °C, and of course that means you may get 6 mono-layers instead of 5, which is already 20% too much.
So basically we need some researcher to come up with a way to inhibit the oxide growth until
the oven temperature stabilizes and also slow down the oxide growth process so that we can keep
the batch of wafers in the oven for a half-hour give or take a minute. I am sure that some bright
young student at Berkeley or Stanford will do just that. Although keep in mind that a solution to
this problem has been in existence for decades and it is called molecular beam epitaxy or MBE where
you can grow precisely as many mono-layers as you want very repeatably indeed. Because of its
high cost MBE is normally used for building semiconductor lasers and other higher cost products.
When you reduce the gate oxide thickness the capacitance per unit area increases. But since the W and the L_eff both reduce as well, the area decreases as well. Let the minimum W for the 0.06 μm L_eff process be 0.5 μm and the minimum W for the 0.13 μm L_eff process be 1.0 μm, then:

New gate capacitance / Old gate capacitance = (0.06 × 0.5 × 25) / (0.13 × 1.0 × 14) = 0.41    (4.15)
4.7.5 L_diff (Sub-diffusion)
The source and drain are implanted on either side of the gate using a method called the self-aligned process. However, subsequent heat cycles, for example for the purpose of annealing the source and drain regions or otherwise for growing an oxide etc., will cause these implants to diffuse away in all directions, and one of those directions is under the gate. The length of the gate as it is drawn in the self-aligned process is called the drawn gate length L_drawn and this is usually longer than the effective gate length L_eff, and the sub-diffusion is the difference between the two, i.e.:

L_diff = L_drawn - L_eff    (4.16)
So when you shrink the MOSFET you have to reduce the sub-diffusion as well. One way to
do this is to reduce the heat cycles following the implant process. Another way is to reduce the
implant energy as a way to reduce the damage and thereby to do away with annealing. Another
method is to implant acceptors at a large angle into the source and drain into the region under
the gate as shown in the Figure 4.18 as a way to counteract the sub-diffusion and increase the
effective gate length back toward the drawn gate length.
Figure 4.18: The sub-diffusion and the implants to reduce it.
Some semiconductor companies adjust the implants so that L_eff ≈ L_drawn, and that may sound like a good thing, but consider this: if an effective gate length is shorter than it is drawn because L_diff > 0, then the current becomes larger and the chip runs correctly but may consume too much power; but if an effective gate length is longer than it is drawn because L_diff < 0, then the extra channel region is not directly under the gate and so the gate control of this region is not direct, and this is a huge problem because the MOSFET may not even turn on properly, which means the current will be too low, which means it is non-functional. For example, if a microprocessor is sold as a 2.5 GHz processor, people are not going to be happy if it can only run at 2.0 GHz or even less, and even worse they may run it at 2.5 GHz not knowing that some circuits are not operating correctly and then the computer will behave unpredictably.
4.7.6 Gate loading
For all practical purposes we consider the load seen by the logic gates to be a capacitance. That
capacitance has three main components:
Gate capacitance
When you lower the gate oxide thickness you increase the capacitance per unit area as we discussed in the subsection on T_ox, and if you reduce the T_ox, L_drawn and W by the same factor of say half, you hope that the load due to the gate capacitance is reduced to half as well. But in reality the W does not reduce to half. Consider this: when we design a next generation microprocessor we are not satisfied if it can do the same as the previous generation did; no, we want it to run faster and faster. So as a result the W actually used in the chips is usually larger than half. Maybe a more realistic estimate of W is two-thirds. So effectively the gate loading due to the gate capacitance of the load reduces not to a half but perhaps closer to two-thirds.
Junction capacitance
The source and drain are junctions as in a diode. These junctions are reverse biased w.r.t the
body. A reverse biased diode has a fairly large capacitance as shown in the Figure 4.19.
Figure 4.19: The source junction capacitor.
The junction capacitance is really made of two components. The vertical component is related to the area of the source or drain. It is a capacitance between the bottom of the source implant and the body below it. The other component is the lateral component and it is related to the periphery of the source or drain implant and the depth X_j of the implant. The other factor is the doping concentration of the source and drain because the higher the concentration, the lower the depletion width of the reverse biased diode. In the Figure 4.20 is shown the vertical view of the source and drain junction.
Figure 4.20: The area and periphery of the junction capacitor.
So ultimately the capacitance that is seen is given by:

C = [ε_silicon W L / T_depletion] + [2 ε_silicon (W + L) X_j / T_depletion]    (4.17)
The first component is not good because we have already decided that the W is only going to reduce to two-thirds, and so the best that we can expect of this component is that it reduces in a manner similar to the gate capacitance. The second component is not good at all because it is not dependent on (W × L) but rather (W + L), and it also depends on the depth X_j which definitely does not reduce to half. So to sum it up, we want the junction capacitance to reduce to half but we have to settle for perhaps two-thirds or even more than two-thirds.
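Equation 4.17 separates cleanly into the area and periphery parts, as in the sketch below; the drain geometry and depletion width used in the example are hypothetical.

    def junction_capacitance(W, L, Xj, T_depletion, eps_si=1.04e-12):
        """Equation 4.17: bottom (area) and sidewall (periphery) junction capacitance.
        Lengths in cm, eps_si in F/cm, results in farads."""
        area_part = eps_si * W * L / T_depletion
        perimeter_part = 2.0 * eps_si * (W + L) * Xj / T_depletion
        return area_part, perimeter_part

    # Hypothetical 1 um x 0.3 um drain, 0.1 um junction depth, 50 nm depletion width
    print(junction_capacitance(1e-4, 0.3e-4, 0.1e-4, 5e-6))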
Line capacitance
Figure 4.21: The interconnect line capacitances.
The lines interconnecting the FETs are made of aluminum and more recently of copper. The
lines appear as shown in the Figure 4.21. They are stacked in as many as 5 layers and are separated horizontally and vertically by a dielectric with the consistency of glass. When you
shrink the chip these lines get closer to each other. So the capacitance of these lines w.r.t each other
increases. Of course to compensate the lines do get shorter because the FETs get closer to each
other. But in the end the line capacitance does not reduce enough as the chip shrinks.
4.7.7 X_j (Junction depth)
Junction depth X_j is the depth at which the source and drain diffusions change type back into the substrate doping type, so in this case of the NFET it is the junction between the n-type source or drain and the p-type body. Like everything else, you want the junction depth to reduce as the device shrinks. This is a problem because in order to decrease the implant depth, you have to reduce the implant energy. As you continue to reduce the energy, a point is reached when the energy of the dopants that are trying to penetrate the silicon surface is so low that the statistical variation of the implant depth becomes large.
Part of the problem is that implants are usually done at a slight angle of perhaps 17° from the vertical. The reason for this is that if the implants are done vertically a phenomenon occurs called channeling, where the dopants apparently see a tunnel and they can travel quite a long distance within that tunnel and so the depth of the implant is very large. To combat this, the implant has to be done so that you cannot see a line of sight through the silicon crystal, so that the implant dopants collide with a lattice atom and thereby stop penetrating further into the crystal. The problem is that at low enough energies the implant dopants may simply get reflected off the surface of the crystal and it is hard to predict exactly how much of the implant will be reflected this way. So it is not that you will not get a shallow implant, it is just that sometimes you will and sometimes you won't, and this type of variation will quite simply make the chip non-functional and that is the problem.
4.7.8 N_D (Drain doping level)
Sometime by the mid-nineties doping densities for the source and drain implants reached
degenerate levels. All that means is that the number of dopants exceeds the number of unique
states allowed by quantum mechanics. In itself this is not an issue but what it does do is that this
super high density of dopants changes the behavior of the silicon crystal. One important effect of
extremely high doping density is the reduction in the mobility of holes and electrons [39].
This directly means you are reducing the current flow and that is not what we want. And yet we don't have a choice in this; as the devices shrink the dopant densities have to go up to provide enough carriers. Another bad effect of extremely high doping densities is leakage currents. This is something that can only be explained with a certain amount of quantum mechanics so I'm not going to say any more.
4.7.9 T_j (Junction temperature)
Figure 4.22: The change in the gate characteristic with temperature.
The effect of junction temperature is more readily visible in the sub-threshold region than in the saturation region. In the Figure 4.22 the curve that is drawn as a dash-dot line is what happens when the temperature is lowered. There are already a few computer vendors that are offering water cooling to reduce the operating temperature of the computer's CPU by as much as 40-50 °C. These coolers are very effective and I think you will see a lot more of that, even going to the extent of using refrigeration and good quality copper plates to suck the heat away from the CPU and graphics chips.
Cooling a chip made of MOSFETs is a win-win situation. The sub-threshold current is basically a diode current, so its response to an applied voltage is based on units of kT/q, and this quantity is normally referred to as V_T; at room temperature it is about 26 mV. So you can imagine that for every 26 mV change in applied voltage the current will change by a factor of e ≈ 2.72. But if you raise the temperature to 100 °C the V_T is 32 mV, so the current changes by e every time the applied voltage changes by 32 mV. So by cooling the chip down so that it runs at 10 °C, you can reduce the V_th by as much as 60 mV and you can use this extra 60 mV to increase the saturation current.
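The 26 mV and 32 mV figures above come straight from kT/q, which the following few lines reproduce.

    def thermal_voltage_mV(T_celsius):
        """kT/q in millivolts for a given junction temperature in degrees Celsius."""
        return 8.617e-5 * (T_celsius + 273.15) * 1000.0

    for t in (27, 100, 10):
        print(t, round(thermal_voltage_mV(t), 1))  # about 26, 32 and 24 mV respectively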
At the same time you win in the saturation region as well. When you reduce the temperature the atoms in the lattice have less energy and vibrate more gently. The thermal energy of the electrons also reduces. As a result of these things the mobility of the electrons in the channel increases. The current increases directly with mobility, so the I_on increases as shown in the Figure 4.22. When the I_on increases, the charging and discharging of the load capacitance is faster, so the chip runs faster.
4.7.10 Thermal budget
Thermal budget is a process issue. It refers to the heat cycles that a wafer goes through as it is processed from a blank wafer with nothing on it to a completed wafer that is ready to be broken up and packaged into working chips. The thermal budget is important because it affects the diffusion of the dopants [40]. To understand the worst case scenario it should be noted that if you leave a functional wafer in the oven at over 800 °C for several days, all the implanted dopants will diffuse around to the point where the source and drains will merge and the wafer will be non-functional.
The two main reasons for the thermal cycles are oxide growth and the anneals required after implantation. So it would be nice if the critical implants that we don't want to diffuse could be delayed and done after the oxide growths and after less critical implants. But to some extent the order of the different implants, anneals and oxide growths is immutable. For example the anneal has to follow the implant, although other implants may take place in between. Sometimes the oxides are used as masks during the implant so they have to be done prior to that particular implant. An oxide growth can double as an anneal; however, the problem with that is that the temperature required to anneal is different than the temperature required to grow an oxide. To create an oxide you need to heat to only 800 °C or so, because all you are doing is providing enough energy to the endothermic reaction to cause the oxide to grow rapidly, however to anneal the semiconductor you need to heat to perhaps 1100 °C to give the silicon and dopant atoms enough energy to move about and create new bonds with their neighbors in an orderly manner.
So when you do a thermal budget you have to somehow scale the different temperatures to a common temperature. The temperature I normally like to use for a thermal budget would be 850 °C. The higher temperature cycles at 1100 °C and so on are scaled to the 850 °C temperature by multiplying by a pro-rating factor. Keep in mind that this factor will differ for different diffusing particles, but you may pick a factor that represents the diffusion of the most critical particles.
Another feature of the thermal budget is that when you assess the impact of the heat cycles
it is usually as an aggregate of the effects of different portions of the heat cycle on the implants
that they succeed. So the total thermal budget is not meaningful except in that it gives you a
mind-picture of what the process looks like. As a general rule however, a lower thermal budget is
usually a good thing because it means that the particles you implant stay where you put them.
4.7.11 Heat generation
The issue of the heat generated by a chip is also worrisome when considering the chips of the
future. This is mostly important for large chips such as microprocessors. In the discussion below
keep in mind that in whatever way power is consumed it ultimately exits the chip in the form of
heat i.e. the heat produced by the chip is equal to the power consumed by the chip. In a CMOS
chip the power is effectively given by:
P_C = f C V_Supply² / 2    (4.18)
where C is the total capacitance switched, V_Supply is the supply voltage and f is the frequency of operation of the chip. Keep in mind that C is not the total capacitance of the chip, because the vast majority of the circuitry in the chip maintains whatever state it is in across many cycles and only changes states occasionally. So C is the capacitance that is actually being switched in each cycle. This is why many CMOS manufacturers, especially microprocessor manufacturers, are designing their chips so that as few gates as possible are changing state at any given time.
There is another source of power consumption which is the leakage current. I_off is never zero, so for each and every logic gate in the chip a certain amount of current is leaking through whichever FET is supposed to be off, from the supply to the ground. The power lost in this way is given by

P_L = n I_off² R    (4.19)

where n is the number of circuits that are leaking, and R is the resistance of the path from the supply to the logic ground.
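Equations 4.18 and 4.19 can be combined into one small estimate; every number in the example call below is hypothetical and is there only to show the units working out.

    def chip_power(f_hz, C_switched, V_supply, n_leaky, I_off, R_path):
        """Dynamic power from equation 4.18 plus leakage power from equation 4.19, in watts."""
        p_dynamic = f_hz * C_switched * V_supply ** 2 / 2.0
        p_leak = n_leaky * I_off ** 2 * R_path
        return p_dynamic, p_leak

    # Hypothetical: 12 GHz clock, 10 nF switched per cycle, 1.1 V supply,
    # 1e8 leaking gates at 100 pA each through a 10 kOhm path
    print(chip_power(12e9, 10e-9, 1.1, 1e8, 100e-12, 1e4))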
Both types of power consumption rise in every generation of microprocessors: the first because the FET and line capacitances do not reduce sufficiently as the gate length is reduced, and the second because in each generation we are willing to tolerate a slightly higher off current and finally because we widen the FETs until we get the speed we want out of the chip. Anyway the net result is that our microprocessor chips are generating as much heat as a room heater and we need very good heat sinks and powerful fans to remove this heat.
Chapter 5
Process characterization
5.1 Overview
Semiconductor chip manufacturing has several sub-groups within it who have their own distinct "philosophy". The most important is the production group because they are responsible for actually passing the Si wafers through the production line and growing or depositing or implanting the chip design onto them. The production group is the place where money is made or lost.
The design group is responsible for designing the circuits that are incorporated onto the chip. They think primarily at the circuit level, meaning that to this group the active and passive devices are defined by the nominal characteristics and the variation therein, so the emphasis is more mathematical than physical.
There is a third group that is mostly invisible to the general public and this is the process characterization group. When a chip equipment manufacturer releases a new machine, the decision of whether to buy it or not is a management decision, but once it is purchased its output needs to be characterized so you can identify what the machine is capable of, and this is done by the process characterization group. Under this umbrella is device characterization, which we will discuss here.
5.2 Test equipment used
Figure 5.1: A probe station.
The probe station that you use to test a wafer usually looks something like figure 5.1. In the center is the wafer chuck with a small heater under it. The stem of the chuck is connected to an X-Y-Z positioner which has vernier screws down to 0.1 μm or so. The wafer to be tested is placed on the chuck and a vacuum suction is used to hold it firmly on the chuck. The chuck can be raised and lowered by an adjustable amount by the use of a lever.
Above the chuck and surrounding it is the probe platform that you clamp the probes onto.
The probes are usually hard mounted onto a probe card for rigidity but in the case of analog or
RF testing you could also use single probes which are long needle points with their own X-Y-
Z positioners. Above the wafer is the microscope that allows you to look at the wafer you are
probing.
Nowadays you also have the option of using a video camera mounted in place of the eyepiece
so that you can simply see down the microscope by looking at a video monitor. Surrounding
everything is the RF enclosure which acts like a screen room up to microwave frequencies.
Figure 5.2: A probe card.
The wafer probe one normally uses is basically a rectangular printed circuit board with a round
hole in the center of it. The back edge of the board is striped with the edge connector metal to allow
the card to be inserted into an edge connector that is screwed onto the probe platform as shown
at the left of the figure 5.2.
Figure 5.3: Looking at the wafer through the microscope.
Surrounding the hole in the card are metal needles which may be mounted by through-hole soldered connections. These needles are usually of metal of low resistivity and as low a thermal expansion as possible. They are not all of the same length, as shown in the figure 5.2, and are angled downward by as much as a half cm. The needle tips are usually arranged to form a rectangular array which coincides with a matching array of contact pads on the wafer.
Looking down through the microscope one will usually see what appears to be a sea of pads as in the figure 5.3, and also the visible portions of the silicon devices and their interconnects and connections to the contact pads.
The parametric analyzer is the measurement unit that can take all the measurements required to characterize most semiconductor devices such as transistors, diodes, resistors etc. Capacitance measurements are made using the C-V meter, and if the measurements need to be made using sinusoidal inputs and outputs, the equipment used is the network analyzer. Hewlett Packard equipment is the most popular choice among device engineers.
The primary behavior of a parametric analyzer is just to generate tables. Suppose the supply voltage of your circuit is V_s; then typically the test voltages may go 10% higher than that. Both FETs and BJTs have three terminals and the FET also has the body contact. So in a real life usage of the device any terminal may potentially have any potential, and so you basically need all possible combinations of terminal bias. But of course we don't actually need all combinations; instead there is a method to it, as we discuss below.
5.3 Test circuit layout
The circuits used for direct current measurements are usually distinct from those used for alternating current measurements. If they were interchanged the measurements would not function correctly. In addition there are usually far more test circuits to characterize FETs as opposed to BJTs.
As you see in the chapter on process skew, the variation in the effective gate length is a significant fraction of the minimum gate length. In addition, as you will see in the analog section, longer gate lengths may be used in circuits that are sensitive to the length variation or which need a higher driving point impedance.
Usage            Length     Width
Minimum          0.06 μm    1 μm
2nd order        0.09 μm    1 μm
1st order        0.20 μm    1 μm
Long channel     0.75 μm    1 μm
Short & narrow   0.06 μm    0.2 μm
Table 5.1: Five transistors needed.
For these reasons the DC section for each FET usually contains an absolute minimum of five transistors, often even a few more. For example, if the minimum gate length that can be drawn is 0.06 μm and the minimum gate width is 0.2 μm, then the devices chosen may appear as in the table 5.1.
The gate length dependence is usually second order in most FET models, so you need the first three devices to fit the 1st order and the 2nd order dependence. The fourth device is the long channel FET which is used to calculate the threshold voltage and is also needed to do the skewing of the threshold voltage, i.e. the long channel FET is the one which is measured by the production line monitoring of the V_th because it is independent of the variation of L_diff and will give the variation solely due to the threshold adjust implants. The last device is to fit the short and narrow effects [34], [35].
Often the gates are connected together and to a single contact pad which is used when testing any of the devices, and if you don't need to get the reverse characteristics, you could connect all the sources to a common pad as well. Similarly several substrate connections can be placed near all the devices and connected to a common pad, so that finally only the drains usually have independent contact pads of their own.
There are two main components that make up the total gate capacitance. In the figure 5.4 there is the component of the gate capacitance between the gate and the body, and this depends upon the area W × L. The other component is the capacitance of the gate to the source and drain junctions. The second capacitance depends primarily on the width W.
Figure 5.4: Gate capacitances.
In addition there is a capacitance that is significant for narrow devices, which is an effective "gate extension". This is the capacitance from the gate to the region of the body on either side of the gate along the W direction. However, this is often ignored because the error is small for all devices except the narrow devices, which are rarely used anyway.
So we only have two basic components to separate, and we need two different structures to separate their effects. We write an equation for each structure as shown below and solve the two equations together to separate the contributions of the two components.
C(W_1, L_1) = [2 W_1 C_w] + [W_1 L_1 C_a]    (5.1)

C(W_2, L_2) = [2 W_2 C_w] + [W_2 L_2 C_a]    (5.2)
So after measuring the capacitances of the two structures, we use Cramer's substitution to extract C_a and C_w.
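The two-structure extraction amounts to solving a 2x2 linear system, as in the sketch below; the structure sizes and the capacitance values used for the check are made-up numbers.

    import numpy as np

    def extract_ca_cw(W1, L1, C1, W2, L2, C2):
        """Solve equations 5.1 and 5.2 for the per-area (C_a) and per-width (C_w) components."""
        A = np.array([[W1 * L1, 2.0 * W1],
                      [W2 * L2, 2.0 * W2]])
        b = np.array([C1, C2])
        Ca, Cw = np.linalg.solve(A, b)
        return Ca, Cw

    # Synthetic check: build C1 and C2 from assumed Ca = 2 fF/um^2 and Cw = 0.3 fF/um
    Ca, Cw = 2.0, 0.3
    C1 = 10.0 * 1.0 * Ca + 2.0 * 10.0 * Cw
    C2 = 10.0 * 5.0 * Ca + 2.0 * 10.0 * Cw
    print(extract_ca_cw(10.0, 1.0, C1, 10.0, 5.0, C2))  # recovers (2.0, 0.3)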
In general BJTs are characterized on a per structure basis i.e. each size of BJT has its own
model. For this reason, most of the devices to be used are placed in the test circuits.
5.4 Measurements
5.4.1 Drain characteristics
The typical drain characteristics appear as shown in the figure 5.5. As you watch the screen of the parametric analyzer you are looking for the square law behavior, just to assure yourself that the measurements are OK. So for the drain characteristics, you step the gate voltage from subthreshold to supply and sweep the drain. Different engineers look for different gate voltages to use, but my philosophy has always been "the more the merrier" and in fact I like to have a few drain sweeps in the subthreshold even though they are essentially meaningless for a drain sweep.
Figure 5.5: FET drain characteristics.
5.4.2 Gate characteristics
The typical gate characteristics appear as shown in the figure 5.6. As you watch the screen of the parametric analyzer you want a straight line steeply rising until the V_g gets close to the threshold voltage, and then it starts to flatten out. The long steep rise is only important because when you fit the model to the measurement, if the model's threshold calculation is incorrect, then the model predicted current will run parallel to the measured curve in this region. But of course most of the design work is done in the region above the threshold voltage, and the subthreshold is important primarily for leakage current and for low power circuits.
Figure 5.6: FET gate characteristics.
If you look at the gate characteristics in a linear plot as opposed to a log plot then essentially you will not see the sub-threshold at all and the plot would appear as shown in the figure 5.7.
The threshold voltage is calculated from the gate characteristic at a drain voltage of 50 mV. The point of maximum slope is determined and the slope extrapolated to the x axis, and then the 50 mV is subtracted from the intercept. In order to get a good result you want the spacing of the gate voltage to be fine and you want to use a 5 point derivative.
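The maximum-slope extrapolation described above can be sketched as below; the synthetic sweep is generated from the linear-region model with an assumed threshold of 0.4 V, and the procedure follows the text (extrapolate the steepest tangent to zero current, then subtract the 50 mV drain voltage).

    import numpy as np

    def extract_vth(Vg, Id, Vds=0.05):
        """Threshold voltage by extrapolating the point of maximum slope of Id vs Vg
        to Id = 0 and subtracting the drain voltage, as described in the text."""
        gm = np.gradient(Id, Vg)               # numerical dId/dVg across the sweep
        k = int(np.argmax(gm))
        v_intercept = Vg[k] - Id[k] / gm[k]    # x-axis intercept of the steepest tangent
        return v_intercept - Vds

    Vg = np.linspace(0.0, 1.2, 121)
    Id = 2e-4 * np.clip((Vg - 0.4) * 0.05 - 0.00125, 0.0, None)  # synthetic linear-region data
    print(extract_vth(Vg, Id))  # lands within a few tens of mV of the assumed 0.4 V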
5.4.3 Back bias
The body of NFETs is usually connected to ground whereas the body of PFETs is connected to
supply. But consider the case of the upper NFET in a two input NAND gate, its source is at the
drain potential of the NFET below it and it could be considerably above ground. To allow proper
Figure 5.7: FET gate characteristics.
simulation of all circuits, the gate and drain measurements are repeated with a back bias which
is the body to source voltage. It is negative for NFETs and positive for PFETs and can reach a
maximum of supply voltage.
Typically it is enough to measure at two back bias voltages to fit essentially a quadratic dependency; I prefer one third and two thirds of supply. For the most part the back bias will shift the gate characteristics to the right, but there is a cross term dependency between drain and body voltages because they essentially compete at the same diodes. The curves you expect to see are as shown in the figure 5.8, where the dashed line shows the case without back bias.
Figure 5.8: FET gate back bias characteristics.
5.4.4 Collector characteristics
The collector characteristics are obtained by stepping the base voltage and sweeping the collector voltage as shown in the figure 5.9. If the collector voltage is lower than the base voltage the BJT is in saturation, and if it is higher than the base voltage the BJT is in active mode.
Due to the Early effect the collector current rises with an increase in the collector voltage, and the intercept of this slope on the x axis is the Early voltage as shown in the figure 4.9.
Figure 5.9: The collector characteristics.
5.4.5 Diode characteristics
The figure 5.10 shows the diode characteristic. The dotted line shows the exponential rise you are expected to see using the ideal diode equation 5.3.

I = I_s (e^(qV/nkT) - 1)    (5.3)
As the current rises the bulk resistances in the diode share some of the applied voltage and the current rises less than exponentially. The ideal diode equation uses n = 1, but in real life n ≈ 1.5.
Figure 5.10: The diode I-V characteristic.
5.4.6 Reverse characteristics
In some fabrication processes the source and drain implants are different from each other while in others they are identical. If the drain is different from the source, then the FET will have different characteristics when the source is used as the drain and the drain as the source. It is a rare circuit which allows the source and drain to be reversed, so usually one does not characterize the FET in reverse mode.
Similarly, if you have a situation where the emitter of a BJT may be interchanged with the collector, you definitely need to use a different model for that case, because the emitter and collector are completely different, as shown in figure 4.1. For BJTs one usually does make models for both forward and reverse cases.
5.4.7 S parameter measurement
Figure 5.11: S parameter measurement.
BJTs are often used in circuits operating at several GHz. So they are also characterized using measurements made at frequencies ranging from a few hundred MHz up to 10 GHz or so. The equipment that makes these measurements is called a network analyzer. Two 50 Ω cables connect the network analyzer ports to the BJT as shown in figure 5.11.
The probe that is used to make these measurements is a 50 Ω coplanar waveguide mounted on a ceramic substrate. The tip that touches down has two contacts, with the lower one (in this case) being the common ground and the upper the signal. The signal contains both DC and AC components.
The DC component on the left sets the V_be while the DC component on the right sets the V_ce of the bias point. The AC component on the left is the input sinusoidal signal, and the collector on the right drives the amplified AC component onto the probe on the right and into the network analyzer.
The network analyzer provides the input sinusoidal signal by superimposing it onto the DC input bias, and it isolates and measures the output sinusoidal signal driven by the collector. It tabulates the input amplitude and phase and the output amplitude and phase. From this information the BJT is characterized.
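A minimal sketch of turning one tabulated amplitude/phase pair into a complex forward gain, not taken from the book: the numbers are assumed, and treating the output-to-input ratio as S21 assumes a matched 50 Ω measurement.

import math, cmath

a_in, phase_in = 0.010, 0.0        # assumed input sinusoid: 10 mV at 0 degrees
a_out, phase_out = 0.180, 155.0    # assumed output sinusoid: 180 mV at 155 degrees

s21 = (a_out / a_in) * cmath.exp(1j * math.pi * (phase_out - phase_in) / 180.0)
gain_db = 20 * math.log10(abs(s21))
print(f"|S21| = {abs(s21):.1f} ({gain_db:.1f} dB), phase = {phase_out - phase_in:.0f} deg")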
5.4.8 C-V measurement
FET gate capacitance
The figure 5.12 shows a typical C-V curve obtained when measuring the capacitance of a FET gate. Initially all you see is the capacitance of the gate to the body and from the gate to the source and drain regions. As the gate voltage is increased past the onset of inversion, a thin layer of charge connected to the source starts to form under the gate. Since the charge is immediately under the gate oxide instead of in the bulk of the device, the capacitance starts to rise. Initially the region closer to the drain is not yet inverted, but as the gate voltage increases the inverted layer stretches from the source to the drain and the inversion becomes more complete, so the charge sheet lies right under the gate oxide and the capacitance reaches its maximum.
After this maximum is reached, any increase in gate voltage has to result in an increase in charge, but the gate material only supports so much charge density per unit volume, so the charge on the gate starts to extend upward into higher layers of the gate, thereby separating the plates of the capacitor and reducing the capacitance. This is why you see the capacitance drop as the gate voltage tends toward the supply voltage.
Figure 5.12: Inversion capacitance of the gate (the capacitance rises around Vth).
Diode junction capacitance
Diode junction capacitance is a little difficult to measure because of the direct current that flows as the bias of the junction is made more positive. In figure 5.13 you see that as the diode is progressively more forward biased, the width of the depletion region drops and the measured capacitance increases.
Figure 5.13: Diode C-V measurement (capacitance C versus voltage V).
The way the measurement is done is that a DC bias is applied across the diode, a sinusoidal voltage usually at about 160 kHz is superimposed on the DC bias, and the current flow due to the sinusoidal signal is separated from the direct current by the use of an isolating capacitor. The circuit whose capacitance you are trying to measure is on the right of figure 5.13, but the series resistance of the diode drops a lot faster than the capacitance rises, so the RC time constant drops very quickly. In addition, the direct current has noise in it, so as the direct current rises noise becomes a problem.
The only way to reduce the direct current is to reduce the area of the diode, and this would also reduce the capacitance you are trying to measure. So the bottom line is that measuring the capacitance of a diode is difficult, and the C-V measurement of a diode usually only extends to just before the diode turns on.
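A minimal sketch of the standard depletion-capacitance model that is often fitted to diode C-V data below turn-on, not taken from the book: Cj0, the built-in voltage and the grading exponent m are assumed fit parameters.

import numpy as np

def junction_capacitance(v, cj0=1e-12, vbi=0.8, m=0.5):
    # Junction capacitance (F) versus applied bias v (V), valid only for v < vbi.
    return cj0 / (1.0 - v / vbi) ** m

print(junction_capacitance(np.array([-2.0, -1.0, 0.0, 0.4])))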
5.4.9 Thermal behavior
The most stringent temperature requirements for chips are military requirements, and those are usually that the chip operate over a range of −50 °C to +125 °C. Normally the low temperature measurements are done at room temperature. The only other measurements are taken at about +100 °C.
5.5 Production monitors
A production line is kind of like those old cars with carburetors: you have to keep tuning it periodically. In addition you cannot duplicate production volume anywhere else, so there are some tests which can only be done in production and nowhere else.
The wafers that go through the production line usually have several microns of space between the chips, because this is where the chips are split apart. But until the wafer is broken up, this is perfectly good space to put test circuits, and so production monitors are usually placed here because it does not cost anything. These are called scribe line monitors.
Basically any test which can be automated and can be performed in a very short time can be a production monitor. A few tests are almost invariably chosen. The gate oxide thickness is very important to monitor because it determines the reliability of the gate oxide and also determines the loading of the gates. The drain current at maximum gate and drain voltage is another measure that gives you a good indication of whether the chip is running toward the fast or slow side. The threshold voltages of the FETs determine the off state currents and the standby current of the chip.
5.6 Scanning Electron Microscopy
If you impinge highly energetic electrons on a semiconductor surface and vary the angle, they will penetrate to several microns and get diffracted out, and by measuring the spatial and angular distribution of the diffracted electrons you can tell what the semiconductor is made of, so you can get impurity concentration profiles or oxide profiles and so on.
The reason SEM is important is that it is not just another electrical test but a physical test, so it is used as an independent confirmation of a profile that you obtained from a simulation or that you estimated based on electrical measurements and so forth.
5.7 Striped wafers
A stepper exposes the wafer one chip at a time [41]. Suppose the chip you are manufacturing is 1.67" on a side including the scribe lines. On a 12" wafer you can fit 6 columns for a total of 32 chips, as shown in figure 5.14.
Figure 5.14: Chips on a wafer (the chip sites are numbered 1 through 32).
In the case of the wafer of figure 5.14, the mask reticle is stepped sequentially across the wafer 32 times so that all 32 of the chips are defined. Striping is a way to take advantage of this fact to obtain more information from a test wafer during process development.
During the step when you perform the channel implant, if you perform the resist exposure on only one column, say chips 1, 6, 12, 18, 24 and 29, and do an implant, and then repeat the process using a different column and a different implant, then you can have 6 columns with 6 different V_th values, while all other processing is the same.
The mask misalignment between one chip and another, the exposure time, etc. are variable, but factors which affect the current, such as the source/drain implants and the sub-diffusion, will be common across the wafer. So, for the most part, the difference you see between chips in one column and the next will be due to the different channel implant that you used.
V_th is only one example; there are many process parameters that can be varied between columns. So although it is more time consuming and hence expensive, striping is a way to obtain valuable information during process development, and more importantly it is a way to decide on the process recipe you will use for the first pass of your design in a timely manner, without many iterations. The time saved is the biggest issue.
5.8 Noise
There are three kinds of noise normally characterized in semiconductor devices, namely thermal (Johnson) noise, shot noise and 1/f noise. Thermal noise occurs in resistive material, so the bulk regions of a device will generate thermal noise. The Nyquist equation for the noise spectrum is given in equation 5.4, where R is the resistance, T is the temperature, k is the Boltzmann constant and Δf is the bandwidth over which the noise is measured.

⟨V²⟩ = 4kTR Δf    (5.4)

Shot noise occurs in a p-n junction. The equation used for the shot noise in diodes and bipolar junction transistors is equation 5.5 from [42], where I_D is the average value of the current flow and Δf is the bandwidth over which the measurement is made.

⟨i²⟩ = 2q I_D Δf    (5.5)
Figure 5.15: The 1/f noise measurement setup of [43].
1/f noise occurs in FETs. Some papers discussing the measurement of noise in FETs are [44], [45], [46], [47] and [43]. The causes of 1/f noise are not fully understood, but the noise power has been measured. The setup used by [43] to measure the 1/f noise is shown in figure 5.15. At a given bias point the noise voltage from the FET is directly amplified and analyzed in a spectrum analyzer, and the noise power falls off steadily as the frequency increases, roughly as 1/f.
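A minimal sketch plugging assumed example values into equations 5.4 and 5.5, not taken from the book, to get a feel for the noise magnitudes involved.

import math

k = 1.381e-23          # Boltzmann constant (J/K)
q = 1.602e-19          # electron charge (C)

def thermal_noise_vrms(r, temp, bandwidth):
    # RMS thermal (Johnson) noise voltage of a resistor, from equation 5.4.
    return math.sqrt(4 * k * temp * r * bandwidth)

def shot_noise_irms(i_dc, bandwidth):
    # RMS shot noise current of a junction carrying DC current i_dc, from equation 5.5.
    return math.sqrt(2 * q * i_dc * bandwidth)

print(thermal_noise_vrms(r=1e3, temp=300, bandwidth=1e6))   # about 4 uV for 1 kOhm over 1 MHz
print(shot_noise_irms(i_dc=1e-3, bandwidth=1e6))            # about 18 nA for 1 mA over 1 MHz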
5.9 Process skew
In figure 5.16 you see what happens to an FET when it is processed. The design length that you draw on the mask is L_drawn, but when you define the gate using lithography and etch it, the length you get will be shorter than the drawn length and is shown in figure 5.16 as L_gate. After the gate is defined, the source and drain are implanted and annealed, but due to the thermal cycles the wafer goes through after the implants, the implants spread in all directions.
Figure 5.16: Drawn and effective gate length (Ldrawn, Lgate, the sub-diffusion, and Leff).
Toward the top they are constrained by the interface, but they are not constrained laterally and so they spread under the gate. The extent to which they spread under the gate is the sub-diffusion length, which brings the source and drain closer together and hence reduces the effective gate length. Variation in gate length is the single most important cause of process skew.
The threshold voltage is lowered by the sub-diffusion, and so an additional implant is used to adjust the threshold voltage. Remember that the source and drain are of the opposite type to the substrate that you implant them into, so when the source dopant migrates under the gate, it pulls the substrate closer to intrinsic, which means the threshold voltage reduces for either PFET or NFET.
In general, the more quantities you add together that compensate for each other, the more variation you will have in the end result, and the threshold voltage is no exception: it does vary. After the L_eff variation, the threshold voltage variation is the next most critical.
Gate oxide thickness usually varies at least 10%, probably 15%. For example a gate oxide of a nominal thickness of 20 Å would vary between 17 Å and 23 Å, in steps of 2.8 Å. If the oxide is on the thin side, the electric field across the oxide increases. If the field exceeds the dielectric strength, the oxide can break down.
At 17 Å there is another problem besides dielectric strength, which is tunneling. By Schrödinger's wave equation, the wave function of the electrons in the gate and the channel extends across the gate oxide, which means there is a reasonable probability they can tunnel through the gate oxide.
If the oxide is on the thick side, the inversion of the channel may not be complete. So the threshold voltage rises and the maximum current the FET can conduct at supply voltage is reduced.
Due to the variation in implantation, the source and drain junctions vary in depth about the nominal. A deeper source and drain than nominal would increase the probability of punchthrough, where the source and drain depletions come into contact, causing high leakage current and making the FETs inoperable.
But even if this does not occur, the variation in source and drain junction depth would modify the field in the channel region and thus the current flow. If the source and drain junctions are too shallow, the FETs would have trouble turning on and the current would be low. In any case, since most sources and drains are made using multiple implants, the relative positions of the implants would vary, causing a variation in current.
The junction area depends on the junction depth, and when that varies so too do the junction capacitances. This varies the load seen by logic gates, since the drains typically need to swing between supply and ground as the logic is evaluated. So any extra capacitance linearly increases current consumption and increases the charge times almost linearly as well.
The region between the contact and the edge of the channel in a source and drain, as shown in figure 5.17, is resistive, as all semiconductor is. This can vary due to the variation in the sheet resistance of the source and drain regions, but it can also vary due to variation in the distance from the contact to the edge of the channel.
Figure 5.17: Source resistance.
The source resistance is defined on a per unit width basis. If the minimum gate length device with a width of 1 μm conducts 1 mA, then the source resistance needs to be much less than 100 Ω.
Where the metal connection is in contact with the Si of the source or drain it creates a Schottky contact, which is a diode and thus has rectification properties and a potential barrier, both of which are undesirable in a source or drain contact.
The ohmic contact is a way of connecting a metal to either p or n type Si without creating a diode. The silicided contact we discussed previously is one method of achieving this goal. No matter how you do it, this contact region has a significant resistance. You can reduce this resistance by increasing the number of contacts.
Assuming two contacts per gate width, and using the same logic as in the case of the source resistance, each contact needs to be much less than 50 Ω.
The interconnect lines vary in width due to the lithography process. This line width variation causes a variation in the series resistance as well as the line to line capacitance. The resistance increase is the bigger issue and so interconnect width variation is monitored.
If parameters that contribute to a total skew have some relationship to each other then they are not independent, and this could be either good news or bad news.
The best example is the relationship between the load due to gate capacitance and the current driven by the FET. If T_ox is lower than it should be, the load capacitance due to the FET gates increases, but at the same time the inversion in the FET channels is stronger and therefore they drive more current, so these two effects partially compensate each other, which is good news.
Similarly, if the sub-diffusion becomes larger the effective gate length decreases, but the source resistance increases due to the increase in distance from the contact to the edge of the channel. So here again there is some compensation, which is also good.
Process space is usually depicted as shown in figure 5.18. The quantity on each axis is speed. The fact that the major axis of the ellipse lies along the diagonal can be interpreted to mean that there tends to be some correlation between P and N. For example, if the gate oxide is running on the thin side, it would probably cause both the P and the N FETs to have thinner gate oxide than normal.
Figure 5.18: Process space (P speed versus N speed, with the ss, sf, fs and ff corners).
This is also known as the Gaussian distribution and is given by the relationship

y = (1/(σ√(2π))) e^(−(x − x_m)²/(2σ²))    (5.6)
Most natural phenomena tend to have a Gaussian distribution. The two parameters that define a Gaussian are the mean x_m and the standard deviation σ. The integral of the distribution from −3σ to +3σ is larger than 99%.
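A minimal check of that last claim, not taken from the book, using the error function to integrate the Gaussian.

import math

def gaussian_coverage(n_sigma):
    # Fraction of a normal distribution lying within +/- n_sigma of the mean.
    return math.erf(n_sigma / math.sqrt(2.0))

print(gaussian_coverage(3))   # about 0.9973, comfortably above 99%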
Metrics that are normally monitored for an FET are modeled as normal distributions. For example the threshold voltage variation is modeled with a mean and a variance. T_ox cannot be Gaussian because the oxide grows in mono-layers 2.8 Å thick. The sub-diffusion is Gaussian.
3σ analysis is a very conservative method of analyzing the yield of a circuit. In general it is incorrect, because here you model the slow 3σ corner with the +3σ V_th, the +3σ T_ox, the +3σ gate length and so on, even though the probability of all of them occurring at the same time is remote. Given that the yield for most large circuits rarely exceeds 80%, this slow corner will definitely not make the cut.
If your circuit has only PFETs and NFETs the corners that are analyzed may be slow-slow,
slow-fast, fast-slow and fast-fast to cover all the possible problems that could occur. The slow-
slow corner will simply expose the regions of the circuit that do not make the cut for speed. The
slow-fast and the fast-slow will expose the problems when the cross-over voltage is low and high
respectively.
The fast-fast corner will expose problems such as race conditions when the output in a portion
of the circuit is incorrect because one of the signals used in the evaluation arrived too early. This
corner is also where the leakage is high and the power consumption is high and hence the heating
is excessive causing the chip to essentially burn up.
5.10 Burn in testing
Burn-in is a kind of stress testing. It is a substitute for lifetime testing. It is a proven fact that the mean time to failure measured when operating the chip at an elevated temperature has a strong correlation to the mean time to failure when operating under normal conditions. So if the chip is tested at an ambient of 125 °C for two days, it is equivalent to testing the chip continuously for perhaps a month.
The key is not to test the lifetime of a chip but rather to flush out possible defects. Since chip testers are expensive you would not try heating the DUT to high temperatures on the tester; rather you would create a small regulated oven within which perhaps ten chips are placed on a PCB with lines for supply, ground, system clock and perhaps even a few signal lines to apply simple tests. After running the chips in this fashion for a few days, the chips can be removed from the oven and tested normally on a chip tester.
5.11 Ion implant to create connections
Sometimes during the first pass of a design there may be errors in the interconnects, for example a via may be missing. So during the testing of the prototype the chip may not function correctly and this error may be discovered. But that does not mean that this is the only error in the interconnections, or that if this via were in place the circuit would function correctly, perhaps because the transistor sizing is incorrect.
So the prototype needs to be fixed so that testing can continue and the other errors can be identified. Ion implantation is one way this can be achieved. This is especially easy if all that is missing is a via. Since highly doped silicon is very conductive, the area which should have had a via is implanted at exactly the correct energy, causing a large concentration of donors to be present between the metallization of the interconnects that were to be connected by the via. Although the resistance may be different from that of the via, the connection is established and testing can continue. If a short section of interconnect is missing this may be more difficult to fix, because then the implant would have to be laid as a path, but sometimes even that can be done.
5.12 Thermal imaging
Occasionally it does happen that portions of the chip are malfunctioning and you cannot figure out why. If overheating is suspected as a cause you can run simulations to generate a map of heat generation versus position in the chip, but it would be a little difficult because the extracted netlist does not have the positions of the FETs, and circuit simulators do not solve the heat equation.
Just as liquid crystal displays change opacity with applied voltage, there are organic compounds which change color with heat. So sometimes it does happen that the top of a packaged chip is opened up, such a liquid or gel is introduced on top of the chip surface, and the chip is tested for functionality while being observed under a microscope. The colors that you observe are real, not simulated, so you can obtain experimental evidence of the heat patterns.
Chapter 6
Chip fabrication
6.1 Wafer preparation
The very first step in building a microchip is having a clean wafer to build it on. Si is a crystal and crystals are grown. Sand is mostly Si with some impurities thrown in, so the starting point is to melt sand and remove all the impurities from it.
Figure 6.1: Liquid Czochralski pull.
The liquid Czochralski pull is a way to create Si ingots which can be sliced up into wafers much like salami. The wafers used nowadays are 12 inches in diameter, but previous generations were 8 inches and 6 inches. One ingot of 12 inch diameter may have a retail value of $100k or so. The cleaned-out and molten Si is called the Si slurry.
In figure 6.1 the vat contains the slurry, which is maintained in a molten state. A rod with a piece of pure silicon attached to its tip is lowered until the silicon seed is just in contact with the slurry, and then it is lifted and rotated at a carefully chosen rate.
The seed causes the slurry that adheres to it to align to the crystal symmetry, and as it is lifted the new material added to the seed cools and solidifies into a crystal layer. The width of the growing crystal keeps increasing until it reaches a maximum which is determined by the rotation and lift rate. You could guess that rotating the seed rapidly would reduce the diameter of the pulled ingot due to the shear forces at the edges.
After the cutting process the wafers are not of uniform thickness and they have a rough, uneven surface unsuitable for chip growth. So they are polished in a diamond paste slurry. They are placed face down on a table about 6 feet across which carries a wet slurry of fine diamond chips and silicon.
The table is made to rotate back and forth, and as it does so the wafers slide on the slurry and the surface of the wafers is ground to a fine sheen. Both sides are ground, although only one side is the actual surface to be used, so it may be polished more finely. Then they are removed, bathed and dried. To remove damage caused during the polishing, the surface is etched [48].
The FETs on the wafer are created by implanting impurities into the surface of the wafer, so the surface has to be absolutely flat and absolutely crystalline, and so an epitaxial layer is grown on the raw wafer [49], [50] before any further processing is done. One bonus is that you can dope this layer to have exactly the right amount of dopant that you want in the chip substrate.
There are many types of epitaxy, such as liquid phase epitaxy (LPE), vapor phase epitaxy (VPE) and molecular beam epitaxy (MBE). MBE is the slowest, requires the most work to maintain the machine and gives the best results. It is too expensive to be used to grow the substrate, so VPE is more popular.
6.2 Lithography
Resist is a gel-like fluid that is used in lithography [51]. Usually a blob of resist is placed at the center of the wafer and the wafer is spun rapidly, causing the gel to spread outward and cover the entire wafer. The surface tension causes it to stop at the edge and even out. Then it is heated in an oven to harden it.
Then a mask is placed over it and aligned to the wafer, and the wafer is exposed to deep ultraviolet light. The light is absorbed by the exposed surface and causes a chemical reaction to occur. There are two types of resist, negative and positive: negative resist hardens in the presence of the UV light and becomes impervious to specific etchants, while positive resist becomes susceptible to certain etchants.
Figure 6.2: The lithography process (a coherent light source exposes the resist-coated wafer through the mask).
The process of creating etch masks using resist layers and UV light is called lithography, as shown in figure 6.2. It used to be that the minimum feature that could be defined by the UV light used in the exposure was half a wavelength; however, with the advent of phase shift masks that has been reduced to a quarter of a wavelength.
The energy that the photoresist needs to absorb rises as the inverse of the wavelength of the UV light, and it has been getting progressively more difficult to develop resist that can absorb the photons and change chemical properties without breaking down.
The stepper [41] is used during lithography to expose each chip using the mask, so it has to step across the wafer as shown in figure 6.3. Each time it moves a step it aligns the cross hairs
Figure 6.3: Stepping across the wafer.
on the mask to the corresponding marks on the wafer, so during the lithography process there is room for misalignment and there is variation between different chips even on the same wafer.
6.3 Mask generation
For a typical process you may make no more than 40 distinct masks, but you may have as many as 150 processing steps. So many masks are used more than once and some of them are mixed and matched to create the final structures.
A single transistor uses many design layers in its physical structure. The source and drain implants would be a design layer. The contacts for the source and drain are another design layer. The gate is another design layer. Notice that sometimes design layers are responsible for more than one material; for example the gate design layer includes both the gate oxide and the gate poly. So a design layer is a layer as you would see it when drawing the physical layout of the chip.
The mask layers are extracted from the design layers. The extraction process achieves two goals. Firstly it converts the design layers into the actual masking needed by the processing steps. Secondly it seeks to reduce the number of masks needed by using masks for more than a single step.
The only function of a mask is to filter light, and light travels in straight lines. So it is possible to combine two masks when exposing resist. For example, suppose you have a mask A that outlines all the source and drain regions. In addition you have masks B and C that outline the areas containing the PFETs and the areas containing the NFETs respectively.
So if you combine masks A and B you only get the PFET sources and drains, whereas if you combine masks A and C you only get the NFET sources and drains. There are many instances where masks are combined as a way to reduce the total number of masks.
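A minimal sketch of the idea, not taken from the book: stacking two masks passes light only where both are open, which is a logical AND of the two layouts. The tiny boolean arrays below are made-up example layouts, not real mask data.

import numpy as np

A = np.array([[1, 1, 0, 1, 1]], dtype=bool)   # all source/drain regions
B = np.array([[1, 1, 0, 0, 0]], dtype=bool)   # areas containing the PFETs
C = np.array([[0, 0, 0, 1, 1]], dtype=bool)   # areas containing the NFETs

pfet_sd = A & B       # PFET sources and drains only
nfet_sd = A & C       # NFET sources and drains only
print(pfet_sd.astype(int), nfet_sd.astype(int))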
The best example of a very fine mask is the mask that defines the gate oxide of the FETs, because the feature size is a quarter of the wavelength used in the exposure. Such a mask is obviously going to be very expensive, because it has to be perfect and the tolerance has to be very small.
But for some masks you don't need such a small tolerance, and the features in the mask itself are very large. For example a mask to define a p well surrounding many NFETs will have features several microns in size, and so it is a lot less critical. Some masks have features even larger than that. These masks have less of a tolerance problem and can be made more cheaply, and they are called coarse masks.
6.4 Oxide growth
Oxide can be either grown or deposited. You can only grow oxide on a Si surface, and Si is consumed in the process. But you can deposit oxide even on areas that don't have exposed silicon. If you don't need pure oxide but just a thick layer of silica, you can spin on a silica gel and bake it into a silica layer.
Dry oxidation [52] is done by heating the wafer in the presence of oxygen and nitrogen. The nitrogen does not participate in the reaction. Typically dry oxide is grown at about 850 °C or higher temperatures. The reaction is equation 6.1.

Si + O2 → SiO2    (6.1)
Dry oxide is the purest oxide you can grow and is used to make the gate oxide of field-effect transistors. Its purity gives it the maximum dielectric strength. The growth rate is the lowest, because once the uppermost layers of silicon are oxidized, the oxygen has to diffuse through the oxide to reach the silicon below. As the growth progresses the thickness of the oxide increases, so the growth rate slows down.
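A minimal sketch of the Deal-Grove model, the standard way this slowing growth is usually described; it is not from the book, and the constants A, B and tau below are assumed, process-dependent values.

import math

def oxide_thickness(t, A=0.165, B=0.0117, tau=0.37):
    # Deal-Grove: the thickness x after t hours satisfies x^2 + A*x = B*(t + tau).
    return (-A + math.sqrt(A * A + 4 * B * (t + tau))) / 2.0   # thickness in microns

for hours in (1, 2, 4, 8):
    print(hours, round(oxide_thickness(hours), 3))   # growth per hour keeps dropping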
Wet oxide [53], [54] grows much faster than dry oxide but it is not as pure. The method used is to heat the wafer in the presence of steam. The reaction is equation 6.2.

Si + 2H2O → SiO2 + 2H2    (6.2)

Oxide can also be deposited using Chemical Vapor Deposition [55]. You could use silicon tetrachloride with hydrogen and carbon dioxide as in equation 6.3, or you can use silane [56] as in equation 6.4.

SiCl4 + 2CO2 + 2H2 → SiO2 + 2CO + 4HCl    (6.3)

SiH4 + 2O2 → SiO2 + 2H2O    (6.4)
6.5 Doping
Si has 4 valence electrons and forms covalent bonds with four neighboring atoms. The process of doping Si is to substitute a Si atom with another atom that has either 3 valence electrons or 5 valence electrons.
Boron has 3 valence electrons. When it is substituted for a silicon atom it cannot make bonds with all four of its neighbors. The important result of this is that if an electron was looking for a place to jump to, then this boron atom could accept it. Of course it won't be charge neutral any more; however the crystal as a whole remains charge neutral.
This available state to which an electron can jump is called a hole. In an energy diagram holes float, meaning that if you uniformly dope a Si crystal with Boron, the electrons settle to the lowest available levels of the Fermi-Dirac distribution, which means the highest energy levels in the valence band are the vacant ones. So Si doped with Boron is p type.
Arsenic and Phosphorus both have 5 valence electrons. So if either is substituted for a Si atom, one electron is left over. Four of its electrons plus the four electrons in the covalent bonds with its four Si neighbors create an octet completing the outermost shell. The fifth electron will have a higher energy and so it is susceptible to moving around. For this reason Si doped with Arsenic or Phosphorus is n type.
6.6 Implantation
The figure 6.4 shows the structure of the implantation process. On the left is a chamber to heat and ionize the dopant; this could involve an RF voltage to cause the ionization. Then there are two plates to accelerate and collimate the dopant ion beam. There are other structures that are used to weed out different velocities, because you need a beam of ions all of which have the same velocity and direction.
Figure 6.4: The implantation process.
The figure 6.5 shows what happens to the implant as you increase the energy. The peak gets deeper, but the distribution also spreads a little in depth.
Figure 6.5: Implant energy (low, medium and high energy profiles).
The energy determines the depth of the peak and the depth distribution of the dopant, but it does not determine the actual amount of dopant that is implanted; the dose determines that. The amount of dopant actually implanted is linearly dependent on the dose.
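A minimal sketch, not from the book, of the usual first-order picture: the implanted profile is approximated as a Gaussian in depth centered at a projected range with some straggle, and the dose sets the area under the curve. The range, straggle and dose values are assumed.

import numpy as np

def implant_profile(x_um, dose_cm2=1e13, rp_um=0.10, drp_um=0.03):
    # Dopant concentration (cm^-3) versus depth x (microns): a Gaussian whose
    # peak is set so that the integral over depth equals the dose.
    drp_cm = drp_um * 1e-4                     # convert microns to cm
    peak = dose_cm2 / (np.sqrt(2 * np.pi) * drp_cm)
    return peak * np.exp(-((x_um - rp_um) ** 2) / (2 * drp_um ** 2))

depth = np.linspace(0, 0.3, 7)
print(implant_profile(depth))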
Implants are usually done at an angle of 17° off the vertical [57]. This is to avoid an effect called channeling. When the ions enter the crystal they see tunnels. If the ions are angled so that they hit the walls, the implanted ions stop in the interstices of the crystal and the penetration depth is predictable. If the ions are aligned to the tunnels they can penetrate to a very large depth. So by experimental work the angle of 17° was found to give the best results.
Annealing [58] is a process step that follows an implantation step to repair the damage caused by implantation. The way this is done is to heat the region that was implanted to a temperature high enough that the crystalline bonds become looser and the atoms can realign themselves due to thermal agitation, and in the process the semiconductor recrystallizes.
This step is required because otherwise there will be a lot of dangling bonds and other recombination traps, and so the leakage current rises. There are many ways to perform annealing. The simplest is to heat the entire wafer. There are more localized annealing methods which use a laser to supply the energy and a mask to select which areas to heat. As a general rule engineers have found that using a higher temperature for a shorter duration causes less diffusion of dopants than a lower temperature for a longer duration.
6.7 Etching
Acids based on Fluorine and Chlorine will react with Si and SiO2. So if you dip a Si wafer in an acid bath, the exposed portions of the wafer will be eaten away. An isotropic etch is an etch which does not favor one direction over another, i.e. it etches in all directions at the same rate. An anisotropic etch is more directional, so if the anisotropy is 2 then it etches two units into the wafer for every 1 unit parallel to the surface.
6.7.1 Wet etch
A wet etch is an acid bath [59]. It is isotropic. In figure 6.6 you see what happens when you dip a wafer in a wet etch. The top figure is before the etch starts; the second figure is the outcome that you want. The third figure is what happens when you stop the etch as soon as you first etch down to the Si surface: there is considerable SiO2 still remaining that needs to be removed. On the other hand, if you continue the etch until you remove all the SiO2 you want to remove, it appears as in the fourth figure, where the etch has undercut the resist and has also created a furrow in the Si.
Figure 6.6: An isotropic etch (a resist-masked SiO2 layer on Si, shown before the etch, the desired outcome, an under-etched result, and an over-etched result).
The etchants used for wet etching of Si [60] are hydrofluoric and nitric acid, per equation 6.5. The equation for the wet etch of SiO2 by hydrofluoric acid [61] is equation 6.6. Aluminum can be etched with bases or acids [62] as in equations 6.7 and 6.8.

18HF + 4HNO3 + 3Si → 3H2SiF6 + 4NO + 8H2O    (6.5)

SiO2 + 6HF → H2SiF6 + 2H2O    (6.6)

2Al + 6NaOH → 2Na3AlO3 + 3H2    (6.7)

2Al + 6HCl → 2AlCl3 + 3H2    (6.8)
6.7.2 Reactive ion etch
A reactive ion etch system is outlined in figure 6.7. RIE is not the most anisotropic of etches, but it is sufficiently anisotropic, and also sufficiently cheap, to be the most popular choice. Ionized and partially ionized CF4 gas in the RIE chamber acts as the etchant, and the electric field is a high AC voltage applied between a plate at the back of the wafers and a plate placed above it [63].
The reason for using a high frequency AC voltage is that the ions describe circles in the field and collide with and ionize other neutral molecules. The larger the voltage or the lower the frequency, the larger the radius described by the ions.
Figure 6.7: Reactive Ion Etch equipment.
RIE equipment operates at 13.56 MHz and, believe it or not, this frequency is regulated by the FCC, because communication equipment also operates at nearby frequencies and so the RIE equipment must stay within its allocated band.
6.7.3 Reactive ion beam etch
Figure 6.8: Reactive Ion Beam Etch equipment.
Even RIE etching is only slightly anisotropic. To get better anisotropy, reactive ion beam etching is used [64]. Here the ions are accelerated in a field perpendicular to the wafer surface and essentially filtered so that the ions are moving in a direction perpendicular to the wafer when they come into contact with it. Even the collision force contributes to the anisotropy.
6.8 Sputtering
Sputtering is the process used to deposit the metal used in creating the interconnects. The earliest gates of FETs used to be deposited this way. Sputtering [65] is done as shown in figure 6.9.
Figure 6.9: Sputtering a metal (a plasma between the target and the wafer chuck).
The ions in the plasma are accelerated in a high electric field to impinge on the target, which is made of the metal to be sputtered. The collision knocks metal atoms off the target, and they are then deposited on the wafers on the wafer chuck.
6.9 Polysilicon
One does not normally deposit silicon; after all, Si is a crystal and it has to be grown. Poly-crystalline silicon, or polysilicon, or just poly for short, is deposited Si and it is not really crystalline [66]. It contains domains within which the Si is crystalline, but the orientations of the crystal in the different domains are not aligned with each other.
But polysilicon has an advantage when used as gate material, because it does not need to be deposited at high temperatures like aluminum, so it does not damage the gate oxide. The deposition is done as in equation 6.9.

SiH4 → Si + 2H2    (6.9)
6.10 Sintering
Sintering is a way of creating a layer of material which is neither semiconductor nor metal but is a kind of amalgam [67], [68], [69]. Such a material has advantages in creating a junction between metal and semiconductor, because it reduces the Schottky barrier. Sintering is done by sputtering metal onto the silicon surface and then heating it to perhaps 500 °C or so to cause the metal to fuse with the silicon.
The source and drain junctions are a big part of the series resistance of a FET when it is fully turned on. One way to reduce the source and drain resistance is to sinter them with metal, and this process is called siliciding [70]. The siliciding does not include the whole source and drain region, because you still want the source-to-body and drain-to-body junctions to be semiconductor junctions.
The gate polysilicon has a resistance problem too, and if the mask used during the siliciding includes the source and drain regions as well as the gate poly, the process is called saliciding [71]. A salicided gate is more effective in inverting the channel evenly, especially if you have a wide gate at minimum gate length operating at very high clock speeds.
6.11 Thermal budget constraints
There are many implant steps used in a typical chip process. These steps are interspersed among the other steps, out of a total of 150 steps or so. Some of these other steps could involve heating the wafer to temperatures of 750 °C or more, for example oxide growth. These steps will cause the implanted regions to diffuse [40]. Diffused sources and drains create a problem because they could lead to punchthrough. In addition, sources and drains are often created using multiple carefully chosen implants. Diffusion could cause them to smear, modifying the device characteristics.
Chapter 7
Logic circuits
7.1 Boolean logic
Digital circuits use boolean logic. In boolean logic a signal has only one of two states: it can be either high or low. The two states are numerically defined as 1 and 0. There are only three basic operations used in boolean logic, from which all other operations can be derived. These three operations are and, or and invert.
Figure 7.1: The logic symbols (invert, AND, OR).
Figure 7.2: A CMOS inverter.
The invert operation is the simplest operation and it is a unary operator, meaning that it operates on a single operand. For example ā operates on a. The line over the operand indicates the operation. If a was 1, then ā is 0, and vice versa. If you have an expression under the line, then evaluate the expression and then operate on it. The symbol for the inverter is the first symbol in figure 7.1. The FET circuit for a CMOS inverter is shown in figure 7.2.
We have already discussed the behavior of an FET, and the electrical behavior of the inverter is shown in figure 7.3. The straight line is the input and, in the figure, the supply voltage is 2 V. As the input voltage rises the output voltage falls, but not linearly.
Figure 7.3: The voltage relationship of a CMOS inverter (the voltages of nodes 1 and 2 between 0 V and 2 V, with the NFET and PFET threshold voltages marked).
The current flowing from the supply to ground is shown in figure 7.4. As you can see, it rises from zero to a maximum and then falls back to zero.
Figure 7.4: The I-V relationship of a CMOS inverter (supply current I versus input voltage, with Vthn and Vthp marked).
The reason it does this is shown in figure 7.5. This shows the regions of operation through which the circuit goes as the input voltage rises from 0 to the supply voltage. Initially, when the input voltage is 0 V, only the PFET is turned on. No current is flowing because the NFET is turned off. Since the PFET is on, the output voltage is clamped to supply and is 2 V. As the input voltage rises, the NFET moves into the sub-threshold region and a small amount of current flows. Now the output voltage is that of a voltage divider; since the resistance offered by the NFET is much larger than that of the PFET, the output voltage is still close to the supply voltage.
As the input voltage rises still further, it increases beyond the threshold voltage of the NFET and so the NFET turns on. Now both the PFET and the NFET are on, so the current reaches a maximum. As the input voltage increases still further, the difference between the input voltage and the supply voltage drops below the threshold voltage of the PFET. So now the PFET starts to turn off, and as the input voltage reaches the supply voltage the PFET is fully turned off.
The and operation is a binary operation, meaning it operates on two operands. The and operation is denoted by the · sign, and so the expression (a · b) is the and operation on a and b. If both a and b are 1, then the result is 1, otherwise the result is 0.
Figure 7.5: The regions of operation of a CMOS inverter (only the PFET on, both PFET and NFET on, then only the NFET on, bounded by the two threshold voltages).
If either of the operands is an expression, then evaluate the expression and substitute the result for that operand. The symbol for the and operation is the second symbol in figure 7.1.
Figure 7.6: More logic symbols (NAND, NOR, XOR).
The or operation is also a binary operation. It is denoted by the + sign, and so the expression (a + b) is the or operation on a and b. If both a and b are 0, then the result is 0, otherwise the result is 1. The symbol for the or operation is shown in figure 7.1.
Figure 7.7: The CMOS nand gate (inputs a and b, output o).
The nand, nor and xor operations shown in figure 7.6 are derived operations. The nand operation is the and operation followed by an invert operation. In boolean logic we would say that the nand operation on a and b is the complement of (a · b): first do the and operation and obtain the result, then invert that result. However, when using FETs to create logic circuits it takes fewer transistors to make a nand gate than it does to make an and gate, so the more realistic way of looking at it is to think of an and gate as a nand gate followed by an inverter.
The FET circuit diagram for a nand gate is shown in figure 7.7. The lower half of the circuit is the series combination of NFETs and the upper part of the circuit is the two PFETs in parallel. If both inputs are 1, then both NFETs are turned on while both PFETs are turned off, so in this case the output is 0. For any other combination of inputs the output is 1.
Figure 7.8: Three input CMOS nand gate (inputs a, b and c, output o; the series NFETs sit at intermediate node voltages V1, V2 and V3 between supply and ground).
The case where you have three inputs is shown in figure 7.8. For most processes three is probably a reasonable limit to the number of inputs. In the figure the voltages V1, V2 and V3 lie between the supply voltage and ground. Because there are four FETs in series between the supply and ground, they have to share the supply voltage. In the case of the nand gate the PFETs are not affected; however the three NFETs are affected by having to share the supply voltage in this manner. Of the three NFETs the lowermost one is the least affected: all that happens to it is that its drain voltage is not as high as it could be.
Figure 7.9: The CMOS nor gate (inputs a and b, output o, showing the current flow path).
For the NFET in the middle, the body of the FET is still at ground, but its source is at V1, which is higher than ground. Effectively it sees a back bias. This back bias will prevent it from turning on as easily as the lowermost NFET. The topmost NFET sees the largest back bias. For this reason three input gates don't use the same sizing for all three NFETs: instead the topmost NFET is the widest and the lowermost NFET is about normal size. This in turn means that the signal that has to charge the gate of the topmost NFET sees a larger gate area. The net result is that if you use a three input gate, you need to drive the topmost NFET with the strongest signal of the three inputs.
The nor operation is the or operation followed by an invert operation. In boolean logic we would say that the nor operation on a and b is the complement of (a + b): first do the or operation and obtain the result, then invert that result.
The FET circuit diagram for a nor gate is shown in figure 7.9. In this case you see that only if both inputs are 0 is the output 1. For any other combination of inputs the output is 0.
The exclusive or operation, or xor for short, differs from the or operation in only one instance: if both a and b are 1, the or operation yields a 1, whereas the xor operation yields a 0. The symbol for the xor is shown in figure 7.6. The xor operation on a and b is given by (a · b̄ + ā · b), so it seems that the xor operation requires two inverters, two and gates and one or gate, which makes it a fairly large circuit for a single gate.

A + Ā = 1    (7.1)

A + 1 = 1    (7.2)

A · 0 = 0    (7.3)

Commutative property:

A · B = B · A    (7.4)

Distributive property:

A · B + A · C = A · (B + C)    (7.5)

De Morgan's theorem:

\overline{A + B} = Ā · B̄    (7.6)

\overline{A · B} = Ā + B̄    (7.7)
A truth table is a way of defining the behavior of an arbitrary logic circuit. It is just a table that has a column for each signal. Some of the signals may be inputs and some of them may be outputs. So if you have inputs a, b and c and you have outputs d and e, then you would have five columns in the table. For each combination of a, b and c you would put down the value of the outputs d and e. Keep in mind that the table need not be complete, i.e. you don't have to have a row for each combination of a, b and c, just enough rows to define the states you need. A consolidated truth table showing the output of all the binary operators is shown in table 7.1.
Input1 Input2 and or nand nor xor
0 0 0 0 1 1 0
0 1 0 1 1 0 1
1 1 1 1 0 0 0
1 0 0 1 1 0 1
Table 7.1: Truth table for the binary gates.
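A minimal sketch, not from the book, that regenerates the rows of Table 7.1 using Python's bit operators, in the same row order as the table.

print("in1 in2 and or nand nor xor")
for in1, in2 in ((0, 0), (0, 1), (1, 1), (1, 0)):
    a = in1 & in2            # and
    o = in1 | in2            # or
    print(in1, in2, a, o, 1 - a, 1 - o, in1 ^ in2)   # nand and nor are the inversions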
If you add a clock to combinational logic you get sequential logic. In sequential logic the output can be a function of the output during the previous clock cycle. You can store the values in flip-flops and use them as needed.
7.2 Flip-flops
The simplest flip-flop (and the most popular) is the D flip-flop. The logic diagram for a D flip-flop is shown in figure 7.10. You can buy a discrete logic chip containing 5 D flip-flops in a 20 pin package for probably 50 cents.
Figure 7.10: Logic diagram for a D flip-flop (four nand gates #1 through #4, internal nodes a and b, inputs D and Clk, outputs Q and its complement).
Figure 7.11: A D flip-flop (input, clock and output waveforms: on the clock pulse D is read, then Q is written, then Q and its complement are ready).
The way a D flip-flop functions is shown in figure 7.11. On the rising edge of the clock pulse the value at D is read in, and at the falling edge of the clock pulse it is written to Q and its inverse is written to Q̄. When the clock is low, a = 1 and b = 1. The two nand gates #1 and #2 form a feedback loop and the value at Q is held constant.
When a clock pulse arrives, a goes to 0 within one gate delay, and within the same delay b goes to D. Now, regardless of what value Q had, it goes to 1 within an additional gate delay. Now, since b = D, Q goes to D. At this time Q is still 1. However when Clk goes low, a goes to 1 and b goes to 1, and the value of Q is set.
7.3 The pass gate
The pass-gate shown in figure 7.13 is a switch. The pass-gate uses a back to back NFET and PFET pair to conduct a signal, because an NFET conducts a good 0 but can conduct only up to (Vdd − Vt), as shown in figure 7.12, where Vdd is the supply voltage and Vt is the threshold voltage of the NFET. So an NFET conducts a 1 only poorly, i.e. the 1 that is conducted, for the case when the supply voltage is 2.5 volts and the threshold voltage is 0.5 volts, is only 2.0 volts and not 2.5 volts as required. This is because when the source of the device is raised to (Vdd − Vt), the NFET turns off.
Similarly the PFET conducts a good 1 but a poor 0, because the PFET will turn off when the source voltage is reduced to the threshold voltage of the PFET. So for the case when the supply voltage is 2.5 volts and the threshold voltage is 0.5 volts, the 0 that is conducted is 0.5 volts and not 0 volts as required. So, by using a back to back NFET and PFET pair, with a 0 on the PFET gate and a 1 on the NFET gate, any signal, either 0 or 1, can be conducted properly.
Figure 7.12: An NFET conducting a 1 (with Vdd on the gate and drain, the source only rises to Vdd − Vth).
Figure 7.13: A pass gate (controls C and Cb: C = 1, Cb = 0 gives high impedance; C = 0, Cb = 1 gives conduction).
7.4 Karnaugh maps
Figure 7.14: A Karnaugh map (the combinations of a and b label the rows, the combinations of c and d label the columns).
The purpose of a Karnaugh map is to help you extract the minimum logic that satisfies the truth table you have defined for a circuit. A sample Karnaugh map for a logic circuit containing four inputs a, b, c and d and one output is shown in figure 7.14. The four combinations of signals a and b are listed on the left and the four combinations of signals c and d are listed across the top. Notice that in these combinations only one bit changes at a time as you go from top to bottom and from left to right. Within the map, you place a 1 for each row and column combination that should output a 1. If you leave it blank it means a 0. If you specifically don't care what the output is for a given row and column combination you can place an x, which can be used as either a 0 or a 1.
Now you start grouping the 1s together. The grouping has to be rectangular in shape and you can wrap around the edge of the table, i.e. the right edge continues on the left edge and the top edge continues on the bottom edge. Any element can appear in multiple groups, and the goal is to create the largest groupings possible. Each group that you define will yield a term in the final boolean expression. The larger the grouping the smaller the term. If you use an x within a grouping you are assigning a 1 as the output for that row and column combination. If an x is not used in any grouping, the output for that row and column combination will be 0. For the Karnaugh map we have defined, the expression yielded is given by:
a c + b c d (7.8)
7.5 Finite state machines
Figure 7.15: A finite state machine (states 000, 001, 011, 111 and 110; each arrow is labeled input/output).
Designing finite state machines is fundamental to digital circuit design. The key feature of a finite state machine is defining a set of states which a system could be in. Without thinking in terms of finite state machines, any sequential logic would become difficult to manage as you add more gates.
Let us design an FSM to read the pattern 01011. A stream of bits is coming in, and if the FSM sees this pattern it outputs a 1, otherwise the output is 0. The state machine will look as shown in figure 7.15. It has five states and one output. In this diagram the arrows are labeled as input/output. Notice that when picking the codes designating the five states we picked them so that only one bit changes at a time; thus we have a good chance of getting a logic that is minimal.
The implementation of the FSM is shown in figure 7.16. The state of the FSM is abc. To analyze this diagram use the input 00110101011 as shown in table 7.2. The state abc is the state after
I abc Output
0 001 0
0 001 0
1 011 0
1 000 0
0 001 0
1 011 0
0 111 0
1 110 0
0 111 0
1 110 0
1 000 1
Table 7.2: FSM states for 00110101011 input.
the input I is received. The truth table for the FSM of figure 7.17 is shown in table 7.3.
aₙ₋₁ bₙ₋₁ cₙ₋₁ I  O  aₙ bₙ cₙ
0 0 0 0 0 0 0 1
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 1
0 0 1 1 0 0 1 1
0 1 0 0 0 0 0 0
0 1 0 1 0 0 0 0
0 1 1 0 0 1 1 1
0 1 1 1 0 0 0 0
1 0 0 0 0 0 0 0
1 0 0 1 0 0 0 0
1 0 1 0 0 0 0 0
1 0 1 1 0 0 0 0
1 1 0 0 0 1 1 1
1 1 0 1 1 0 0 0
1 1 1 0 0 0 0 1
1 1 1 1 0 1 1 0
Table 7.3: Truth table for the FSM of figure 7.17.
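A minimal sketch, not from the book, that walks the state diagram of figures 7.15 and 7.17 and reproduces Table 7.2 for the input 00110101011; the transition dictionary below is simply the state diagram written out, with the 3-bit state codes abc.

NEXT = {                       # (state, input) -> (next state, output)
    ("000", 0): ("001", 0), ("000", 1): ("000", 0),
    ("001", 0): ("001", 0), ("001", 1): ("011", 0),
    ("011", 0): ("111", 0), ("011", 1): ("000", 0),
    ("111", 0): ("001", 0), ("111", 1): ("110", 0),
    ("110", 0): ("111", 0), ("110", 1): ("000", 1),   # output 1: pattern 01011 seen
    # hang-state recovery paths of figure 7.17
    ("010", 0): ("000", 0), ("010", 1): ("000", 0),
    ("100", 0): ("000", 0), ("100", 1): ("000", 0),
    ("101", 0): ("000", 0), ("101", 1): ("000", 0),
}

state = "000"
for bit in [int(b) for b in "00110101011"]:
    state, out = NEXT[(state, bit)]
    print(bit, state, out)      # matches the rows of Table 7.2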
aₙ = aₙ₋₁·bₙ₋₁·(c̄ₙ₋₁·Ī + cₙ₋₁·I) + āₙ₋₁·bₙ₋₁·cₙ₋₁·Ī    (7.9)

bₙ = aₙ₋₁·bₙ₋₁·(c̄ₙ₋₁·Ī + cₙ₋₁·I) + āₙ₋₁·cₙ₋₁·(b̄ₙ₋₁·I + bₙ₋₁·Ī)    (7.10)

cₙ = āₙ₋₁·b̄ₙ₋₁·(cₙ₋₁ + Ī) + bₙ₋₁·Ī·(aₙ₋₁ + cₙ₋₁)    (7.11)

O = aₙ₋₁·bₙ₋₁·c̄ₙ₋₁·I    (7.12)
Figure 7.16: An FSM implementation (state bits a, b and c, input I, output O, and latches L0 through L3).
When you power on a circuit, it may come up in any state. Hang states are states which have no way of changing to a state that you designed for. For example in figure 7.15 there are three states that are unaccounted for, namely 010, 100 and 101. Since these are three out of eight possible states, you could guess that there is a 37.5% probability that the FSM will be in one of these states when it powers up. If it did, it might not respond to any input, so the FSM would never change states. If it did this, then the system is said to be in a hang state. To avoid hang states we simply add a path from these states to a known state as shown in figure 7.17. The x indicates a don't care.
7.6 Domino logic
Domino logic is the most common form of dynamic logic. Dynamic logic is used in microprocessors for two reasons, namely that the size of the circuit is smaller and that for a given supply voltage you can implement logic that would not be feasible in CMOS. Dynamic logic is based on the transient movement of stored charge, so it is particularly layout sensitive. A generic stage is shown in figure 7.18. The dotted box contains the logic to be realized.
The operation is divided into two parts, precharge and evaluate. In this figure the pmos transistor is the pull-up transistor. It is turned on when the clock is low. It charges up the sources and drains of the nmos logic in the dotted box. Since the nmos pull-down transistor is off, current cannot flow, but the logic inputs decide what charge is stored on the source and drain capacitances in the logic box and on the input capacitance of the load.
When the clock goes high the logic is evaluated. The pfet pull-up transistor is turned off and the nfet pull-down is turned on. At this time, if the logic in the dotted box evaluates to 1, the source and drain capacitances are discharged.
Figure 7.17: Avoiding hang states (the unused states 010, 100 and 101 are given x/0 transitions back into the designed states).
Figure 7.18: A Domino logic stage.
However if the logic in the dotted box evaluates to 0, then the charge stored on the source and drain contacts is visible at the output.
Since the charge was originally stored with the pull-up pfet turned on, there will be some charge redistribution when the pull-down nfet is on instead. The sizing of the transistors needs to be such that this charge redistribution does not result in a change in the output value. This is only a concern for the case where the logic in the dotted box evaluates to 0. A method of analyzing dynamic logic is outlined in [72].
Since all capacitances are leaky, you cannot depend on the evaluated logic to hold for long after the end of the cycle, so it is timing critical that you take the output as soon as possible and do something with it; this is dynamic logic and it is only valid as long as the data keeps flowing. Often CMOS logic is interspersed with the dynamic logic to actively pull many signals to either a solid 1 or 0.
Chapter 8
Analog circuits
8.1 Current mirror
Consider the circuit in figure 8.1. The PFET controls the currents I_1, I_2, I_3 and I_4. If the areas of the BJTs are A_1, A_2, A_3 and A_4 respectively, then the ratio of the currents is given by equation 8.1.

I_1/A_1 = I_2/A_2 = I_3/A_3 = I_4/A_4    (8.1)
Figure 8.1: A current mirror (BJTs of areas A1 through A4 carrying currents I1 through I4, with control voltage Vc).
8.2 Current sources
The ideal current source can drive a fixed current no matter how large the resistance it is driving into. In real life there is no such thing as an ideal current source. Real current sources are essentially variable resistors which have the property that the resistance drops as the voltage across the source drops and rises as the voltage across the source rises. In this way they can attempt to maintain a constant current.
The driving point impedance is a concept which is only valid over a range of current output by a current source. It is based on a Thevenin equivalent circuit. Compare the two circuits shown in figure 8.2.
If the load resistance R_l in figure 8.2 is 1 Ω, then either of the two circuits will drive 1 A of current through the load. But if the load resistance is 20 Ω, then the circuit on the left will only
Figure 8.2: Different driving point impedance (a 10 V source with 9 Ω in series and a 100 V source with 99 Ω in series, each driving a load Rl).
drive 0.345 A through the load, whereas the circuit on the right will drive 0.840 A through the
load.
Now, since your goal was to drive 1 A through the load, then if your load varies between 1
and 20 , then it is clear that the circuit on the right does a better job of meeting the requirement.
So the Thevenin impedance of your current source is called its driving point impedance and you
would like this to be much higher than the maximum value that the load resistance can become.
Figure 8.3: An n type current source.
Current sources normally use only a single FET or BJT to control the current flow as shown in the figures 8.3 and 8.4. If you want to sink current you use nmos or n-p-n as in the figure 8.3 whereas if you want to source current you use pmos or p-n-p as in the figure 8.4.
Figure 8.4: A p type current source.
One way to improve the output resistance of an FET current source is to simply use longer channel FETs, which are less susceptible to the DIBL or drain induced barrier lowering effect and hence have drain characteristics with a smaller slope. This also has the advantage that the variation in L_diff is a much smaller fraction of the channel length, so you will get better immunity from process variation. The flip side is that the longer the channel the more your exposure to back bias, so now you will have to worry about fluctuations in substrate potential.
One disadvantage with controlling a current source by a gate voltage or base voltage is that with the normal variation in process, the same voltage at the gate or base at the slow end of process would result in a current very different from that obtained at the fast end of process. This is especially true for the BJT because the base diode current is exponentially dependent on the base voltage.
Figure 8.5: Resistor controls the current.
The way this is avoided is to use a current mirror as shown in the figure 8.5. The current of the current source is controlled by the resistance. So at the slow process corner the control voltage is higher and at the fast process corner the control voltage is lower so there is some compensation here.
But even so, given that the threshold voltage of the typical FET is a significant portion of the supply voltage and the gate voltage has to be somewhat higher than the threshold voltage, the current at the fast process corner is going to be higher than that at the slow process corner.
A length of diffusion implant with a contact at either end is an effective resistor as shown in the figure 8.6. The resistance is dependent on the effective length L, the width W (not shown) and the depth D.

Figure 8.6: A diffusion resistor.
Since the diffusion is implanted in a semiconductor of the opposite type as the implant species,
the depth D is actually potential dependent and hence the usual practice is to characterize the
diffusion resistor as a JFET. As the potential of the resistor increases, the depletion width reduces
the height of the resistor and increases its resistance.
When you need a constant value independent of process corner, it is common to use perhaps
four independent resistances with two of them in parallel in series with the other two in parallel.
By orienting the resistors in perpendicular directions and using specific combinations of W and L
one can construct a resistor which is relatively independent of process corner.
The Widlar current source [73] is a way to generate very small currents in a BJT current source without the use of large value resistors. A resistor is inserted in series with the emitter as shown in the figure 8.7. Since both base and collector current flow through this resistor it carries a current of (1 + β)I_b, so a small resistor is sufficient to drop the required voltage.
This voltage is the difference in the V_be of the two n-p-n BJTs and so you can get a very large ratio difference between the currents I_1 and I_2. After solving equation 8.2 to obtain V_BE1, you can substitute it in equation 8.3 for an assumed value of I_2. Then you can adjust either A_2 or R_2 until the V_out is what you require.

Figure 8.7: Widlar current source.
V_BE1 + R_1 I_s exp( V_BE1 / V_T ) ( 1 + V_BE1 / V_A ) = V_CC    (8.2)

V_BE2 = V_BE1 - (kT/q) ln( I_1 A_2 / (I_2 A_1) )    (8.3)
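Equation 8.2 is transcendental in V_BE1, so in practice you solve it numerically. The sketch below does this by bisection for one assumed set of values (V_CC, R_1, I_s, V_T and V_A here are illustrative, not taken from the text), then applies equation 8.3 for an assumed current ratio.

```python
import math

# Assumed illustrative values, not from the text.
VCC, R1 = 5.0, 10e3                  # supply and series resistor
IS, VT, VA = 1e-14, 0.02585, 100.0   # saturation current, thermal voltage, Early voltage

def f(vbe1):
    # Equation 8.2 rearranged to f(V_BE1) = 0.
    return vbe1 + R1 * IS * math.exp(vbe1 / VT) * (1 + vbe1 / VA) - VCC

# Solve by bisection between 0.3 V and 0.9 V (f changes sign in this range).
lo, hi = 0.3, 0.9
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if f(lo) * f(mid) <= 0:
        hi = mid
    else:
        lo = mid
vbe1 = 0.5 * (lo + hi)

I1 = IS * math.exp(vbe1 / VT)             # collector current of Q1
ratio = 100.0                             # assume I1/I2 = 100 and A1 = A2
vbe2 = vbe1 - VT * math.log(ratio)        # equation 8.3 with A2/A1 = 1
print(f"V_BE1 = {vbe1:.3f} V, I1 = {I1*1e3:.3f} mA, V_BE2 = {vbe2:.3f} V")
```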
A cascoded current source is similar to a simple current source except you stack the mirror sections as shown in the figure 8.8. The improved performance is due to the way the potential at point a responds to changes in the potential at point b.

Figure 8.8: A cascoded current source.
When the load resistance becomes large, the cascoded current source behaves just like the simple current source. The improvement is when the load resistance is reduced. In the simple current source of the figure 8.5, the current of the source will increase when the load resistance drops.

Figure 8.9: Bias point a in a cascoded current source.
This is because for either an FET or a BJT, for the same gate voltage or base voltage, the current through the FET or the BJT will increase as the drain voltage or collector voltage is increased, by as much as 50% or more. This is due to the slope in the drain or collector characteristics.
But in the case of the cascoded current source, as the voltage at b rises due to a drop in the load resistance, the voltage at a rises as indicated by the diamonds in the figure 8.9. So as the load resistance falls, the bias point of the upper transistor moves from the left-most diamond to the right-most diamond. So as the V_ds or V_ce rises, the V_gs or V_be falls so as to keep the drain or collector current the same.
8.3 Active load
An active load is actually a current mirror used in a differential circuit as shown in the figure 8.10, but it is better than a pair of resistors because it effectively causes a gain in the output swing.

Figure 8.10: A current mirror as an active load.
Let us look at what happens when A drives low and Ā drives high. Q_3 starts to turn off, so then Q_1 starts to turn off, forcing Q_2 to also start to turn off, which means it exhibits a higher resistance, therefore the voltage at B will drive hard low.
Contrast this with the case where Q_2 is replaced with a resistor; in this case it will still provide a pull-up current which Q_4 has to fight to pull down the load attached to B. So as you can see, using an active load improves the output swing.
8.4 Level shifting
In analog circuits one sometimes wishes to shift an output voltage either higher or lower by a fixed amount. In the first case you would use a p type shifter and in the second you would use a shifter based on nfet or n-p-n as shown in the figure 8.11. The actual amount of the shift will depend on the bias point of the upper transistor.

Figure 8.11: A level shifter.
8.5 Common emitter/source amplifier
You can construct an amplifier with a single transistor as shown in the figure 8.12. If you vary the input voltage by a small amount about the bias point the transistor's bias point varies between the points 1 and 2 as shown in the figure 8.13.

Figure 8.12: Single transistor amplifier.

The single stage shown in the figure 8.12 is inverting. When the input voltage rises, the BJT or FET current rises, the voltage dropped across the load resistor increases and the output voltage drops. When the input voltage is at its lowest the transistor is at point 1 and when the input voltage reaches its maximum the transistor is at point 2.

Figure 8.13: Transistor bias point.
8.6 DC gain
The small signal gain of the amplifier of figure 8.12 depends on the bias point and the value of the resistance and also the slope of the drain or collector characteristics.
The output voltage is given by the equation 8.4 for the BJT stage and the equation 8.5 for the FET stage. In equation 8.4 I_s is just the reverse saturation current of the base emitter diode alone and equation 8.5 uses just the ideal saturation equation for the FET current and does not consider the drain slope.

V_out = V_supply - R I_s exp( q (V_be + ΔV_be) / (nkT) )    (8.4)

V_out = V_supply - R (μ ε_ox / T_ox) (W / (2L)) (V_gs + ΔV_gs - V_th)^2    (8.5)
Now take the derivative of V_out vs. V_in and you get the voltage gain. Similarly you can obtain the current gain. In order for you to have amplification the product of the current gain and the voltage gain has to be significantly higher than one.
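As a sketch of "take the derivative of V_out vs. V_in", the code below evaluates equation 8.5 for the FET stage at an assumed bias point and estimates the small signal voltage gain with a numerical derivative. All device and resistor values are illustrative assumptions.

```python
# Numerical small-signal gain of the single-FET stage of equation 8.5.
# All numbers below are illustrative assumptions, not taken from the text.
V_SUPPLY = 3.3
R = 10e3                 # load resistor
K = 200e-6               # mu * eps_ox / T_ox, in A/V^2
W_OVER_2L = 5.0          # W / (2 L)
VTH = 0.5

def vout(vgs):
    # Equation 8.5 with the device in saturation.
    return V_SUPPLY - R * K * W_OVER_2L * (vgs - VTH) ** 2

vgs_bias = 0.7                       # chosen bias point
dv = 1e-6                            # small perturbation for the derivative
gain = (vout(vgs_bias + dv) - vout(vgs_bias - dv)) / (2 * dv)
print(f"V_out at bias = {vout(vgs_bias):.3f} V, small-signal gain = {gain:.1f}")
```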
8.7 Emitter/Source follower input
The common collector/drain, which is also called the emitter/source follower and is shown in the figure 8.14, is used as an input stage due to its high input impedance and low output impedance. The reason this is important in an input stage is to avoid loading the output of the previous stage.

Figure 8.14: An emitter follower input.
The load impedance R_L is multiplied by (1 + β) in the case of the BJT and by g_m in the case of the FET and is visible to the input, i.e. the input voltage dropped across the base-emitter or the gate-source is reduced by (1 + β)I_b R_L or g_m V_gs R_L.
So the input resistance is very high so long as you have a high β or g_m. In the case of the FET, the back bias does have an effect, but if the gate length is very short, the back bias is much less important.
8.8 Bootstrapping
Figure 8.15: A bootstrap capacitor.

If you have a unity voltage gain amplifier, and that amplifier has an input biasing arrangement, then you can increase the effective input impedance of the amplifier by the use of a bootstrap capacitor. This is just a capacitor large enough that it has a low impedance at the lowest frequency to be amplified, connected between the output and the input as shown in the figure 8.15.
The bias voltage of the base is R_1 V_supply / (R_1 + R_2). If the signal driving the input wants to raise the base voltage by ΔV, the current through R_1 has to reduce by ΔV/R_1 while the current through R_2 has to increase by ΔV/R_2, which means that the feedback capacitor has to supply the current

ΔI = ΔV ( 1/R_2 + 1/R_1 )    (8.6)
Keep in mind that this only works if the input signal is ac. The capacitor voltage does not
change because the voltage gain is unity so there is no damping effect due to the feedback.
It is called a bootstrap circuit because the input just seems to pull itself up by its bootstraps. The
best way to think of a bootstrap circuit is to compare it to the counterweight used in an elevator
shaft.
8.9 Miller's theorem
Miller's theorem is particularly useful in analyzing amplifier circuits which have an impedance such as a capacitance connecting the output to the input, but it applies in general to any circuit. In the figure 8.16 you need to be able to express the voltage at 2 as a function of the voltage at 1 so that V_2 = G V_1.

Figure 8.16: The Miller effect.

If that is true then you can analyze the circuit by the equivalent circuit on the right, where:

Z' = Z / (1 - G)    (8.7)

Z'' = Z G / (G - 1)    (8.8)
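For the common case where Z is a capacitor between the output and the input of an inverting stage, equations 8.7 and 8.8 say the input sees that capacitance multiplied by (1 - G). A small sketch with assumed values:

```python
import math

# Miller's theorem applied to a feedback capacitor (illustrative values).
# Z = 1/(j*w*C) connects input and output, with V2 = G * V1.
C = 1e-12                  # 1 pF feedback capacitor (assumed)
G = -10.0                  # inverting voltage gain (assumed)
w = 2 * math.pi * 1e6      # evaluate at 1 MHz

Z = 1 / (1j * w * C)
Z_in = Z / (1 - G)            # equation 8.7, impedance seen at node 1
Z_out = Z * G / (G - 1)       # equation 8.8, impedance seen at node 2

# The input-side impedance corresponds to a capacitance of C * (1 - G).
C_in_effective = C * (1 - G)
print(f"|Z_in| = {abs(Z_in):.1f} ohm, effective input capacitance = {C_in_effective*1e12:.1f} pF")
print(f"|Z_out| = {abs(Z_out):.1f} ohm")
```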
8.10 Gain bandwidth product
Any integrated circuit contains parasitic capacitances. In addition, the internal capacitances of the devices change with the bias point. In order for an analog circuit to function, these capacitances need to be charged and discharged. The current drive of the FETs or BJTs used in an analog circuit are determined by the bias point.
So once you set the bias point in an amplifier circuit, for a small signal applied to the input, both the capacitances and the drive currents are known. At this time the frequency content at the input is amplified differently. For the low frequency content, there is more time to charge and discharge the capacitances than at higher frequencies.
For this reason a quantity is defined called the gain-bandwidth product at a given bias point. By decomposing the input using the Fourier transform and calculating the amplification for each component and then merging them back together you can obtain the output.
So if you wish to obtain as little frequency distortion as possible you would like to design the bias point with a high value for the gain-bandwidth product.
8.11 Voltage reference
The figure 8.17 is the Widlar band-gap reference [74]. In a previous paper [73], the voltage dropped across R_3 is given by equation 8.9 if Q_1 and Q_2 are identical. The equation relating V_be of a BJT to the collector current was given in [75] as equation 8.10 where V_g0 is the band gap energy extrapolated to 0 K and V_BE0 is the V_BE at T_0.

ΔV_BE = (kT/q) log_e( I_c1 / I_c2 )    (8.9)

V_BE = V_g0 (1 - T/T_0) + V_BE0 (T/T_0) + (nkT/q) log_e( T_0 / T ) + (kT/q) log_e( I_c / I_c0 )    (8.10)

Figure 8.17: Widlar band-gap reference.
In [74] the requirement to make the output voltage V_ref stay constant over a temperature range is given by equation 8.11. The resulting variation was reported to be as little as 0.3% over a range of temperature from -55 °C to 125 °C. There are also all CMOS voltage references such as [76].

V_g0 = V_BE0 + (kT_0/q) log_e( I_c1 / I_c2 )    (8.11)
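To see the compensation at work, the sketch below evaluates equation 8.10 over temperature for assumed values of V_g0, V_BE0 and n, and adds a PTAT correction whose slope is chosen so that equation 8.11 holds at T_0. The sum moves far less than V_BE itself does; this is only a simplified model, not the full circuit of the figure.

```python
import math

# Simplified numerical check of the band-gap idea (equations 8.9 to 8.11).
# The PTAT correction is modeled as slope * T, with the slope chosen so that
# equation 8.11 is satisfied at T0. All values are illustrative assumptions.
VG0 = 1.205              # band gap voltage extrapolated to 0 K
T0 = 300.0
VBE0 = 0.65              # V_BE at T0 (assumed)
n = 1.5
K_OVER_Q = 8.617e-5      # k/q in V/K

ptat_slope = (VG0 - VBE0) / T0   # equals (k/q) log_e(I_c1/I_c2) per equation 8.11

def vbe(T):
    # Equation 8.10 with I_c held equal to I_c0 (the last term drops out).
    return VG0 * (1 - T / T0) + VBE0 * (T / T0) + n * K_OVER_Q * T * math.log(T0 / T)

for T in (218.0, 300.0, 398.0):       # roughly -55 C, 27 C and 125 C
    vref = vbe(T) + ptat_slope * T    # V_BE plus the PTAT term of equation 8.9
    print(f"T = {T:5.1f} K : V_BE = {vbe(T):.4f} V, V_ref = {vref:.4f} V")
```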
8.12 Differential circuits
Differential circuits are different from normal circuits in that all signals travel in pairs. For every signal there is a corresponding signal that does the opposite. This other signal is often denoted by a bar over the signal name to denote that it is the opposite, just as in a logic circuit. The way these signals are used is as shown in the figure 8.18.
In the figure 8.18 suppose R = 20 kΩ, I = 20 μA, n = 1.5 and I_s = 50 pA. If A and Ā are at the same voltage, then 10 μA flows through the left side of the circuit and 10 μA flows through the right side of the circuit, and both B and B̄ are at 200 mV lower than the supply voltage.
Now, if the difference voltage A - Ā = 200 mV, then the ratio of the currents is given by

I_1 / I_2 = I_s exp( q V_be1 / (nkT) ) / ( I_s exp( q (V_be1 - 200 mV) / (nkT) ) )    (8.12)

I_1 + I_2 = 20 μA    (8.13)

I_1 = 19.88 μA , I_2 = 0.12 μA    (8.14)

V_be1 = 0.5 V    (8.15)
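The numbers in equations 8.12 to 8.15 can be reproduced with a few lines of Python, using the values of R, I, n and I_s assumed above:

```python
import math

# Reproduce the differential-pair example of equations 8.12 to 8.15.
R = 20e3            # 20 kOhm load resistors
I_TAIL = 20e-6      # 20 uA tail current
n, Is = 1.5, 50e-12
VT = 0.02585        # kT/q at room temperature
dV = 0.200          # A - Abar = 200 mV

ratio = math.exp(dV / (n * VT))          # I1/I2 from equation 8.12
I2 = I_TAIL / (1 + ratio)                # with I1 + I2 = 20 uA (equation 8.13)
I1 = I_TAIL - I2
Vbe1 = n * VT * math.log(I1 / Is)        # back out V_be1 of Q1

print(f"I1 = {I1*1e6:.2f} uA, I2 = {I2*1e6:.2f} uA, V_be1 = {Vbe1:.2f} V")
print(f"balanced output drop = {(I_TAIL/2)*R*1e3:.0f} mV below the supply")
```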
Differential circuits are used anywhere that you require a high rejection of supply noise and other noise in the circuit. For example there are always regions of the circuit where the power consumption density is higher than normal. In such cases if the supply lines are not wide enough, there will be a localized drop in supply voltage.

Figure 8.18: A sample differential circuit.
So if a signal is output by a circuit in this region, it will be lower than it should be. If this were
a single ended signal and it was lower than it should be it would be incorrect at the receiving end.
But if the signal was differential in nature then both the signal and its inverse would be lower
than they should be, and since only the difference voltage between the two is important, there is
no error at the receiving end.
The common mode rejection ratio is defined as in equation 8.16 where A_d is the gain of the difference signal and A_c is the gain of the common mode signal.

CMRR = | A_d / A_c |    (8.16)
8.13 Transistor matching
Figure 8.19: Common centroid layout.

In analog design there is often a need to have a good matching between two transistors of the same size. One case for matching transistors is a differential circuit but there are many other cases
where there is such a need. The simulation of analog circuits will not show the effect of transistor mismatch unless you do something specific to model that effect [77].
The most often used is the "common centroid" approach as shown in the figure 8.19. For the
case when this structure is used as a common source pair the matching can be improved by using
the structure on the right. Some of the studies of matching in circuit design such as [78], [79] help
decide what W/L ratios would give the best results and how to model the mismatch in a circuit
simulation.
In the two structures in the figure 8.19 there are two main effects that help matching. The first is close physical proximity of the two FETs. If the FETs are placed side by side, they are more likely to receive the same level of source, drain and threshold adjust implants, they will be exposed to the same level of etching and oxide growth and go through the same thermal cycles.
The other effect is due to angular mismatch. Masks are aligned optically, so there will be some angular mismatch between one masking step and the next. This is the reason for splitting the FETs into two parts and placing them diagonally to each other. If the upper FETs came out a little narrower and the lower FETs a little wider, the error would be compensated for, and similarly for any mismatch effect that has an angular component to it.
8.14 Bode plots
Bode plots come in pairs. One plot is the log of the magnitude versus the log of the frequency and the other is the phase angle versus the log of the frequency. The unit for the magnitude plot is the decibel or dB, which is 20 log_10 |G(jω)|. The magnitude plot can be easily drawn once the transfer function is factorized into poles and zeros. For example the transfer function in the equation 8.17 has zeros at b and c and poles at d, e and f.

G(jω) = a (b + jω)(c + jω) / ( (d + jω)(e + jω)(f + jω) )    (8.17)

Figure 8.20: Bode plot for equation 8.17 if b < c < d < e < f.
First the poles and zeros are ordered in increasing frequency. The plot is started at the magnitude of G(jω) at zero frequency, but if b, c, d, e or f is zero, you can evaluate it at a higher frequency and later on extrapolate back to zero. Then the frequency is incremented. Each time a zero is reached the slope of the magnitude is increased by 20 dB/decade, and each time a pole is reached the slope is decreased by 20 dB/decade. Having plotted the straight line graph as shown in the figure 8.20, the actual frequency response is obtained by correcting the response around the poles and zeros because the straight lines just connect the asymptotes.

Figure 8.21: Bode phase plot for equation 8.17 if b < c < d < e < f.
For the phase plot, each term in the equation 8.17 will contribute a phase angle; these angles add for the numerator terms and subtract for the denominator terms, i.e. the b + jω term gives an angle of tan^-1(ω/b) and the d + jω term contributes -tan^-1(ω/d), as shown in the figure 8.21.
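A quick way to check a hand-drawn Bode plot is to evaluate the transfer function of equation 8.17 directly. The sketch below does this with numpy for one assumed set of corner frequencies satisfying b < c < d < e < f:

```python
import numpy as np

# Evaluate equation 8.17 numerically for assumed corner frequencies (rad/s).
a, b, c, d, e, f = 1.0, 1e2, 1e3, 1e4, 1e5, 1e6    # illustrative, b < c < d < e < f

w = np.logspace(0, 8, 9)                      # frequencies from 1 to 1e8 rad/s
jw = 1j * w
G = a * (b + jw) * (c + jw) / ((d + jw) * (e + jw) * (f + jw))

mag_db = 20 * np.log10(np.abs(G))             # magnitude in dB
phase_deg = np.degrees(np.angle(G))           # phase in degrees

for wi, m, p in zip(w, mag_db, phase_deg):
    print(f"w = {wi:9.1e} rad/s : {m:7.1f} dB, {p:7.1f} deg")
```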
8.15 Routh's stability criteria
Amplifiers are often used in feedback applications as shown in the figure 8.22. Using Laplace transforms the gain of the amplifier would be G(s) and the gain of the feedback loop is H(s). The transfer function of the block is given by the equation 8.18.

Figure 8.22: A feedback loop.

O(s) / I(s) = G(s) / ( 1 + G(s)H(s) )    (8.18)
If a circuit is unstable it means that it will oscillate. All circuits have noise in them, both thermal noise, shot noise and 1/f noise and also the noise on the power lines due to the switching of the transistors themselves. Some of this noise is bound to fall in the frequency region where the circuit has a 180° phase relationship, and that noise will cause an oscillation at the input which is fed back with the opposite phase with a larger amplitude, and this will build up until the circuit is non functional.
The Routh stability criteria is a way of analyzing the stability of a feedback loop without having to solve for the poles and zeros. In order to use Routh's criteria the transfer function needs to be written in the polynomial form of the equation 8.19 where a_0 ≠ 0.

O(s) / I(s) = ( b_0 + b_1 s + b_2 s^2 + ... ) / ( a_0 + a_1 s + a_2 s^2 + ... )    (8.19)
Routh's stability criteria requires that:
1. the coefficients a_n ≠ 0, i.e. if a_5 ≠ 0 then it is required that a_4 ≠ 0.
2. all the a_n must be of the same sign.
3. If the first two conditions are met, then a table is created as shown in the table 8.1 where the b's, c's etc. are given by the equations 8.20, 8.21, 8.22 and 8.23.
s^n      a_n      a_{n-2}  a_{n-4}  ...
s^{n-1}  a_{n-1}  a_{n-3}  a_{n-5}  ...
s^{n-2}  b_{n-1}  b_{n-3}  b_{n-5}  ...
s^{n-3}  c_{n-1}  c_{n-3}  c_{n-5}  ...

Table 8.1: Table for Routh stability analysis.
b_{n-1} = ( a_{n-1} a_{n-2} - a_n a_{n-3} ) / a_{n-1}    (8.20)

b_{n-3} = ( a_{n-1} a_{n-4} - a_n a_{n-5} ) / a_{n-1}    (8.21)

c_{n-1} = ( b_{n-1} a_{n-3} - a_{n-1} b_{n-3} ) / b_{n-1}    (8.22)

c_{n-3} = ( b_{n-1} a_{n-5} - a_{n-1} b_{n-5} ) / b_{n-1}    (8.23)
In the table 8.1 the requirement is that there be no sign changes in the first column of coefficients. If all the entries in this column are non zero and have the same sign, then the circuit is stable. If there are sign reversals, the number of sign reversals equals the number of unstable poles.
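The table-building procedure is mechanical enough to code directly. Below is a minimal sketch; it assumes no zero appears in the first column, and the coefficients are ordered from the highest power of s down to a_0, which is how table 8.1 is laid out.

```python
def routh_unstable_poles(coeffs):
    """Build the Routh table for a denominator polynomial and return the
    number of sign changes in the first column (= number of unstable poles).
    coeffs run from the highest power of s down to the constant term."""
    rows = [list(coeffs[0::2]), list(coeffs[1::2])]          # first two rows of table 8.1
    rows[1] += [0.0] * (len(rows[0]) - len(rows[1]))         # pad to equal length
    for _ in range(len(coeffs) - 2):
        prev, cur = rows[-2], rows[-1]
        new = []
        for i in range(len(cur) - 1):
            # same cross-multiplication pattern as equations 8.20 to 8.23
            new.append((cur[0] * prev[i + 1] - prev[0] * cur[i + 1]) / cur[0])
        new.append(0.0)
        rows.append(new)
    first_col = [r[0] for r in rows if r[0] != 0.0]
    return sum(1 for x, y in zip(first_col, first_col[1:]) if x * y < 0)

# Example: s^3 + 2 s^2 + 3 s + 10 has one complex pole pair in the right half plane.
print(routh_unstable_poles([1.0, 2.0, 3.0, 10.0]))   # -> 2
```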
8.16 Nyquist path
The closed loop transfer function is given by the equation 8.18. According to the Nyquist stability criterion if the roots of the equation 8.24 lie in the left half of the s plane the system is stable.

1 + G(s)H(s) = 0    (8.24)

To get the Nyquist path you plot the value of G(s)H(s) in the s plane with the real value on the x axis and the imaginary part on the y axis. The point -1 + j0 would be the origin if you were plotting 1 + G(s)H(s). The graph is symmetric about the x axis. The Nyquist stability criteria is as follows.
1. The Nyquist path cannot pass through poles or zeros of G(s)H(s).
2. If there are no poles or zeros on the jω axis.
The requirement is that:

Z = N + P    (8.25)

where
Z = number of zeros of 1 + G(s)H(s) in the right half of the s plane
N = number of times the locus circles the -1 + j0 point in the same direction as the locus
P = number of poles of G(s)H(s) in the right half of the s plane
3. If there are poles or zeros on the jω axis.
In this case to meet the first requirement the contour is modified so that the locus does not pass through these points but goes around them at an infinitesimally small distance ε → 0.
For example the plot shown in the figure 8.23 is stable if the contour encircled one pole and no zeros in the right half of the s plane, because it encircles the -1 + j0 point once in the clockwise direction.
Figure 8.23: A sample Nyquist path.
8.17 Sample and Hold circuit
The figure 8.24 shows the simplest sample and hold circuit. It contains only three components: an input switch to gate the input, a capacitor to ground to hold the measured voltage, and an operational amplifier configured as a voltage follower to isolate the capacitor from the loading due to the input of the circuit evaluating the voltage stored in the capacitor.

Figure 8.24: A generic sample and hold circuit.
8.18 Analog to digital conversion
In order to convert an analog signal into a digital signal you have to first decide two things, namely how often to sample and how many bits of accuracy each sample has to be converted to. For increased accuracy both need to be high together.
If you have a high sampling rate you can extract a higher maximum frequency from the digital data. In addition for a given time duration if you have a higher sampling rate, you will obtain a larger total number of samples which means that the frequencies obtained after a Fourier transform will be more closely spaced.
And finally, if you have a large number of closely spaced frequencies all the way up to a very high frequency, then you need a large number of bits of accuracy for each sample, because otherwise you will see a smearing between frequencies, i.e. a peak at a frequency will be spread onto the frequencies surrounding it.
Figure 8.25: A simple A/D converter.

The serial A/D is counter based and is the slowest implementation. Its speed is on the order of 2^n. It just uses a DAC whose input comes from a counter and the output is compared to the analog input voltage by a comparator, and the count when the comparator changes sign is the digital value as shown in the figure 8.25.
The successive approximation is faster and in this approach each bit is sequentially tested so its speed is on the order of n as shown in the figure 8.26. It is based on the fact that in a binary number each bit has a value equal to the sum of all the less significant bits + 1. So starting at the MSB and working down, each bit is turned on. If the DAC output is higher than the analog input, then that bit should be 0 whereas if it is not higher then that bit should be 1. Once a bit is set you leave it at that state, and when the LSB has been obtained in this way, the final value at the DAC input is the digital value you need.
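The successive approximation loop is easy to express in code. A sketch for an n-bit converter, with an idealized DAC modeled as a simple scaling; the reference voltage and bit width are assumptions for illustration:

```python
def sar_adc(vin, vref=1.0, nbits=8):
    """Idealized successive-approximation conversion.
    The DAC is modeled as code * vref / 2**nbits."""
    code = 0
    for bit in range(nbits - 1, -1, -1):      # start at the MSB and work down
        trial = code | (1 << bit)             # tentatively turn this bit on
        if trial * vref / (1 << nbits) <= vin:
            code = trial                      # DAC output not higher: keep the bit
        # otherwise the DAC output was higher than vin, so the bit stays 0
    return code

print(sar_adc(0.630))          # 0.630 V with an 8-bit, 1 V reference -> 161
print(bin(sar_adc(0.630)))     # 0b10100001
```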
The parallel or flash A/D is the fastest but it requires a prohibitively large area. It is based on a Kelvin voltage divider and it requires 2^n resistors and 2^n - 1 comparators and so it is only feasible for n = 2, 3 or 4. The block diagram is shown in the figure 8.27. The highest comparator that yields a high indicates the level, so if the highest level reached is the 5th comparator then the bits 101 are output, so the speed of the ADC is just the speed of the comparator plus the combinational logic.

Figure 8.26: A successive approximation A/D converter.

Figure 8.27: A parallel or flash ADC.
Figure 8.28: A first order sigma-delta ADC.
The most often used is the Sigma-Delta analog to digital converter [80], [81], [82]. There are two types, one using switched capacitor circuits and the other which is called continuous time [83], [84]. The Sigma-Delta A/D converter is useful to digitize a low frequency signal at a high resolution. A simple first order Sigma-Delta A/D converter is shown in the figure 8.28.
The cheapest implementation of a Sigma-Delta converter uses a 1-bit DAC which is a circuit which has one input bit and outputs either a positive reference level or a negative reference level of equal magnitude. So the quantizer is also 1-bit, so it is essentially a comparator which outputs either a positive pulse or zero and it is clocked at the oversampling rate. If the bits of resolution is k then the analog input signal has to be sampled at 2^k f_N where f_N is the Nyquist frequency of the analog signal.
From the balance condition when the integrator is at steady state, the time integral of the pulses output by the 1-bit DAC must be equal to the integral of the sampled and held values x[n] during the same cycle of 1/f_N. So this means that the output of the quantizer is a digitized equivalent of the analog input. But in order to obtain the k bits of resolution the bit stream at 2^k f_N has to be converted into a k bit wide stream at f_N, and this is done by the decimator. That k bit wide stream is the digital output.
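A behavioural sketch of the first order loop helps see the balance condition: the 1-bit DAC feeds back plus or minus a reference, the integrator accumulates the difference, and a simple average (a crude stand-in for the decimator) recovers the input. All levels and lengths here are illustrative assumptions.

```python
# Behavioural first-order sigma-delta modulator (illustrative, not a real design).
VREF = 1.0
OSR = 256                      # oversampling ratio, 2**k with k = 8

def sigma_delta(x, n_samples):
    """Modulate a constant input x (|x| < VREF) into a 1-bit stream."""
    integrator, bits = 0.0, []
    for _ in range(n_samples):
        fb = VREF if bits and bits[-1] else -VREF    # 1-bit DAC output
        integrator += x - fb                         # accumulate the difference
        bits.append(1 if integrator >= 0 else 0)     # 1-bit quantizer
    return bits

x = 0.3
bits = sigma_delta(x, OSR)
# Crude decimation: average of the +/-VREF pulses over one Nyquist period.
recovered = sum(VREF if b else -VREF for b in bits) / len(bits)
print(f"input = {x}, recovered = {recovered:.3f}")
```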
The reason that Sigma-Delta A/D converters are popular is that they don't require a k bit DAC, which could occupy a large area, but rather just use a 1-bit DAC operating at a high frequency. The other advantage with this approach is that linearity is not a problem and there is no matching requirement as in the case of the k bit DAC.
8.19 Digital to analog conversion
The simplest DAC is based on the Kelvin voltage divider as shown in the figure 8.29. It is similar to the parallel ADC in reverse. All that the bits do is to select which input is to be output. The linearity of the DAC is just the accuracy of the resistances. You can even make a special nonlinear DAC by varying the resistances. The disadvantage is that you need 2^n resistors.
The most common DAC is a binary DAC as shown in the figure 8.30. In the R-2R resistive ladder everything to the right of the current insertion point is a resistance of R. The most significant bit (MSB) controls the left most switch and the LSB controls the right most switch. In fact the current is not actually turned on and off but rather switched into the R-2R network or switched into a dummy load by the use of a differential gate. This is done in order to keep the current drives stable and to avoid transient fluctuations.
Figure 8.29: The simplest DAC.

Figure 8.30: A binary DAC.
The equivalent circuit when each of the bits is on is shown in the figure 8.31. An exercise for the reader is to obtain the equivalent circuits when more than one bit is turned on and to show that the voltages add linearly. There are many implementations which use matching FETs instead of resistors, but the issue is that the source and drain voltages will be different for the different FETs in the ladder and so the on state resistance they exhibit will be different depending on what bits are on and what bits are off.
Figure 8.31: Equivalent circuits.
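The exercise can be checked numerically: because the ladder is a linear network, superposition applies and each bit contributes its own scaled voltage. The sketch below assumes, from the single-bit equivalents of figure 8.31, that the MSB alone gives 2IR and every lower bit gives half of the previous one.

```python
# Superposition check of the R-2R ladder output (idealized).
# Assumption from the single-bit equivalents of figure 8.31: the MSB alone
# gives 2*I*R and every lower bit gives half of the previous one.
I, R = 1.0, 1.0

def r2r_vout(bits):
    """bits[0] is the MSB. Each bit contributes 2*I*R / 2**position."""
    return sum(b * 2 * I * R / (2 ** pos) for pos, b in enumerate(bits))

# Single-bit patterns reproduce the individual equivalent circuits ...
for pattern in ([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]):
    print(pattern, "->", r2r_vout(pattern), "x IR")

# ... and an arbitrary code is just the sum of those contributions.
print([1, 0, 1, 1], "->", r2r_vout([1, 0, 1, 1]), "x IR")
```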
The Sigma-Delta modulator is also used to make DACs as shown in the figure 8.32. In this case the input is a k bit wide digital input at a clock speed of f_N and the first logic block uses these k bits to output a bit stream at 2^k f_N.
Figure 8.32: A first order sigma-delta DAC.
This bit stream is input to a 1-bit DAC which in turn outputs pulses of a reference voltage. When these pulses are passed into the integrator, they are integrated and the output is the analog equivalent of the digital stream and has the value of p V_dd / 2^k, where p is the number of pulses, which is the value of the digital input.
As in the case of the Sigma-Delta A/D converter the Sigma-Delta DAC became popular because it does not require all those resistors, it does not require the matching that a conventional DAC requires, and linearity is not a problem.
8.20 Low power circuits
The total power consumed by a CMOS circuit [85] is given by the equation 8.26 where p_t is the probability of a transition, C_L is the average load capacitance, I_sc is the short circuit current which flows when both the nfet and the pfet are turned on at the same time and dt is the duration for which this happens, and I_leakage is the leakage current that flows when neither transistor is turned on.

P_total = f_clk ( p_t ( C_L V_dd^2 + I_sc V_dd dt ) ) + (1 - p_t) I_leakage V_dd    (8.26)
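Plugging representative numbers into equation 8.26 shows which term dominates; the values below are assumptions for illustration only.

```python
# Evaluate equation 8.26 for one set of assumed (illustrative) numbers.
f_clk = 1e9          # 1 GHz clock
p_t = 0.1            # probability of a transition per cycle
C_L = 10e-15         # 10 fF average load
V_dd = 1.2
I_sc = 50e-6         # short-circuit current while both devices conduct
dt = 20e-12          # overlap time
I_leak = 10e-9       # leakage when neither transistor is on

dynamic = f_clk * p_t * (C_L * V_dd**2 + I_sc * V_dd * dt)
static = (1 - p_t) * I_leak * V_dd
print(f"dynamic = {dynamic*1e6:.2f} uW per gate, leakage = {static*1e9:.2f} nW per gate")
```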
There are many techniques to lower this power [85].
Lower the clock frequency in any portion of the chip where the speed is not required.
Match the pull-up time to the pull-down time of the gates; this helps reduce the time dt and hence the I_sc term.
Lower the V_dd by lowering the V_th of the FETs. Although this will increase the I_leakage term, the drop in V_dd^2 will more than offset it up to a point.
Use dynamic logic instead of static CMOS logic wherever possible. There will be fewer transistors and the I_sc term goes away.
Reduce spurious transitions, where a transition occurs within a clock cycle only to be reversed again within the same clock cycle due to different signals arriving at different times, i.e. race conditions.
Define blocks of circuitry which can be powered down when not in use.
If there is a choice between different implementations for subcircuits then pick the one which uses less power. For example some structures are more parallel and are designed for higher speed but may not be suitable for a low power application.
Bipolar transistors have been used to make micropower operational amplifiers such as [86]. In the subthreshold region the MOSFET also behaves like a bipolar transistor [87] and has a high transconductance, and the current is an exponential function of the V_gs; this fact is used in some low power circuits.
8.21 Laser trimming and other techniques
Many manufacturers of precision analog chips have used laser trimmable components to ad-
just chips during the testing phase. Laser trimming is based on laser ablation where the energy
supplied by the laser causes the material to heat up and vaporize. Typically resistors are the
trimmable components.
Laser ablation works best on materials with a high thermal resistance, because then all the energy that is supplied is used to heat a small local area. The trimmed resistor can
be modeled by a matrix of resistances [88] where some of the resistors are the ones affected by the
laser trimming process. It is also possible to make structures that can be trimmed electrically [89]
so that a laser is not required.
Figure 8.33: A fusable connection.
Blowing fuse like connections is another way that chips are modified after fabrication. A fuse is just a wide interconnect structure with a narrow neck region as shown in the figure 8.33. There may be a small array of these fuses and a logic circuit that can select a fuse from the array based on a control word. There may be a dedicated pin on the chip through which the instructions to the logic circuit are clocked in serially. There is also a high current driver which supplies the current that is actually required to blow the fuse.
So based on the instructions clocked into the serial pin, the logic selects which fuse needs to be blown and turns on the driver to pass current through that fuse. When the current is passed, the narrow neck region overheats and melts and forms beads on either side. At this point the fuse is blown. During the normal operation of the chip this logic circuit is not active or is disabled so that it is isolated from the normal functionality of the chip and is only used during testing and calibration. This type of fuse is especially popular with redundant circuits where there is a backup component on chip to take the place of a design component that is not functional, so by blowing the suitable fuses, the backup can be included in place of the original component.
Chapter 9
Microprocessors
9.1 Binary number system
9.1.1 Integers
Figure 9.1: Integers stored on a computer.
Almost all computer processors nowadays use 32 bit integers and many provide a 64 bit integer for operations that require it. The way an integer is stored in a computer is shown in the figure 9.1. If the integer is a signed integer then one of the bits is used as the sign bit. As a result the largest positive number is 2^30 + 2^29 + 2^28 + ... + 2^2 + 2^1 + 2^0 = 2,147,483,647.
If the number is 64 bits long the largest signed number is 2^63 - 1 = 9,223,372,036,854,775,807. Integers are used in many applications, for example most graphics is done in integers. Similarly banks use integers to count the money, because the numbers need to be accounted for down to a cent. The sign bit is 0 for positive numbers and 1 for negative numbers.
Negative numbers are represented as they are used in a form called two's complement. To get the two's complement you first invert each bit and then add 1. So for example the two's complement of 0010110101000101 is 1101001010111011.
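The example above can be checked in a few lines; the helper below is a generic two's complement for any bit width.

```python
def twos_complement(bits):
    """Invert each bit and add 1, keeping the same width."""
    width = len(bits)
    inverted = int(bits, 2) ^ ((1 << width) - 1)     # invert every bit
    return format((inverted + 1) & ((1 << width) - 1), f'0{width}b')

print(twos_complement("0010110101000101"))   # -> 1101001010111011, as in the text
```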
9.1.2 Floating point numbers
Floating point numbers represented by a 32 bit float can go as high as 10^38 and as low as 10^-38; those represented by a 64 bit float can go as high as 10^308 and as low as 10^-308. The float is divided into two parts, the mantissa and the exponent. So the mantissa for the number 1.23495632 × 10^36 is 1.23495632 and the exponent is 36. The mantissa of a 32 bit float has about 8 significant decimal digits, i.e. you have 8 digits after the decimal point. The mantissa of a 64 bit float has about 16 significant decimal digits.
This reveals a limitation of floating point numbers: for example, 1.23495632 × 10^10 + 1 = 1.23495632 × 10^10, since you only have about 8 significant digits. The reason you want to use them of course is that the largest 64 bit float can hold a number a factor of 10^289 larger than a 64 bit integer, and on the other end integers by definition cannot hold fractions.
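The addition that "does nothing" is easy to reproduce; numpy's float32 behaves like the 32 bit float discussed above, and ordinary Python floats show the same effect at a larger magnitude.

```python
import numpy as np

x = np.float32(1.23495632e10)
print(x + np.float32(1.0) == x)      # True: adding 1 is below the 32-bit float resolution

# The same effect appears with 64-bit floats once the number is large enough.
y = 1.23495632e17
print(y + 1.0 == y)                  # True for ordinary Python floats (64 bit)
```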
9.2 µP block diagram

Figure 9.2: The components of a computer.
A microprocessor's primary function is to run programs. The operating system is a program. All programs are processor dependent. They are a sequence of instructions interpreted by that processor. All the logic is embedded in those instructions.
When a program is run all the instructions in the program are part of the stack. In addition the
program may store runtime data in the heap. The stack and the heap together are the operating
space of the processor i.e. all the instructions and data are stored in this space.
The typical program may be 10 megabytes long. The additional heap requirements may be an
additional 10 megabytes for a total of 20 megabytes. The processor itself does not have this much
memory in it. The stack space is a few kilobytes, which means that the entire 20 megabytes needs
to be rotated through the stack.
The way that the program is run is that it is loaded from the hard drive onto the RAM external
to the processor. Between the RAM and the stack is a memory space called the cache of about a
megabyte located in the microprocessor. As the processor starts to execute the program the cache
memory pre-fetches the next set of instructions from the RAM and thereby helps speed up the flow.
As the instructions and data flow through the cache, some instructions and some data are requested more often by the processor and so the cache keeps those most often requested rather than flushing them after use. The reason for the cache is the speed of access. The stack is closest to the ALU right on the processor core and runs at processor speed.
The cache may be split into a portion on the processor core and another which may be a chip fabricated separately but just packed in the same package as the chip and connected via bond wires. The cache is similar to the stack and is static RAM, perhaps built using bipolar technology for speed.
The slowest is the off-chip RAM. It only operates at less than a gigahertz. On the other hand it is dynamic RAM built using CMOS technology and therefore even a large amount of such RAM is cost efficient.
9.3 Arithmetic logic unit
The ALU performs addition, subtraction, multiplication and division.
9.3.1 Addition and subtraction
The one bit full-adder is described by the truth table in Table 9.1. If your integers are 32 bit signed integers then you will need 32 one bit full-adders. The CO of the lower order bits are connected to the CI of the next higher order bit. The CI of the lowest order bit is 0. The CO of the highest order bit is thrown away.
So in this way there is no difference between positive and negative numbers, and to assure yourself of this try adding a 32 bit positive number with its two's complement and you will get all 0s. So, when subtraction is done the number being subtracted is first converted to two's complement and then added to the first number to get the result.
CI A B Output CO
0 0 0 0 0
0 1 0 1 0
0 0 1 1 0
0 1 1 0 1
1 0 0 1 0
1 1 0 0 1
1 0 1 0 1
1 1 1 1 1
Table 9.1: Truth table for a 1 bit adder with carry-in.
So the logic for the full-adder is:

Output = \overline{CI} \overline{A} B + \overline{CI} A \overline{B} + CI \overline{A} \overline{B} + CI A B    (9.1)

CO = A B + CI B + CI A    (9.2)
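Here is a small sketch of table 9.1 and equations 9.1 and 9.2 in code, rippled across 32 bits; subtraction is just adding the two's complement, exactly as described above.

```python
MASK32 = (1 << 32) - 1

def full_adder(ci, a, b):
    """One-bit full adder per table 9.1 (the XOR is equivalent to the
    sum-of-products of equation 9.1; the carry is equation 9.2)."""
    out = ci ^ a ^ b
    co = (a & b) | (ci & b) | (ci & a)
    return out, co

def add32(x, y):
    """Ripple-carry addition of two 32-bit words; the final carry out is discarded."""
    result, carry = 0, 0
    for i in range(32):
        s, carry = full_adder(carry, (x >> i) & 1, (y >> i) & 1)
        result |= s << i
    return result

def sub32(x, y):
    return add32(x, add32(y ^ MASK32, 1))    # two's complement of y, then add

print(hex(add32(0x7FFFFFFF, 1)))      # wraps to 0x80000000
print(sub32(5, 7) == (-2 & MASK32))   # True: matches the 32-bit two's complement of -2
```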
Carry lookahead
One disadvantage with the addition of 32 bit integers by the full-adder of table 9.1 is that you
need to know what the CI is going to be. So if you have a 32 bit integer, then in order to evaluate
the highest order bit, you would have to evaluate the lower 31 bits to know the CI for the highest
order bit. If you assume that it takes a clock cycle for a full adder to evaluate, this would mean
that a 32 bit integer addition would take 32 clock cycles to complete.
Hence you need the carry lookahead logic. It is a combinational logic which predicts the CI of
a higher order bit without having to evaluate the output of the full adder at each of the lower order bits. In order to obtain the logic for the carry lookahead, you can simply substitute the equation 9.2 in place of the CI of the next higher bit and then simplify the expression. As you can see the expression is going to use A·B and A+B at each bit.
CO_n = A_n B_n + CO_{n-1} (A_n + B_n)    (9.3)

CO_n = A_n B_n + [ A_{n-1} B_{n-1} + CO_{n-2} (A_{n-1} + B_{n-1}) ] (A_n + B_n)    (9.4)
Since you are using combinational logic to predict the CI bits which the full adder would have
calculated anyway, you are adding a substantial number of gates to improve the speed. So you
have to make a tradeoff. Designers decide up front how many levels deep they can go with the
equation 9.4. Then they repeat the unit to make it sequential that is you go back perhaps 4 bits
and at that level you use the CI obtained from the previous 4 bits and so on, so you are alternating
parallel and sequential to obtain as much speed as you can for the area and power consumption.
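As a sketch of the substitution, the code below evaluates the recurrence of equation 9.3 from the generate term A·B and the propagate term A+B alone, and checks the final carry against plain addition. In hardware the recurrence is flattened out as in equation 9.4 rather than evaluated serially like this.

```python
def lookahead_carries(x, y, width=8):
    """Compute all carry-out bits CO_n from equation 9.3 using only
    the generate term A_n*B_n and the propagate term (A_n + B_n)."""
    carries, co = [], 0
    for n in range(width):
        a, b = (x >> n) & 1, (y >> n) & 1
        g, p = a & b, a | b              # generate and propagate, as used by eq. 9.3
        co = g | (co & p)                # CO_n = A_n B_n + CO_{n-1}(A_n + B_n)
        carries.append(co)
    return carries

# Cross-check: the final carry predicted here matches plain addition.
x, y, width = 0b10110110, 0b01101011, 8
carries = lookahead_carries(x, y, width)
carry_out = ((x + y) >> width) & 1
print(carries)
print(carry_out == carries[-1])          # True
```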
9.3.2 Multiplication
In [90], [91] the method used to do multiplication is to create an array of modified adder cells as shown in the figure 9.3. The lower order bits on the left are input at the top whereas the lower order bits on the right are input at the bottom.

Figure 9.3: The Guild multiplication array [91].
Each bit of one number interacts with each bit of the other number so there are 9 cells for the example in the figure 9.3, and each bit is input to 3 cells. The output bits are output at the bottom. Temporary quantities z and u travel from right to left and from top to bottom respectively. Within each cell for the implementation of [91], the logic used is given by equations 9.5 and 9.6.

u_n = u_{n-1} ⊕ ab ⊕ z_{n-1}    (9.5)

z_n = z_{n-1} ab + ab u_{n-1} + z_{n-1} u_{n-1}    (9.6)
For a sample multiplication of 7 × 6, the tables 9.2, 9.3 show the value of u and z at the end of each cycle. The output is z_9 u_9 u_8 u_7 u_6 u_5 = 101010. The references [92], [93] show methods of pipelining multipliers for positive and negative numbers so that even though it takes many
Cycle   u_1  u_2  u_3  u_4  u_5  u_6  u_7  u_8  u_9
1       1    1    0    1    0    0    1    1    1
2       1    1    1    1    0    1    1    0    1
3       1    1    1    1    0    1    0    0    0
4       1    1    1    1    0    1    0    1    0
5       1    1    1    1    0    1    0    1    0

Table 9.2: u values for each cell for 7 × 6.
Cycle   z_1  z_2  z_3  z_4  z_5  z_6  z_7  z_8  z_9
1       0    0    0    0    0    0    0    0    0
2       0    0    0    0    0    0    0    1    0
3       0    0    0    0    0    0    1    1    1
4       0    0    0    0    0    0    1    1    1
5       0    0    0    0    0    0    1    1    1

Table 9.3: z values for each cell for 7 × 6.
clock cycles to compute a multiplication, a new set of operands can be input in each cycle and a new result is output at every cycle. The main method is to use latches to delay different sections of the flow so that you don't have to keep the operands a and b constant for the duration of the multiplication process. The general concepts to make such arrays are described in [94].
9.3.3 Division
Division is done by using multiplication and addition. The most popular method of division is the SRT algorithm [95], [96] named after Sweeney, Robertson and Tocher who each implemented the algorithm separately. The algorithm is defined by the following equation.

A = Q B + R

The algorithm is used to divide A by B, so you iteratively guess the value of Q until the remainder R is less than B. At that point Q is the value of A divided by B. Of course to get the complete answer, the remainder R has to be multiplied by powers of 10 and divided by B exactly as we do when we do division manually on a piece of paper.
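SRT itself selects quotient digits from lookup tables, but the governing relation A = Q·B + R can be illustrated with a much simpler digit-at-a-time loop. This is ordinary long division as described above, not the actual SRT algorithm.

```python
def div_step(a, b):
    """Find Q and R such that A = Q*B + R with R < B (here by repeated subtraction)."""
    q = 0
    while a >= b:
        a, q = a - b, q + 1
    return q, a

def long_divide(a, b, digits=6):
    """a/b as a decimal string: each step multiplies the remainder by 10
    and reapplies A = Q*B + R, just like manual long division."""
    q, r = div_step(a, b)
    out = str(q) + "."
    for _ in range(digits):
        d, r = div_step(r * 10, b)
        out += str(d)
    return out

print(long_divide(7, 6))    # -> 1.166666
```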
9.4 Shift register
The basic storage in the stack of a microprocessor is a shift register. You can make a shift register using the D flip-flops of figure 7.10 as shown in the figure 9.4. However assume that it also has a Clr control which can clear the D flip-flop and set Q to 0.
A simple serial in, parallel (or serial) out shift register is shown in the figure 9.4. If you apply a Clk signal, whatever is at the Input is clocked into the first D flip-flop, the first flip-flop's Q is clocked into the second D flip-flop and so on. In the absence of the clock the bits are available in parallel from the outputs O_3, O_2, O_1, O_0. You can also clear the entire register by the use of the asynchronous Clr input.
The most popular shift register you can buy in discrete logic is the "universal" shift register which can shift left, shift right, accept parallel or serial input and output parallel or serial output.
Figure 9.4: A simple shift register.
Figure 9.5: A "universal" shift register.
The way it does this is to modify the input to each D flip-flop as shown in the figure 9.5. The signals that decide what operation is performed are S_0 and S_1.
If S_0 = 1 and S_1 = 0, it performs the right shift, which means that whatever is at the serial D input is clocked in. If S_0 = 0 and S_1 = 1, it performs a left shift, so Q_1 moves to Q_0 and Q_2 moves to Q_1 and so on. If both S_0 and S_1 are 0, then it performs the parallel in operation, i.e. the parallel D inputs are clocked in to the corresponding D flip-flop. As before an asynchronous Clr will clear the entire register.
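A behavioural model of the description above (right shift for S0 = 1, S1 = 0; left shift for S0 = 0, S1 = 1; parallel load for S0 = S1 = 0) makes the select logic concrete. This is a sketch of the behaviour as described in the text, not of any particular commercial part; feeding the serial input into the vacated end on a left shift is an assumption.

```python
class UniversalShiftRegister:
    """4-bit behavioural model of the universal shift register described above."""

    def __init__(self):
        self.q = [0, 0, 0, 0]        # q[0] is Q0 ... q[3] is Q3

    def clear(self):                 # asynchronous Clr
        self.q = [0, 0, 0, 0]

    def clock(self, s0, s1, serial_in=0, parallel=None):
        if s0 == 1 and s1 == 0:                      # right shift: serial D clocked in
            self.q = [serial_in] + self.q[:-1]
        elif s0 == 0 and s1 == 1:                    # left shift: Q1 -> Q0, Q2 -> Q1, ...
            self.q = self.q[1:] + [serial_in]
        elif s0 == 0 and s1 == 0 and parallel:       # parallel load, as described above
            self.q = list(parallel)
        return self.q

sr = UniversalShiftRegister()
print(sr.clock(0, 0, parallel=[1, 0, 1, 1]))   # load 1011
print(sr.clock(0, 1))                          # left shift
print(sr.clock(1, 0, serial_in=1))             # right shift in a 1
```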
9.5 Instructions and operations
Figure 9.6: Math instructions.
When a program is compiled an executable is created which contains the operations to be performed in a language understood by the hardware called the instruction set. The instructions used by a microprocessor contain opcodes, register numbers or addresses and sometimes the data to use. So an instruction may be as much as 8 or 10 bytes long or even more. If they use only one operand they are unary operators whereas if they use two operands they are binary operators.
Figure 9.7: Binary instructions.
There are many basic instructions that are usually available in any microprocessor. Some, like move, push and pop, are used to load the different registers. Input and output are used to talk to devices and ports. The basic math instructions are add, subtract, increment, decrement and compare as shown in the figure 9.6. Then there are the basic binary operations of and, or, invert, exclusive or, left shift and right shift as shown in the figure 9.7. Then there are the two main programming instructions, the different jumps and the different loops.
9.6 CISC and RISC
CISC stands for Complex Instruction Set Computing and RISC stands for Reduced Instruction Set Computing. The difference is shown in table 9.4. Most personal computers are CISC whereas workstations often use RISC chips.

CISC                                   RISC
Instructions are maybe 16 bytes long   Instructions are maybe 4 bytes long
Maybe 200 instructions                 Maybe 50 instructions
Few registers                          Many registers
Large interpreter                      Small interpreter
Few hard-coded instructions            Many hard-coded instructions

Table 9.4: CISC - RISC comparison.
The reason that these differences are important is that as a result of this, RISC processors are
much smaller than CISC processors and in a smaller chip the routes are smaller and the chip is
usually capable of running faster. Smaller chips tend to yield better, meaning that a larger fraction of the chips manufactured function correctly. One reason is that the defect density is a fixed number per unit area so bigger chips are more likely to have defects. Often defects are fatal in that the whole chip simply won't function correctly if even a single transistor malfunctions.
A large fraction (as much as 30%) of the processor is the interpreter that takes an instruction
and decides what paths are turned on, what data is transferred from the stack to what internal
buffers and what paths are turned on in the ALU and what computation is done. By using a RISC
processor you may be able to reduce the chip area by 15%.
Figure 9.8: The hard wired RISC style.

Figure 9.9: The comparable CISC style.
Another difference is in the hard coding of certain instructions. Basically in a RISC processor the implementation of certain instructions is done in hardware as shown in the figures 9.8 and 9.9. In the figure 9.8 you can see that you need to place the operands in specific registers if you wish to do an add operation and the output register used for the add operation is always the same one. Similarly with the XOR operation or the multiply operation etc. This is the RISC style and you will use as many registers as you need. However the CISC style is shown in the figure 9.9 and here any of the registers may be used by the multiplier or the adder or the XOR, so at least from the human point of view this is more obvious.
Assume that after doing an addition you are going to use the output of the addition as one of the operands for the multiply. In the RISC case you would have to do a move operation from the output register used by the adder to one of the input registers used by the multiplier. Whereas in the CISC case you can simply use the register that the output of the addition was placed in as one of the input registers for the multiply operation. So you see that in this case the move operation was not required. And there are many other such examples. This is one reason why the programs used by RISC processors have to be longer to do the same thing as a comparable program for the CISC processor.
In reality RISC processor makers have increased the number of instructions so much that they start to look like a CISC processor and in the meantime CISC processor makers have started to use RISC concepts to keep the number of instructions to a minimum.
9.7 The critical path
Critical path design is a concept used by logic designers to define chip requirements and highlight the most timing critical components so that they can concentrate on them. In the figure 9.10 the critical path is the path going from Input through Logic3, Logic2, Logic4 and the nand gate to the Output. This path is (2 + 2 + 6) = 10 ns. All the other paths from the input to the output are shorter. So basically the longest path is the critical path, because for example if you design Logic5 to be a lot slower so it takes 8 ns instead of 4 ns, it still would not affect this circuit because it still would not be in the critical path: the path from the input through Logic1, Logic5 and then to the output has a total time of (1 + 8) = 9 ns, which is less than the critical path which is 10 ns long.
Figure 9.10: The critical path analysis.
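Finding the critical path is just a longest-path search over the block delays. The sketch below encodes the connectivity and delays described for figure 9.10; the exact wiring (and the nand gate delay taken as 0) is inferred from the prose, so treat it as an illustration rather than the figure itself.

```python
# Longest-path (critical path) search over the block delays of figure 9.10.
# Connectivity inferred from the text; delays in ns; nand delay assumed 0.
delays = {"Logic1": 1, "Logic3": 2, "Logic2": 2, "Logic4": 6, "Logic5": 4, "nand": 0}
edges = {
    "Input":  ["Logic1", "Logic3"],
    "Logic3": ["Logic2"],
    "Logic2": ["Logic4"],
    "Logic1": ["Logic5"],
    "Logic4": ["nand"],
    "Logic5": ["nand"],
    "nand":   ["Output"],
}

def longest_path(node):
    """Worst-case delay from node to Output, plus the path itself."""
    if node == "Output":
        return 0, ["Output"]
    best, best_path = 0, []
    for nxt in edges.get(node, []):
        t, path = longest_path(nxt)
        if t >= best:
            best, best_path = t, path
    return best + delays.get(node, 0), [node] + best_path

total, path = longest_path("Input")
print(total, "ns :", " -> ".join(path))   # 10 ns : Input -> Logic3 -> Logic2 -> Logic4 -> nand -> Output
```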
However you can have more than one critical path if you have two paths of equal length which is the maximum length. In doing critical path analysis you have to use the worst case time between the input and output along a given path and not the average time. So if a logic block usually takes 11 ns but occasionally requires 12 ns then the number you use is 12 ns for that block from the given input to the given output, because of course even within a logic block you may have different times as shown in the figure 9.11. Here the D → Q̄ transition is the longest at 1.1 ns.
Figure 9.11: Different delays for different transitions.
9.8 Pipelining
Pipelining is used in almost all timing critical logic circuits nowadays. The basic purpose of pipelining is to increase the usage of critical components. In the figure 9.12 you see a critical but slow logic block. After you apply the inputs it takes 3 clock cycles to compute the output. Until it finishes computing the output you cannot apply the next set of inputs.

Figure 9.12: A critical but slow logic block.
So the way you use it is shown in the figure 9.13. Here I#1 is when the first set of inputs are applied, I#2 is when the second set of inputs are applied and so on. So here you can apply the inputs every third clock cycle. Assume it is in the critical path. Because it is slow you want to speed it up. Perhaps you would like it to run three times as fast.

Figure 9.13: The usage of the version in figure 9.12.

You can achieve this as shown in the figure 9.14. In order to speed it up the first step is to break the logic block into 3 pieces, each of which takes about a clock cycle to execute. After breaking it
up you can make adjustments so that each block takes less than a clock cycle to execute. Now you place latches (L in the figure) after each of these 3 pieces. For our discussion let us assume that the latches are so fast that we don't have to count how much time they take.

Figure 9.14: The block of figure 9.12 after pipelining.
The way this new circuit is used is shown in the figure 9.15. Because each piece of the pipelined circuit only takes one clock cycle to execute (including the time taken by the latch) the output of that piece is ready by the end of each clock, and that piece is ready to accept another set of inputs. So now you can apply the inputs every clock cycle instead of every third clock cycle as for the original version of the circuit. This method of breaking the circuit up and using latches to hold the intermediate results is called pipelining.

Figure 9.15: The usage of the version in figure 9.14.
9.9 Intentional clock skewing
In the example we talked about for pipelining, what if you cannot easily break the circuit into pieces that are one clock cycle long? Since the latches are used to hold intermediate values you need as many latches as there are intermediate values. So you want to break the circuit up in exactly the right places so that you can latch something that is clearly an intermediate result and also to minimize the number of intermediate results and hence latches used. But this may not be easy, and let us say you get 3 pieces of 1 clock cycle, 1.2 clock cycles and 0.8 of a clock cycle respectively.
Figure 9.16: A pipelined circuit with a skewed clock.
Now obviously the second piece is much more than a clock cycle. And of course you don't want to use two clock cycles for that piece because then it starts to get messy. Is all lost then? No, there is another way out. The answer is that many designers intentionally skew the clocks used in each section to adjust for such a problem. This is shown in the figure 9.16.
In this figure the delay1 may be just a couple of inverters in series. Whatever is used is designed to delay the clock by a fifth of a clock cycle. So there are two clocks in this circuit, C and C2, and the operation of this circuit is shown in the figure 9.17. The input of the second piece of the figure 9.16 is clocked in by the clock C, however its output is clocked in to the third piece by the delayed clock C2.
Figure 9.17: The second piece gets 1.2 clock cycles.
Since the duration from the rise of the clock C to the rise of the delayed clock C2 is 1.2 clock
cycles, the second piece actually gets 1.2 clock cycles to perform its calculation. Keep in mind you
have to make sure that the output of the second piece does not vanish within the delay1 time from
the start of C because otherwise you will get the wrong result at the input of the third piece when
it is clocked in by C2. It is important to note that the clock C2 is not visible to any other circuit i.e.
it is strictly an internal clock and is used exclusively by the second piece of this circuit.
9.10 Clock trees
The clock is global to the chip and is used to time everything in the chip. It is also used to synchronize everything in the chip. But what if the clock itself is wrong? Maybe the clocks in different parts of the chip have the correct frequency but the wrong phase. This can happen quite simply by the effect of the capacitive loading of the clock signal by the interconnect line capacitance and the gate input capacitance. So it is very possible that three clocks in the chip may have a clock relationship as shown in the figure 9.18.
In this figure let us call the region using the clock C1 as region C1 and that using the clock C2 as region C2 and so on. What do you think will happen when the output of a logic circuit in region C1 is used as an input in region C2 or region C3? As you guessed it would be utter chaos because they would be out of sync with each other.

Figure 9.18: Three clocks that are skewed.

Figure 9.19: Clock source in the center.
The chip industry uses a standard approach [97], [98] to avoid clock skew. Basically you start with a master clock. Then you divide the distribution into several levels so that at the lowest level a chip the size of a microprocessor contains perhaps 40 clocks as shown in the figure 9.20.
Having done this, the level 2 is synchronized with the level 1 clock, the level 3 is synchronized with the level 2 and so on. In order to do this there are many phase detectors. Once the phase detectors determine whether a clock comparison shows that a clock is ahead of or behind the higher level clock, the deskewing is performed iteratively.
The deskewing of the clocks is done using digital bits that control programmable delays. In each iteration the bits are incremented or decremented based on the phase detector's result and the comparison is repeated. This is done continually because the clock skews can have so many causes.
All this flexibility has the additional advantage that the designers can hard code intentional clock skewing as described in the figure 9.16 to get optimum performance out of the existing logic.
The clocks that have a particularly high frequency may be doubly controlled wherein both
the rising edge and the falling edge of the clock is controlled. So here you would use two phase
detectors to generate two sets of control bits, one for the rising and the other for the falling edge.
Then you have two programmable delays where the first deskews the rising edge and the second
controls the width of the pulse.
Figure 9.20: A standard clock tree.
Figure 9.21: Controlling the skew.
Chapter 10
Phase-Locked Loops
Phase-locked loops originally became popular in making radio receivers. One of the problems
that all radio receivers have is that the signal that they are receiving shifts slowly back and forth
in frequency because it is modulated by the medium that it passes through when going from the
transmitter to the receiver.
A good reference for phase locked loops is [99]. PLLs are based on a feedback loop so it is a
good idea to also read a more general control systems book such as [100].
10.1 Ring oscillator
A PLL is a very complicated circuit to understand, so let us start slowly and discuss some ba-
sics before we get to the PLL circuit itself. The circuit in figure 10.1 shows four cascaded inverters
with the output of the fourth inverter fed back into the input of the rst inverter. So we know that
A = E, because they are shorted together. Let us suppose that A is logic 0, then B is logic 1, C is 0,
D is 1 and E is 0 which is the same as A. Once such a state is reached it will not change as long as
the power is left turned on, because it is a stable state. Similarly let us suppose that A is logic 1,
then B is logic 0, C is 1, D is 0 and E is 1 which is the same as A. Once this state is reached it too
will not change as long as the power is left turned on, because it is also a stable state.
Figure 10.1: Four inverters looped back.
Now let us modify the circuit by adding an inverter as shown in figure 10.2. The behavior of
this circuit is very different. Let us suppose that A is logic 0, then B is logic 1, C is 0, D is 1, E is 0
and F is 1 but we have a problem now because F is shorted to A and we started with A as 0, and
A cannot be both 0 and 1 simultaneously.
As long as we treat the inverters as pure logic elements, the circuit in figure 10.2 appears to
be impossible, because A must be equal to F and that is not the answer we are getting when we
analyze the circuit. But let us make the situation more realistic by adding a propagation delay to
Figure 10.2: Five inverters looped back.
each inverter. The propagation delay is the time difference between when the input crosses a logic
threshold until when the output crosses the same logic threshold.
Now at time 0s let A be 0. Let the propagation delay of the inverter be 10s. So then at time 10s,
B becomes 1, at time 20s C becomes 0, at 30s D becomes 1, at 40s E becomes 0 and at 50s F becomes
1. So therefore, at 50s A becomes the same as F i.e., A becomes 1. But notice that now we don't
have a problem because if A was 0 at time 0s that still allows it to become 1 at time 50s. After that
A becomes 0 again at 100s, and then 1 again at 150s and so on forever as long as the power is left
turned on. In reality the logic level at the point A will be as shown in gure 10.3. From 40s to 50s
the voltage at A slowly rises from logic 0 to logic 1, and then from 90s to 100s it falls from logic 1
to logic 0, then again from 140s to 150s it rises from logic 0 to logic 1 and so on.
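The same timing argument can be sketched in a few lines of code. This is only an idealized model in which every inverter is a pure delay of 10 time units; the function name and numbers are illustrative.

# Behavioral sketch of the 5-inverter ring of figure 10.2.
def ring_oscillator(n_inverters=5, delay=10, t_end=200):
    a = 0                                   # logic level at node A
    history = [(0, a)]
    t = 0
    while t + n_inverters * delay <= t_end:
        t += n_inverters * delay            # time for a transition to travel once around the ring
        a ^= 1                              # an odd number of inversions flips A
        history.append((t, a))
    return history

print(ring_oscillator())
# [(0, 0), (50, 1), (100, 0), (150, 1), (200, 0)]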
Figure 10.3: Logic at A of figure 10.2.
The circuit in figure 10.2 belongs to a family of such circuits called ring oscillators. The ring part
is because the output of the last inverter is fed back to the input of the rst inverter. The oscillator
part is because the voltage keeps going up and down i.e., it oscillates and so it is an oscillator.
The members of this family all have an odd number of inverters in the ring. We already discussed
what happens when you have an even number of inverters in the ring i.e., it does not oscillate.
The period of oscillation is equal to twice the number of inverters multiplied by the propagation
delay of each inverter. So, if you want a ring oscillator which oscillates very rapidly, you need
to reduce the number of inverters in the ring. Believe it or not, a working oscillator has been
built with just a single inverter, however the inverter was not built using FETs but rather with a
different type of transistor called a bipolar junction transistor or BJT. In order for a ring oscillator
to oscillate properly with very few inverters in the ring, each inverter must have an amplication
much higher than unity.
So based on what we know if we build an oscillator with some odd number, say 7 elements,
and if the propagation delay of each inverter is 100 picoseconds, then the period of the oscillator
is 1.4 nanoseconds and its frequency is the inverse of that i.e. 714 MHz. But what if we really
want 700 MHz? Or perhaps 702 MHz? Obviously we want a way to modify the propagation
delay so that we can generate different frequencies. Such a circuit is called a voltage controlled
113
oscillator or VCO. In a VCO, the inverter elements have additional control inputs which are used
to vary the propagation delay. Keep in mind that there are many types of oscillators which are not
based on inverters, but for now we will only talk of ring oscillators because they are the easiest to
understand.
10.2 Subsystems
Figure 10.4: A schematic of a phase-locked loop or PLL.
As you can see in figure 10.4 there are only four components in the PLL, namely the VCO, the
divider, the phase comparator and the loop filter.
10.2.1 Voltage controlled oscillator
An inverter whose propagation delay can be controlled is shown in figure 10.5. This is called a
current starved inverter. Another type is the loaded inverter where you vary the load capacitance
attached to the output of the inverter.
Figure 10.5: A circuit for an inverter with a variable delay.
In figure 10.5, the two controls are Cn and Cp. These two are a pair i.e. for each value of Cn
there is a unique value of Cp. So the way you generate Cn and Cp is to make a circuit that takes a
single input and generates both Cn and Cp. The requirement is that for any pair of Cn and Cp, the
maximum current that can be sourced by the upper half of the circuit must equal the maximum
current that the lower half of the circuit can sink. Otherwise the circuit will not be symmetric and
the logic levels will be unusable.
This works by slowing down the inverter by a controlled amount. Adding the series transistors
above and below is a way to reduce the current flow and because the current flow is needed to
charge the gate capacitances of the next inverter in the ring, reducing the current is a way to
increase the propagation delay. So the highest frequency that can be obtained from the oscillator
is obtained when Cn is at the supply voltage and Cp is at ground.
Another way of making a fast VCO is to use an astable multi-vibrator. Multi-vibrators are the
granddaddies of digital circuits and date back to when digital circuits were just being born. They
are divided into bistable and astable families. The first of the two is stable in either of two states
while the latter is stable in neither state and constantly switches states, giving rise to a square wave
output.
Figure 10.6: A fast VCO based on an Astable multi-vibrator.
At startup both capacitors are discharged. Let us assume that at startup Q1 turns on; then the
capacitor on the left charges up. By design R2 is much larger than R1. The voltage at the collector
of Q1 drops substantially and so the voltage at A, which is the voltage at the base of Q2, is below
one diode drop above ground. But as the capacitor on the left charges, the voltage at A rises above
a diode drop and Q2 starts to conduct. This immediately forces the voltage at O down below a
diode drop above ground. So Q1 turns off and the voltage at A rises because the voltage at M
rises toward Vdd. So while the capacitor on the left discharges, and until the capacitor on the
right charges up enough that the voltage at B rises above a diode drop above ground, the transistor Q2
remains on and the transistor Q1 remains off; then Q1 turns on and Q2 turns off and so on
indefinitely. The period of the output depends on the R1C product so by adjusting the resistance R1
you can control the output frequency.
10.2.2 Divider
The divider is a purely digital circuit. Its function is to take a digital input square wave and
increase the period n times. For example n may be 8. Usually for high performance applications
one does not use a very large value for n. Of course people do assemble PLLs from individually
packaged components on a printed circuit board using values of n as large as 2000 but these are
not critical applications.
When you need a very high accuracy output waveform you need a small n of less than 20 to
tightly couple the output to the reference signal. The signal output by the divider is compared to
the reference signal by the phase comparator. If the PLL is locked and is working properly, then
the signal output by the divider should be almost identical to the reference signal. Dividers are
just counters. You can make a pretty decent counter with a D flip-flop and nand gates as shown
in the figures 10.7 and 10.8.
Figure 10.7: A divider based on a D flip-flop.
Figure 10.8: The output from the circuit in figure 10.7.
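As a rough behavioral sketch (my own notation, not the book's schematic), a divide-by-2 stage is a D flip-flop whose D input is tied to its own inverted Q, so it toggles on every rising clock edge; cascading k such stages divides the clock frequency by 2 to the power k.

def divide_by_two(clock_levels):
    """Toggle flip-flop: D is tied to the inverted Q, clocked on rising edges."""
    q, prev, out = 0, 0, []
    for c in clock_levels:
        if prev == 0 and c == 1:            # rising edge of the input clock
            q ^= 1                          # Q takes the value of D = not Q, i.e. it toggles
        out.append(q)
        prev = c
    return out

clk = [0, 1] * 8                            # 8 input clock cycles
print(divide_by_two(clk))                   # Q has twice the period of the input clock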
10.2.3 Phase comparator
The phase comparator could be either digital or analog in nature. The purpose of the phase
comparator is to compare the reference signal to the divided signal and try to make them identical.
Actually the loop is what tries to make them identical, but what the comparator does is to point
out whether the divided signal is leading or lagging the reference signal and by how much.
The simplest phase comparator is based on the Set-Reset flip-flop as shown in the figure 10.9.
It is just an SR flip-flop with nothing else. The reference signal is sent into the set input while the
divided signal is sent into the reset input. They are both active on the falling edge. So the output
of this phase detector is a pulse whose width is the length of time from the falling edge of the
reference signal to the falling edge of the divided signal. The approximate balance point depends
on the VCO's control voltage function but basically you set it at a roughly constant phase difference
between the reference and the divided signal as shown in the figure 10.10.
The most popular phase detector is called the phase-frequency detector and a simple version is
shown in the figure 10.11. It is called a phase-frequency detector because it locks both phase and
frequency at the same time i.e. when both the reference signal and the divided signal have the
Figure 10.9: A detector using the SR flip-flop.
Figure 10.10: The balance point for figure 10.9.
same frequency and are in phase. Here the rising edge of the reference is used to set one flip-flop,
the rising edge of the divided signal is used to set another flip-flop and a nand gate is used to reset
them both if the outputs of both flip-flops are high. The upper output is used to control a pull-up
NFET while the lower output is used to control a pull-down NFET.
Figure 10.11: The phase-frequency detector.
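A minimal behavioral sketch of this detector follows. It is a simplification in my own notation, sampling signal levels rather than modeling edge-triggered hardware: the UP state is set by a reference rising edge, the DOWN state by a divided-signal rising edge, and both are cleared as soon as both are set.

# Behavioral sketch of a phase-frequency detector driving PU and PD pulses.
def pfd(ref, div):
    """ref, div: equal-length lists of 0/1 levels; returns (PU, PD) level lists."""
    up = dn = 0
    prev_r = prev_d = 0
    pu, pd = [], []
    for r, d in zip(ref, div):
        if prev_r == 0 and r == 1:          # rising edge of the reference sets UP
            up = 1
        if prev_d == 0 and d == 1:          # rising edge of the divided signal sets DOWN
            dn = 1
        if up and dn:                       # the nand-gate reset clears both flip-flops
            up = dn = 0
        pu.append(up)
        pd.append(dn)
        prev_r, prev_d = r, d
    return pu, pd

ref = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]  # reference signal
div = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # divided signal running too fast
pu, pd = pfd(ref, div)
print(pu)
print(pd)                                   # PD pulses dominate, so the VCO would be slowed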
Figure 10.12: The case when the VCO is 33% too fast.
In the figure 10.12 you see the output from the phase-frequency detector for the case where
the VCO is running 33% faster than it should be. In this figure you can see that because D is 33%
faster than the reference signal R, 3 periods of R fit in the same time period that 4 periods of D do.
In this book I will call this the difference period. In general this period is simply given by:

T_diff = 1 / (f_1 − f_2)    (10.1)
In the figure 10.12, for a single difference period it can be seen that the total duration of the
pull-up signal PU is less than the total duration of the pull-down signal PD. For the durations
of the pull-up signal the filter capacitor is being charged, and for the durations of the pull-down
signal the filter capacitor is being discharged. So if the total duration of the pull-up signal
PU is less than the total duration of the pull-down signal over the difference period, then at the
end of the difference period the filter capacitor has a lower voltage, which means that the VCO
frequency is reduced, which is what is needed.
10.2.4 Loop filter
The loop filter is a low pass filter. In the figure 10.4 the loop filter shown is a simple RC filter.
It is known as a first-order filter. In general filters are classified by their order as obtained from
their transfer function as shown in the figure 10.13. The transfer function is used as:

O(s) = f(s) I(s)    (10.2)

f(s) = (s − s_1)(s − s_2) / ((s − s_3)(s − s_4))    (10.3)
In the equation above s is jω = j 2π f and f(s) is the transfer function. In the transfer
function s_1 and s_2 are called zeros because that is where the transfer function goes to zero and
s_3 and s_4 are called the poles because the transfer function reaches toward its maximum at these
frequencies. The location of the poles is what filter designers have to worry about to keep the
circuit from oscillating. You can make more complicated filters using more capacitors or perhaps
Figure 10.13: The transfer function of a filter.
even operational amplifiers. But in reality you don't want to have too high an order of filter
because it really does increase the probability that the PLL will become unstable and then it is
basically useless. Any designer should be aware that process variations can cause the components
in your circuit to vary by as much as 30% or so and that your circuits should be stable even with
all this random (but usually concerted) variation. The PLL itself is a feedback loop and its order
is (1 + filter order) where the 1 comes from the VCO because the phase of the VCO is a perfect
integral of the control voltage applied to it and integration counts as a pole. The Fourier transform
of the integral sign is 1/s.
10.3 Loop operation
So the way that the loop works is that you release the PLL so that its starting VCO control voltage
is lower than it should be. So the starting frequency is lower than it should be. So the divided
frequency is lower than the reference frequency. So the control voltage needs to be increased until
the divided signal matches the reference signal in frequency and phase. We already discussed the
case when the VCO was too fast as shown in the figure 10.12, and in the figure 10.14 you see the
opposite case when the VCO is too slow.
Figure 10.14: The case when the VCO is 25% too slow.
As you can see in this figure the total duration of the PU signals is larger than the total duration
of the PD signals over the course of one difference period. So at the end of each difference period
the voltage of the filter capacitor is slightly higher thus speeding up the VCO until such time that
the PLL is locked.
When the VCO is very far from where it should be, you want a quick pull-in to the desired
frequency. The way to do this in this case is to simply reduce the RC time constant. But once the
VCO gets close to lock a small RC time constant causes a larger ripple in the VCO frequency which
is undesirable. You can see this ripple in the VCO control voltage shown in the figure 10.10.
For this reason many PLLs have two modes for the loop filter: one mode provides lots of
correction and is used to lock the PLL in the first place, and the second mode provides a much
larger RC time constant and therefore a slower correction to damp the oscillations in the VCO
frequency. Some people simply use a large RC time constant, such as by using a large capacitor,
but this can backfire because the PLL may simply not lock at all, because the pull-in will be so
slow that it is simply drowned out by other causes of frequency variation such as power supply
fluctuation or coupling in of other signals, perhaps by capacitive coupling between interconnect
lines or other causes.
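The pull-in behavior can be sketched with a very crude discrete-time model. This is entirely my own simplification, not a circuit-accurate PLL: the detector output is taken as proportional to the frequency error, the loop filter is a single smoothing step whose gain plays the role of the RC time constant, and the VCO frequency is linear in the control voltage.

# Crude behavioral sketch of PLL pull-in; all gains and names are illustrative.
def pll_pull_in(f_ref=1.0, n=8, kvco=4.0, gain=0.05, steps=400):
    """Small gain behaves like a large RC time constant: slower but smoother pull-in."""
    v = 0.0                                 # VCO control voltage, released low on purpose
    for _ in range(steps):
        f_div = (kvco * v) / n              # VCO frequency is linear in v, then divided by n
        error = f_ref - f_div               # positive error means the divided clock is too slow
        v += gain * error                   # the filter capacitor integrates the PU/PD pulses
    return f_div, v

f_div, v = pll_pull_in()
print(round(f_div, 4), round(v, 3))         # the divided frequency settles close to f_ref = 1.0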
10.4 Delay-locked loop
A delay-locked loop or DLL is very different from a PLL even though they are often used in
the same applications. The difference is in the way the signal is generated. In the PLL the VCO
was free-running and the only control that the loop had on the phase and the frequency of the
VCO output was by raising or lowering the control voltage of the VCO. A DLL does not use a
VCO. Instead a DLL uses variable delay sections. If the n you wish to use is 5 then you will have
5 delay sections. The way a DLL works is shown in the figure 10.15.
In the figure 10.15 the reference signal is used in two places. It enters the first delay section
at the bottom left and proceeds toward the right going through each of the 5 delay sections. The
reference signal R and each of the outputs P1 thru P5 are shown in the figure 10.16. Each of the
5 phases P1 thru P5 is delayed by a fifth of a cycle w.r.t the previous phase. Note that the fifth
phase P5 is identical to the reference signal R except that it is delayed by a whole cycle. This is
used to lock the DLL to the reference frequency.
The Edge comparator in the figure 10.15 is used to compare the rising edge of the reference
Figure 10.15: A delay locked loop.
signal R to the rising edge of the fifth phase P5. If P5 rises after R then the delays are too long and
they need to be reduced by speeding up the delay sections by increasing the control voltage CV. On
the other hand if P5 rises prior to R then the delays are too short and they need to be increased by
slowing down the delay sections by decreasing the control voltage CV.
Figure 10.16: The phases of the DLL.
Keep in mind that the DLL output is not the same as the PLL output. The PLL's VCO outputs
a nice clean waveform of frequency n × R, however in the case of the DLL you need to construct
this output waveform using the rising edges of P1 thru P5. Since whatever digital logic you use
to construct the output waveform of frequency n × R has delays of its own, the output waveform
is not a clean square wave as in the case of the PLL's output. For this reason DLLs are often used
when the frequency required is not super high or alternately when you can use the outputs P1
thru P5 directly to control your circuitry. If you do this, remember to consider the effect of the
loading of P1 thru P5 and make sure they are identical to each other, otherwise the phases will be
skewed and will not represent exactly a fifth of a cycle phase difference w.r.t each other.
But in fact the increased transparency of the DLL behavior and the increased reliability make
a lot of designers use the DLL in place of the PLL. Keep in mind that there is no concept of the
difference period here because the rising edges of P1 thru P5 are based on the rising edge of the
reference signal.
10.5 Tracking and re-sync PLLs
Figure 10.17: Phase-locked loop used in radio reception.
Often all you are really looking for is a way to track an incoming signal and maintain a lock on
it. The earliest use of this was in radio FM receivers. In the figure 10.17 the signal from the antenna
is fed into the detector and the multiplier. The other input to the detector comes from the VCO.
The output of the detector passes through a low pass filter and becomes the control voltage for the
VCO. The reason you don't have a divider is that you are trying to extract the carrier frequency.
So the output of the VCO is both phase and frequency synchronized with the carrier. So if it is
phase shifted by 90° and multiplied with the antenna signal and passed through a low pass filter
you will get the demodulated signal. The use of PLLs in radio receivers was really the application
that drove the development of PLLs for a long time.
Chapter 11
Digital Signal Processors
Digital Signal Processors are similar to microprocessors except they perform a more specific
purpose. They are used to process digital signals. If you take a time domain waveform and
digitize it at 1 GHz using a 16-bit analog to digital converter, you will get a sequence of samples
spaced 1 ns apart with each sample containing 16 bits.
Since a DSP is embedded in the device and performs only one function, it is expected to do it
in a known time, usually real time. Because it is dedicated to the function it performs, you would
select different DSPs for different purposes. In addition DSPs have specialized circuits to perform
tasks that affect performance in hardware instead of software.
11.1 Fourier transform
Signal processing is understood in the frequency domain and the Fourier transform is the way
to convert a time domain waveform into a set of frequencies, amplitudes and phases. If you take
the Fourier series representation of a series of rectangular pulses you will get equation 11.1. If
you use the first 1, 3, 9 and 30 terms the pulse will look as shown in the figure 11.2.
(1/π) (sin 1 cos x + (sin 2 cos 2x)/2 + (sin 3 cos 3x)/3 + . . .)    (11.1)
The Fourier transform and the inverse Fourier transform are defined by the equations 11.3
and 11.4. A good book on the Fourier transform is [101]. The Fourier transform is different from
the Fourier series in that the output terms are complex.
F(ω) = ∫ f(x) e^(−iωx) dx    (11.3)

f(x) = (1/2π) ∫ F(ω) e^(iωx) dω    (11.4)
It is generally accepted that a time domain waveform that is measured will always have a
Fourier Transform. The time domain waveform is a scalar i.e. it is not complex. The Fourier
Transform is complex, but keep in mind that if you want the inverse Fourier Transform to be a
scalar then although you can attenuate the phasors representing the Fourier Transform terms, you
cannot change their angle because otherwise the inverse Fourier Transform will not be a scalar.
The reason the inverse Fourier Transform is scalar is that

e^(iωx) e^(−iωx) = 1    (11.5)
In order to use either time domain or frequency domain information you have to discretize
it. Basically you have to cut continuous functions into slices and treat each slice as a unit. So the
Discrete Fourier Transform or DFT is used and it is defined as:

F(k) = (1/N) Σ_{n=0}^{N−1} f(n) e^(−i 2π k n / N)    (11.6)

f(n) = Σ_{k=0}^{N−1} F(k) e^(i 2π k n / N)    (11.7)
If you start with N time slices then you will get complex amplitudes at N frequencies. The
lowest frequency is the inverse of the total time for which you sampled. The zeroth frequency of
course is the constant term. The method used to compute the DFT is the Fast Fourier Transform
or FFT [102]. This method requires a computation time of 2 N log_2 N and gives substantial
computation time reduction for large N. This method also extends to the two and three dimensional
cases.
The method used in this algorithm is based on [103]. So the DFT can be split into two DFTs of
half the size made up of the even and odd sequences as in the equation 11.8.

Σ_{j=0}^{N−1} e^(2πijk/N) f_j = Σ_{j=0}^{N/2−1} e^(2πi(2j)k/N) f_{2j} + Σ_{j=0}^{N/2−1} e^(2πi(2j+1)k/N) f_{2j+1}    (11.8)

= Σ_{j=0}^{N/2−1} e^(2πijk/(N/2)) f_{2j} + e^(2πik/N) Σ_{j=0}^{N/2−1} e^(2πijk/(N/2)) f_{2j+1}    (11.9)
The FFT speeds up the DFT by recursively dividing it in half as shown in the figure 11.1.
Then you perform the DFTs of each subsection and then multiply the odds by a constant before
adding them to the evens. In each case the definition of odd is based on the previous level. For
example 3 is an odd when you divide the sequence the first time, but it is an even when you divide
it the second time, then it becomes an odd for the third division and so on. Because of this division of the
sequence, the time domain sequence must have a length of a power of 2 i.e. 2, 4, 8, 16 etc. In cases
where this condition is not met, the sequence is increased in length by adding trailing zeros until
the condition is met. However this does have some side effects.
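A minimal recursive sketch of this even and odd split is shown below. It is the textbook radix-2 form rather than anything from this book, and it assumes the sequence length is a power of 2.

import cmath

def fft(f):
    """Radix-2 FFT of a sequence whose length N is a power of 2."""
    n = len(f)
    if n == 1:
        return list(f)
    evens = fft(f[0::2])                    # DFT of the even-indexed samples
    odds = fft(f[1::2])                     # DFT of the odd-indexed samples
    out = [0] * n
    for k in range(n // 2):
        twiddle = cmath.exp(2j * cmath.pi * k / n) * odds[k]   # the constant multiplying the odds
        out[k] = evens[k] + twiddle
        out[k + n // 2] = evens[k] - twiddle
    return out

print([round(abs(x), 3) for x in fft([1, 1, 1, 1, 0, 0, 0, 0])])   # magnitudes of an 8-point FFT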
Figure 11.1: Successive reduction of the sequence.
The Nyquist theorem states that if you have a time-domain sequence with samples taken T
time units apart then the maximum frequency you can represent using this stream is given by:

f_max = f_N = 1 / (2T)    (11.10)

So this means that if you wish to transmit a signal representing 10 kHz then you need to sample
at 20k Samples/second.
Figure 11.2: Aliasing due to low sampling rate.
Aliasing occurs when the sampling rate is too low. In other words the physical waveform you
are sampling contains frequency components higher than f_N. But these frequencies can give the
illusion of being a lower frequency as shown in the figure 11.2.
As you can see in this figure the actual frequency is higher than the Nyquist criterion allows.
But when you sample it at a low frequency you still get the up and down variation of the signal
at a much lower frequency. In this case the alias appears at about a third of the original frequency
and it has quite a significant amplitude, in fact it has the full amplitude of the original frequency.
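A quick numerical sketch of this effect, with made-up numbers and using numpy: a 7 kHz tone sampled at only 10 kSamples/second shows up at an alias frequency of about 3 kHz.

import numpy as np

fs = 10_000                                 # sampling rate in Samples/s, below Nyquist for 7 kHz
f_tone = 7_000                              # actual tone frequency in Hz
n = 1024
t = np.arange(n) / fs
x = np.sin(2 * np.pi * f_tone * t)

spectrum = np.abs(np.fft.rfft(x))
f_axis = np.fft.rfftfreq(n, d=1 / fs)
print(f_axis[np.argmax(spectrum)])          # prints roughly 3000 Hz: the alias, not 7000 Hz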
Figure 11.3: The use of an anti-aliasing filter.
This is a problem because when you take the FFT of this waveform after sampling it too slowly,
you will believe that the alias is real i.e. you won't know that it is an alias. For this reason some
systems use an anti-aliasing low pass filter with a cut-off frequency of f_N as shown in the figure
11.3. This removes the problem and in this situation you will not see the alias anymore.
Figure 11.4: Frequency response of anti-aliasing filters.
Over-sampling is a technique used to remove problems such as aliasing among other things.
It is a better way to fix the problem of aliasing than the anti-aliasing filter. The problem with
the anti-aliasing filter is shown in the figure 11.4. If you use a simple filter you get the smooth
curve (the lower one). The problem with this curve is that there is significant loss of amplitude
(it is supposed to be 1) near the Nyquist frequency f_N. If you increase the order of the filter the
amplitude improves but you get ripples as shown in the upper curve. So whether you use a low
order or a high order filter, the amplitude of the signal is wrong near the cutoff frequency f_N.
Over-sampling is a solution to this problem. Instead of setting the cutoff frequency of the low
pass filter at f_N you set it at 2 f_N or higher, perhaps even 8 f_N. By doing this the amplitude is
even all the way up to and beyond f_N. Now you sample this filtered signal at 2x or 8x the original
sampling rate. So these would be 2x and 8x over-sampling respectively. Then you use digital
techniques to throw away the frequencies higher than f_N. Because your sampling rate is several
times higher than it should be there is no aliasing below f_N and although the amplitudes above
f_N are wrong you don't care because you will throw them away anyway.
Window functions are used to remove the effect of the edges of a time domain sequence. Since
the Fourier transform assumes that the time sequence repeats, you will get the best results if you
sample an integral number of cycles of the time waveform. Since this is easier said than done it
is not normally the case. So window functions are used to mitigate the effects of the data points
near the beginning and the end of the time sequence.
The most popular window functions are the Hamming, the Hanning and the Kaiser. You can
also use a Gaussian. The general shape of the window function is as shown in the figure 11.5. It
Figure 11.5: The basic shape of a window function.
will be multiplied by the time sequence before the FFT is performed. Even this windowing will
cause distortion but you make a choice between two evils. As a general rule you are better off
with a time sequence of several cycles instead of just one.
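As a small illustration, with made-up numbers and numpy's built-in Hanning window, multiplying the time sequence by a window before the FFT sharply reduces the leakage caused by sampling a non-integral number of cycles.

import numpy as np

n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * 10.37 * t / n)       # deliberately not an integral number of cycles

raw = np.abs(np.fft.rfft(x))
windowed = np.abs(np.fft.rfft(x * np.hanning(n)))

# Compare the energy that leaks far away from the tone (bins 30 and above):
print(round(raw[30:].sum(), 3), round(windowed[30:].sum(), 6))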
11.2 Compression
Compression is used anywhere that digital information needs to be transferred from one
location to another and the speed of the link is the limiting factor. A cellphone is one place where
compression plays an important role as shown in the figure 11.6.
Figure 11.6: A simplified cell phone.
The antenna signal comes in from the antenna at the carrier frequency and the receiver extracts
the baseband signal from it. The analog to digital converter converts the baseband to a digital
signal. The DSP processes that and sends it to the loudspeaker. It also takes the digitized signal from
the microphone, processes it, converts it back to analog using the digital to analog converter and
transmits it after upshifting the baseband to the carrier frequency.
In the figure 11.7 you see what the DSP does in the transmit path. The bit stream coming in
from the left is coming into the DSP twice as fast as it is going out of the DSP to the right. So
the compression is 2. So in any time period, say 1 second, if 8k bits enter from the left only 4k
bits are going out on the right. Now unlike a picture image this cannot be lossy i.e. when the
bit stream is decompressed after receiving it, you should have exactly the identical stream to that
before compression. If it was just voice it might not matter, but of course nowadays cellphones are
used to connect computers to the internet so data has to be transmitted accurately.
Figure 11.7: The DSP compresses the bit stream.
Figure 11.8: Huffman compression of data.
Huffman coding [104] is a very popular compression scheme. The way this is accomplished is
shown in the figure 11.8. In this figure the data bit stream is entering on the left and is divided into
chunks of 8 bits. So if you look at each 8 bit chunk, it can only have one of 256 values. Now if the
data has a pattern to it, for example if it is a text file, then some of the 256 values will occur more
often than others. The alphanumeric text is 62 characters which means that most of the remaining
256 − 62 = 194 characters are rarely used.
In addition the letter "a" will occur very often whereas the letter "q" should occur much less
often. So in Huffman coding you assign a 3-bit number for the letter "a" and you assign a 6-bit
number for the letter "q". Since you only send 3 bits when you send the letter "a" and 6 bits when
you send "q" and you may send a 10-bit number to represent "@", the compression you get is very
significant. So Huffman compression works very well on text files.
It also works quite well on voice signals because voice signals are not random. The human ear
can hear sounds in the range from 20 Hz to 20 kHz. But the human voice box does not use the
entire spectrum uniformly during speech. For example a woman's voice uses a higher frequency
range than does a man's voice. Also different languages use phonetic sounds differently. So for
this reason some bit sequences repeat more often than others. So Huffman codes work quite well
in a cellphone.
Since you use different bit lengths to represent different 8-bit words, how does the decompression
process work? How do you know when you have a valid sequence of bits to convert back
into the 8-bit word? The answer is to make a tree. A binary tree is a way to do sorting and in this
case to look up the meaning of a code as shown in the figure 11.9. In this figure you start at the
top. If the first bit is a zero you go left, if it is a 1 you go right. The diamonds are decision blocks
representing this choice. The parallelograms are actual codes so they are dead ends. When you
reach the parallelogram it means you have deciphered a letter and you can go back to the top
Figure 11.9: A Huffman decoding scheme.
and start again. This particular case is not really a sorting operation but in general the speed of a
binary tree is very high and is of the order log_2 N where N is the number of objects to search.
11.3 Digital Filters
If you wish to filter a time domain signal with a filter you could take the FFT of the signal,
multiply it by the filter and then take an inverse FFT. But doing FFTs in a DSP is computationally
expensive. Convolution is a way to do this in the time domain. Convolution in the time domain
is multiplication in the frequency domain. Convolution is done by the equation 11.11.

C(t) = ∫_{−∞}^{+∞} A(u) B(t − u) du    (11.11)

So this equation gives you the value of the convolved output C(t) at each value of t. Now let
us limit B(t) in order to get rid of the infinite limits. So if B(t) is non-zero between say −5 and +5
and is zero everywhere else then you only need to integrate between −5 and +5, so the equation
becomes 11.12 and if you discretize it it becomes equation 11.13.

C(t) = ∫_{t−5}^{t+5} A(u) B(t − u) du    (11.12)

C_n = Σ_{k=−5}^{+5} A_{n−k} B_k    (11.13)
Filtering is the most time honored task normally performed by DSPs because it used to be done
on input signals well before DSPs became possible. Signals used to be filtered using specialized
electronic circuits. Names like Butterworth or Chebychev filters are well known to analog, radio
frequency or microwave circuit designers. These filters were designed by order of the filter, meaning
that a higher-order filter had higher order polynomial terms allowing a more precise filtering
that allowed both better blocking of unwanted frequencies and better pass through of wanted
frequencies.
The figure 11.10 shows a Finite Impulse Response filter implementation. There are many
references [105], [106] to obtain the coefficients b_n. They are based on the convolution integral of
equation 11.11. You start by defining the frequency response H(ω) of the filter in the frequency
domain. Then you take its inverse FFT. Now you will get many terms and you select the set you
want to use. And this set is the set of numbers b_n. The more terms you use the better the filtering.
Figure 11.10: An FIR filter.
Because the summation of products as in equation 11.11 is a very common usage of the DSP, it
normally has both accumulators and a Multiply and Accumulate unit (MAC), in addition to more than
one ALU. An accumulator just adds whatever you give it to a running sum, whereas a MAC multiplies
two numbers and gives the result to an accumulator.
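A minimal sketch of the FIR structure of figure 11.10 follows; the inner loop is exactly one multiply-and-accumulate per tap. The coefficients are made-up placeholders, not a designed filter.

def fir_filter(x, b):
    """y[n] = sum over k of b[k] * x[n-k]  (direct-form FIR, zero-padded at the start)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):          # one MAC operation per tap
            if n - k >= 0:
                acc += bk * x[n - k]
        y.append(acc)
    return y

b = [0.1, 0.2, 0.4, 0.2, 0.1]               # placeholder low-pass-like taps
x = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]          # a short input pulse
print(fir_filter(x, b))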
11.4 Pattern recognition
Figure 11.11: Digitized intensity pattern.
The cross-correlation between two processes f(t) and g(t) is given by

CC_fg(t_1, t_2) = E[ f(t_1) g*(t_2) ]    (11.14)

where g* is the complex conjugate of g.
[107] suggested an easy way of comparing images of equal size and number of samples. First
take the FFT of both images. Take the complex conjugate of one of the FFTs and multiply it by
the other FFT. Then take the inverse FFT of the product. This is the cross-correlation of the two
images. If you find a peak in this cross-correlation, that peak gives you the displacement at which the
correlation is highest.
For example suppose you wish to recognize the letter B as shown in the figure 11.11, you
would first place the grid on the left over the letter and average the black part over the area of the
squares. Since you are going to use the FFT, the dimensions need to be a power of 2. Then you
would use the technique above to detect a "B" pattern in another image of the same size. Because
intensities vary you would have to normalize them first.
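A small numpy sketch of this procedure, using a one-dimensional stand-in for the two-dimensional image case; the signals are made up.

import numpy as np

def xcorr_fft(a, b):
    """Circular cross-correlation of two equal-length sequences via the FFT."""
    A = np.fft.fft(a)
    B = np.fft.fft(b)
    return np.real(np.fft.ifft(A * np.conj(B)))   # conjugate one spectrum, multiply, invert

template = np.zeros(64)
template[10:15] = 1.0                       # the pattern we want to find
scene = np.roll(template, 23)               # the same pattern displaced by 23 samples

cc = xcorr_fft(scene, template)
print(int(np.argmax(cc)))                   # prints 23: the displacement of the best match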
11.5 Error correcting codes
Error correcting codes are not compression schemes. Here the number of bytes after coding
increases and the excess bits are called "redundant". The purpose of error correcting codes is to
transmit information reliably even in the presence of noise that corrupts a few bits in the received
data. The most popular ECCs are Reed-Solomon coding and decoding, Convolutional coding and
Viterbi decoding. There are other ways of decoding convolutional codes but the Viterbi decoding
is the most efficient.
11.5.1 Reed-Solomon code
Reed-Solomon codes [108] convert a sequence of m n bits into a sequence of n 2^n bits. So if
n = 3 and m = 3 then a block of 9 bits will become a block of 24 bits, with each m bits having a
corresponding 2^n bits. At least (2^n − m − 1)/2 bits can be corrected so if n = 3 and m = 3 then any
0, 1 or 2 bits within each block can be incorrect.
The theory of Reed-Solomon codes is rather complex so the following is just an example
slightly different from that in the original paper [108]. We start by picking n = 3 and m = 3.
Then we define a polynomial function f(x) = x^3 + x^2 + 1. We use this to generate a recurring
sequence [109] using the difference equation a_n = a_{n−1} + a_{n−3} which is used for n = 3, 4, 5 . . .,
with modulo-2 addition which is 1 + 1 = 0.
Now we choose a_0 = 1, a_1 = 1 and a_2 = 0. The set of a_n = (1, 1, 0, 1, 0, 0, 1) and after this it
repeats so that a_7 = a_0, a_8 = a_1 and so on. Now you define a table as
0 = (0, 0, 0)
α = (a_0, a_1, a_2) = (1, 1, 0)
α^2 = (a_1, a_2, a_3) = (1, 0, 1)
α^3 = (a_2, a_3, a_4) = (0, 1, 0)
α^4 = (a_3, a_4, a_5) = (1, 0, 0)
α^5 = (a_4, a_5, a_6) = (0, 0, 1)
α^6 = (a_5, a_6, a_7) = (0, 1, 1)
α^7 = (1, 1, 1)
α^8 = α^7 α = α    (11.15)
Now you define another polynomial of degree m − 1 as P(x) = b_0 + b_1 x + b_2 x^2. The code is
then the correspondence of the equation 11.16 where (0, 1, 1) + (1, 0, 1) = (1, 1, 0) and so on.

(b_0, b_1, b_2) → (P(0), P(α), P(α^2), P(α^3), P(α^4), P(α^5), P(α^6), P(1))    (11.16)
Now if you wish to send the sequence (101001100), that is (α^2, α^5, α^4), then P(x) = α^2 + α^5 x +
α^4 x^2 and so you will get the sequence (α^2), (α^2 + α^6 + α^6), (α^2 + α^7 + α^8), (α^2 + α^8 + α^10),
(α^2 + α^9 + α^12), (α^2 + α^10 + α^14), (α^2 + α^11 + α^16), (α^2 + α^5 + α^4) i.e. the output sequence is

(101001100) → (101, 101, 100, 001, 001, 000, 100, 000)
In order to decode the message the a_n need to be determined for the equation P(x) = a_0 +
a_1 x + . . . + a_{m−1} x^{m−1}. So any m = 3 equations need to be solved from the 2^n = 8 equations 11.17.
The Ps are the incoming data. The polynomial f(x) is known. By solving different sets of m you
should get the same a_n. The a_n you get most often is the correct set. So if for a set of m you don't get the
same a_n then one of the Ps is incorrect and by using the a_n obtained from another set of m you
can obtain the correct P value.
P(0) = a_0
P(α) = a_0 + a_1 α + a_2 α^2 + . . . + a_{m−1} α^{m−1}
P(α^2) = a_0 + a_1 α^2 + a_2 α^4 + . . . + a_{m−1} α^{2m−2}
. . .
P(1) = a_0 + a_1 + a_2 + . . . + a_{m−1}    (11.17)
11.5.2 Convolutional coding and Viterbi decoding
An example of convolutional coding [110] is shown in the figure 11.12. There are two output
bits O1 and O2 (a and b in the figure) for each input bit X. Because of the delays the last four input bits are used to
compute O1 and O2. The adders shown are modulo-2. The encoder can be defined as the equation
11.18.
Figure 11.12: An example convolutional encoder.
a = x_{n−3} + x_{n−2} + x_n
b = x_{n−3} + x_{n−1}    (11.18)
At the end of the sequence three trailing zeros are added until all the output bits are computed.
Usually the tree for the encoding system is written as shown in the figure 11.13.
Figure 11.13: Tree code diagram.
In the figure 11.13 the data bit is shown below each branch while the corresponding coded
bits are shown above the branch. If the number of delays used is K then there are K + 1 branches.
Each branch will have one data bit and a set of coded bits except the last branch, which has K data
bits and K sets of coded bits to account for the K − 1 trailing zeros. So all possible states of the
encoder are covered.
x_{n−1} x_{n−2} x_{n−3}    Output    x_n x_{n−1} x_{n−2}
000 00 000
001 11 000
010 10 001
011 01 001
100 01 010
101 10 010
110 11 011
111 00 011
000 10 100
001 01 100
010 00 101
011 11 101
100 11 110
101 00 110
110 01 111
111 10 111
Table 11.1: Data for the trellis diagram.
Viterbi decoding [111] is used to decode messages created by convolutional encoding. The
encoder is first depicted by a trellis diagram which looks as shown in the figure 11.14. So for the
table 11.1 there are 8 states on the left and 8 on the right and they are connected by lines with the output
bits in the center. Now if you received the code 01 then it could only be one of the four transitions
011 → 001, 100 → 010, 001 → 100 or 110 → 111.
Figure 11.14: A trellis diagram.
So in Viterbi decoding the previous states x_{n−1} x_{n−2} x_{n−3} which were already decoded are used
to decide what the next bit should be. In the presence of noise the coded bits that you receive may
be wrong. So suppose that 01 was sent but 00 was received. Then it has to be one of the four
transitions 000 → 000, 111 → 011, 010 → 101 or 101 → 110. But suppose the bits already decoded
show that x_{n−1} x_{n−2} x_{n−3} is 011. Since this does not fit any of the known transitions, the next
incoming code is checked as well and the most probable value that should have been received is
used.
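A minimal sketch of the encoder of equation 11.18, with my own indexing of the shift register; three trailing zeros flush the register at the end.

def conv_encode(bits):
    """Return the list of (a, b) output pairs for the encoder of figure 11.12."""
    s = [0, 0, 0]                           # shift register: x_{n-1}, x_{n-2}, x_{n-3}
    out = []
    for x in list(bits) + [0, 0, 0]:        # three trailing zeros flush the register
        a = (s[2] + s[1] + x) % 2           # a = x_{n-3} + x_{n-2} + x_n (modulo 2)
        b = (s[2] + s[0]) % 2               # b = x_{n-3} + x_{n-1}
        out.append((a, b))
        s = [x, s[0], s[1]]                 # shift the register
    return out

print(conv_encode([1, 0, 1, 1]))
# [(1, 0), (0, 1), (0, 0), (0, 0), (1, 1), (0, 1), (1, 1)]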
11.6 Motor control
Figure 11.15: Motor control with a DSP.
There are many reasons why motors need to be controlled. During startup a motor may need
more torque to begin the rotation. If there are sudden load changes the current may have to be
increased or decreased to compensate. Frictional forces may vary with temperature. So DSPs are
often used to control motors as shown in the figure 11.15.
Figure 11.16: DC motor control.
Both DC and AC motors can be controlled by a DSP. With a DC motor only the current needs
to be controlled. This is done using a bridge circuit as shown in the figure 11.16. For the case
when the motor needs to be driven in the forward direction, S1 and S4 are high while S2 and S3
are low.
Figure 11.17: Three phase motor control.
But if you wish to control the speed then you need to apply a pulse width modulated signal to
S4. When S4 is off, current will still need to flow because of the inductance of the motor coils and
for this case the diode across S3 turns on. So the diodes across the transistors are called "flywheel"
diodes because the motor spins due to its induction much as a flywheel does due to its moment
of inertia.
With an AC motor the phase has to be controlled as well, as shown in the figure
11.17. The pulse width modulation signals are used to drive the bases of the control transistors.
In this case the phase relationship between the PWM signals makes a lot of difference because
the three phases of the motor have to be 120° apart and so the induction needs to be balanced
so that the pull of the motor is smooth. When the current is not balanced it is somewhat like an
unbalanced load and the motor will start to vibrate.
So the DSP is used in a standard feedback circuit to control the motor as in the figure 11.15.
As the motor coils rotate in the magnetic field produced by the fixed poles each has its own back
emf and so in cost sensitive applications there is no position transducer and instead the voltage
and current patterns through the coils are analyzed to determine how fast the motor is rotating
and what its position is.
The speed is usually what is controlled and the degree of control can make a lot of difference.
For example imagine a DVD player where the laser beam reflections are being read by a detector
digital circuit. The appearance of the ovals under the laser beam has to synchronize with the clock
of the digital circuit. The more precise the speed control, the smaller the ovals could be.
So in fact the actual configuration of the DSP loop depends a lot on the specific application.
But the general idea is always the same, which is that the PWM signal output from the DSP controls
the motor and a feedback of some kind is fed into the DSP and processed to generate the PWM
signals, so just as in a control system you can write transfer functions and analyze the stability
and perform the suitable design.
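As a rough sketch of such a loop, here is a made-up integral speed controller producing a PWM duty cycle, driving a toy first-order motor model. None of the gains, numbers or names come from the book.

# Toy DSP speed-control loop: integrate the speed error into a PWM duty cycle.
def speed_loop(target=3000.0, steps=400, ki=0.00005):
    rpm, duty = 0.0, 0.0
    for _ in range(steps):
        error = target - rpm                # feedback from the (modelled) speed measurement
        duty = min(1.0, max(0.0, duty + ki * error))   # PWM duty cycle held between 0 and 1
        rpm += 0.1 * (5000.0 * duty - rpm)  # crude first-order motor response
    return rpm, duty

rpm, duty = speed_loop()
print(round(rpm, 1), round(duty, 3))        # the speed settles near the 3000 rpm target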
Chapter 12
I/O circuits and pcb interactions
Printed Circuit Boards or PCBs arrived on the technological scene well before micro-chips did.
The original radio receivers were built using vacuum tube triodes and lots of discrete resistors,
capacitors and inductors. PCBs were invented to make these types of circuits more reliable and
easier to assemble. For the person who is assembling the circuit, it is definitely easier because
all you need to know is where on the PCB to put each component and then you solder all the
connections and then you are done.
Nowadays PCBs contain as many as a dozen or more layers. The reason is that when you
use more layers to route with, you can reduce the area of the board, so even if you pay more for
extra routing layers you are paying less for a smaller area board. In addition, by increasing the
number of routing layers, you can reduce the length of the interconnections and that will improve
the performance and the improvement in performance may be worth the extra cost. So the board
designers trade off the cost of the board against the performance they want and the space available
for the board to fit in.
Most PCBs nowadays use surface mounted components so there is no need to make holes to
place the components. The component pins are flat against the board and are held by solder of
less than a square mm. But you still make holes to create connections between the interconnects
on different layers.
12.1 Design consideration
There are basically only a few characteristics that are important to making and using
high quality boards and they are:
1. Capacitive loading
2. Transit time
3. Line impedance
4. Electro-static discharge or ESD
5. Line drivers
6. Line terminations
7. Impedance variation
136
8. Spread spectrum technology
9. Cross coupling
10. Antenna effect
11. Ground bounce
12. Ringing
12.1.1 Capacitive loading
The capacitive loading of the line is the most important characteristic because it eats up so
much of the IO drive. Board level signaling over long PCB lines is usually done at low frequencies
of under 100 MHz. Even a 100 MHz signal translates to a 10 ns period. A PCB line is usually
about 15 ps per inch, so a 10" trace is probably about 150 ps long, i.e. well under the period. So
a signal driven down a line as a high is often expected to be held for the entire 10 ns cycle. Even
assuming that the input at the end of the line is not sinking significant current, the capacitance of
the entire line needs to be charged. Of course, some outputs may drive many inputs so you have
to add up the capacitance of all these lines. You can actually reduce this if you don't care about the
third item on this list, which is the line impedance, because if you don't have a ground or power
plane close to this line, then your capacitance reduces. Of course in this case you will still have the
sum of the parasitic capacitances to all lines running nearby or crossing underneath, which are at
varying voltages that are time-dependent. For low speed signals you don't care about ground or
power planes or about line impedance, you just use a driver that can drive the load. In figure 12.1
you can see three traces A, B and C. There is a large capacitance between A and B because they
overlap each other all the way, but because C is perpendicular to both A and B, their capacitance
to C is much smaller and as we discussed in the first chapter is proportional to the area of overlap
and inversely proportional to the distance between the conductors. So the parasitic capacitances
have to be factored into your circuit simulation as shown in figure 12.2. In this figure, although it
is not obvious, assume that A is above B and C is below B, so we can draw one parasitic capacitor
between A and B and another between B and C.
Figure 12.1: Overlapping PCB traces.
12.1.2 Transit time
The transit time becomes important for high speed lines running at 100 MHz or faster. In its
simplest form, transit time effects could effectively reduce the period of each cycle and apply stress
Figure 12.2: Parasitic capacitances of the PCB traces.
on some portions of the circuit. Imagine a situation as shown in figure 12.3. There are two chips on
the board, one is sending and the other is receiving. They both use the same clock C. In figure 12.4,
the setup time that is available is the first half of the clock cycle. During this time the sender has
to drive the line capacitance and the load at the input of the receiver and pull the line high. Now,
when you order a part you get a range from fast to slow i.e. you may get a sender that drives a lot
of current, and so it is fast; on the other hand you may get a sender that drives the minimum spec
current and requires almost the entire setup time to pull the line high. In fact even the input of the
receiver will have some variation and you may get a receiver that has a small input capacitance
and a small diode current or you may get a receiver with a large input capacitance and a large
diode current. So both the sender and the receiver decide how long a setup time is needed. Now
in figure 12.4, the signal F is the situation when the driver and receiver are fast. Here the setup
time used is small and even after delaying the signal by 150 ps to get DF, the line has already been
pulled high well before the evaluate begins. On the other hand, for the case where sender and
receiver are slow, the signal S is pulled high just before the evaluate, however the 150 ps delay
added by the line transit time causes the setup time to exceed a half cycle and so the signal DS
is not ready to be evaluated when it should be. So what this means is that the designer needs to
expect the setup time to be reduced by the transit time.
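As a quick back-of-the-envelope check of this effect, using the illustrative numbers from the text (15 ps per inch of trace and a half-cycle setup window):

# Setup margin check: does a slow driver still make it once transit time is added?
def setup_margin_ns(clock_mhz, trace_inches, driver_settle_ns, ps_per_inch=15.0):
    half_period_ns = 0.5 * (1000.0 / clock_mhz)      # the setup window is half a cycle
    transit_ns = trace_inches * ps_per_inch / 1000.0
    return half_period_ns - (driver_settle_ns + transit_ns)

# 100 MHz clock, 10 inch trace, a slow driver that needs 4.9 ns to pull the line high:
print(round(setup_margin_ns(100.0, 10.0, 4.9), 3))   # -0.05 ns: the transit time breaks the timing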
Figure 12.3: Logic circuit with long data line.
12.1.3 Line impedance
If the signals you wish to transmit are high speed and if timing is critical then designers always
choose to use transmission line style routing. What this means is that there is a ground or power
plane above or below your trace. This type of transmission line is called a microstrip. In fact the
PCB traces are not genuine microstrips. The reason is that a microstrip has one side open and
Figure 12.4: The effect of transit time delay.
a ground plane on the other side, but in a real life PCB, although you may have a ground plane
below (or above) the trace, above it on the next level you may have other traces running parallel to
your trace and these other lines will affect the impedance. Also the effect is not that high but you
will get some coupling from the traces on either side of your trace. There are many good books
written about transmission lines so we will just talk about things we are interested in.
A transmission line transmits a signal. In the figure 12.5 the upper circuit shows what a slice
of transmission line looks like. So if you take a slice half that length just halve the inductance,
capacitance and resistance. Normally we ignore the resistance because if it is large enough to consider
then we open ourselves to many other problems, so just assume that the thickness and width
of the copper trace is sufficient to give a very low resistance. So we are left with the inductance
and the capacitance. The inductance is what gives the transmission line directivity. It is not that
inductance is special compared to capacitance but it is just that the capacitor in the equivalent
circuit is in parallel or shunt configuration so there is no difference between left and right, but the
inductance is in series configuration so it does know whether the current is flowing in from the
left or the right. We have already discussed inductors and what we know is that the current cannot
change abruptly and that the current that flows is proportional to the integral of the voltage
applied and is in the direction from positive to negative voltage.
Figure 12.5: The equivalent circuits of a transmission line.
So assume that the left end of the transmission line is pulled high as shown in figure 12.6. The
voltage V1 is initially 0 i.e. it is at ground potential. This is because the voltage across a capacitor
(in this case C1) cannot change instantaneously. The entire supply voltage is applied across L1.
As time passes the current through L1 increases from 0 as the integral of the voltage across it. The
voltage V1 also increases from 0 toward supply voltage as the current through the inductor L1
charges the capacitor C1. As V1 rises above 0, the voltage across L2 increases from 0 and therefore
the current through L2 increases from 0 and therefore the voltage V2 rises from 0 as the current
through L2 charges the capacitor C2. This sounds like V1 gradually increases from 0 to supply but
in fact that is not true. Remember that for the perfect transmission line the equivalent circuit of
figure 12.5 really consists of an infinite number of LC sections that are infinitely small, so in fact
we need to think of L and C as reducing to smaller and smaller numbers, so the charging of C1
is essentially instantaneous and therefore the charging of C2 is essentially instantaneous as well
and so on. So to sum it up, any voltage you apply to one end of the line propagates to the other
end of the line. Now keep in mind another fact, namely that just as you cannot start the current
flow through L1 instantaneously, you cannot stop the current flow through L1 instantaneously
and this property of inductance is what causes the signal to propagate from one end of the line
toward the other, in this case from left to right. This also means that if you turn off the PFET that
was pulling up the left end of the line, the wave that you initially started continues to propagate
toward the right. So if the line in figure 12.6 is 150 ps long and you turn on the FET for 25 ps,
then a 25 ps pulse travels from left to right. In other words the voltage waveform has a start and
a finish. The rate at which the pulse is propagated is proportional to 1/√(LC) and the impedance
offered is proportional to √(L/C). The most common impedance that is used for PCB traces is the
50 Ω standard. However occasionally designers use 75 Ω as their design point as a way to increase
the speed by reducing the current required.
Figure 12.6: Propagation along a transmission line.
What happens when the signal reaches the other end of the line? Well, it has to transfer all
its energy to the load. We have already discussed the maximum power transfer theorem and we
know that the load has to match the impedance of the transmission line in order for the transmission
line to simply transfer its energy to the load. If the load does not match the impedance of
the transmission line, then the excess energy has to be reflected back as another wave. If the load
impedance is higher than that of the transmission line, then the reflected wave has the same polarity
as the incident wave i.e. a positive pulse is reflected as a smaller but still positive pulse and
a negative pulse is reflected as a smaller but still negative pulse, however if the load impedance is
smaller than that of the transmission line, then the reflected wave has the opposite polarity i.e. a
positive pulse is reflected as a smaller negative pulse and a negative pulse is reflected as a smaller
positive pulse.
12.1.4 Electro-static discharge or ESD
Due to chemical reactions that take place in our body and also due to static electricity we pick
up as we move, all our bodies have built up electrical imbalances with our surroundings. When
we touch any object, there is a transfer of charge. The energy is stored in capacitances. When we
touch the pins of a micro-chip, charge flows between us and the circuits connected to those pins.
The time that this is a problem is when the pin is connected to a gate inside the chip, because the gate
oxide is designed for use with very low voltages of about 2.5 V and the transfer of charge could
easily create a voltage far in excess of 2.5 V and at these high voltages the charge just crosses the
gate oxide and redistributes itself inside the chip and in doing this will destroy that gate oxide.
To protect against this happening, almost all pins in micro-chips that we buy are protected
by ESD circuits. The primary circuit is as shown in figure 12.7. It is comprised of two back to
back diodes reverse biased from the inputs to the supply and ground connections as shown in the
dotted box. When the pin voltage drops more than a diode drop lower than ground potential or
rises more than one diode drop above the supply voltage Vdd, the reverse biased diode will turn on
and conduct the incoming charge safely to the supply or ground. Of course for a chip that is not
connected to anything the ground or supply potential will be a floating potential but even so the
protection will work because everything is referenced off that voltage whatever it is.
Figure 12.7: Pin showing ESD protection.
12.1.5 Line drivers
The simplest output circuit or line driver is basically a set of approximately 4 or 6 inverters
cascaded, with an ESD circuit slapped on at the output. The ratio of the width of the drivers
increases at as close to e as possible, where e is 2.71. So:

W4/W3 = W3/W2 = W2/W1 ≈ e
Another characteristic is that the ohmic resistance of the PFET and the NFET of the nal stage,
when fully turned on should be approximately equal to the impedance of the line that the line
driver is driving. This is so that any reected signals see a matching termination when they arrive
back at the line driver.
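As an illustrative sketch of the sizing rule above (the starting width and stage count are assumptions, not values from the text), each stage's width is just the previous one multiplied by e:

    import math

    # Illustrative sketch: size a 4-stage driver chain where each stage is
    # about e times wider than the previous one. W1 (first-stage width, in um)
    # is an assumed starting point, not a value from the text.
    W1 = 1.0
    widths = [W1 * math.e ** i for i in range(4)]  # W1, W2, W3, W4
    for i, w in enumerate(widths, start=1):
        print(f"W{i} = {w:.2f} um")
    # Successive ratios W2/W1, W3/W2, W4/W3 all equal e by construction.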
Figure 12.8: Output driver circuit.
12.1.6 Line terminations
Obviously we don't want any reflected waves. The goal of modifying the input load is simple. We already have some input impedance and we want to add either a series resistor or inductor, or a shunt resistor, to make the input appear to be 50 Ω. The way to decide whether you need a series or shunt resistor is to look at the impedance at the pin looking into the chip. If that impedance is greater than 50 Ω then you need a shunt resistor. If it is less than 50 Ω then you need a series resistor. When calculating the impedance looking into the chip input, simply short the power supplies because they are assumed to have zero internal resistance. In figure 12.9 and figure 12.10 the capacitances C1 and C2 inside the dotted line are the reverse biased diode capacitances, which are actually active rather than passive capacitances, because as either diode moves toward turn-on its capacitance increases with almost a square law dependency. The capacitances C3 and C4 are the gate-to-source capacitances of the PFET and the NFET. Again, C3 and C4 are active capacitances because they depend on the level of inversion. The inductance L1 is the self-inductance of the fine bond wire that is used to connect the pin to the bond pad on the chip itself. In reality, as chips get smaller, mutual inductance between the bond wires also becomes important. To combat this, chips used in certain types of applications use a special tape that is a signal-ground transmission line pair at 50 Ω in place of the bond wire.
Figure 12.9: Series terminated chip input.
In figure 12.9 the series termination resistor is equal to the difference between the transmission line impedance and the input impedance at the chip input, i.e.

Rt = 50 - | jωL1 + 1/( jω(C1 + C2 + C3 + C4) ) |
Figure 12.10: Shunt terminated chip input.
In figure 12.10 the shunt termination resistor is obtained such that the parallel combination of the shunt resistor and the input impedance at the chip input is equal to the impedance of the transmission line. So:

Rt = ( 50 · | jωL1 + 1/( jω(C1 + C2 + C3 + C4) ) | ) / ( | jωL1 + 1/( jω(C1 + C2 + C3 + C4) ) | - 50 )
Impedance, unlike resistance, is a function of frequency, and in the two equations for Rt we substituted ω, which is 2πf, so we need to decide what f to use. Now, the Fourier transform of a square wave contains appreciable energy at the fundamental frequency f, at 3f and at 5f, in that order. Board designers use a quantity called the knee frequency, which is 3f. So if the signal is a 10 MHz signal, the knee frequency is 30 MHz. We just use the 3f frequency to calculate the Rt required and use it in the correct configuration, and what this should do is match the impedance looking into the chip with the impedance of the transmission line, so there should be much less reflection of energy.
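A rough sketch of that calculation is below; the capacitance and inductance values are assumptions chosen only to illustrate the arithmetic, not values from the text.

    import math

    # Rough sketch of the termination calculation at the knee frequency (3f).
    # The component values below are illustrative assumptions.
    f_signal = 10e6                  # 10 MHz signal
    f_knee = 3 * f_signal            # knee frequency = 3f
    w = 2 * math.pi * f_knee

    L1 = 5e-9                        # bond wire self-inductance, assumed 5 nH
    C_total = 4 * 2e-12              # C1+C2+C3+C4, assumed 2 pF each

    z_in = abs(complex(0, w * L1) + 1 / complex(0, w * C_total))
    print(f"|Z_in| at the knee frequency = {z_in:.1f} ohm")

    if z_in < 50:
        print(f"series Rt = {50 - z_in:.1f} ohm")
    else:
        print(f"shunt Rt = {50 * z_in / (z_in - 50):.1f} ohm")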
12.1.7 Impedance variation
One fact that all designers, whether chip designers or PCB designers, have to live with is that every process has built-in variation. So even if you design the perfect PCB, you will always have a significant percentage of boards where there is a difference between the line impedance and the impedance of the driver circuit (including its termination, if any), and between the line impedance and the impedance of the receiver input (including its termination, if any). So what is the effect of this mismatch on the signals that travel along the transmission line connecting these mismatched chip to transmission line pairs? Normally this effect is easy to determine: just apply a high impedance probe to either end and you should see a pattern as shown in figure 12.11. It is normally called a staircase pattern. The probe you use could affect the mismatch, but the thing to look for is the spacing in time between the steps in figure 12.11. This spacing is twice the transit time between the two points of reflectance, so you can tell where the other mismatch is. Note however that even when you determine the cause of the mismatch, it is very bad policy to try to fix it on that particular board; such decisions should only be made for the design as a whole, i.e. to apply a change to all boards of that design.
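Since the step spacing is twice the transit time, the distance to the other mismatch can be estimated from the propagation velocity of the trace. A small sketch is below; the assumed velocity of roughly 15 cm/ns is typical for FR-4 boards, but it and the measured spacing are assumptions, not figures from the text.

    # Sketch: locate the second mismatch from the staircase step spacing.
    # The propagation velocity (~15 cm/ns, typical of FR-4) and the measured
    # step spacing are illustrative assumptions.
    v = 15.0            # cm per ns, assumed trace propagation velocity
    step_spacing = 2.0  # ns between staircase steps, assumed measurement

    transit_time = step_spacing / 2          # one-way transit time
    distance = v * transit_time              # distance to the other mismatch
    print(f"other reflection point is about {distance:.0f} cm away")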
Figure 12.11: Staircase patterns on PCB lines due to impedance mismatches.
12.1.8 Cross coupling
In figure 12.12 we have a chip on the left sending a set of signals to the chip on the right. For example, maybe the chip on the left is a DRAM and it is sending a byte of information. So assume that lines 1 and 2 are transmitting 0, line 3 is transmitting 1, and lines 4 and 5 are transmitting 0. Normally, even if there is a certain amount of cross coupling between the lines, there is no problem. The problem is statistical in nature. Companies that build boards build them in large volumes, and they use parts from different parts bins stocked with chips from different manufacturers. So for every few hundred boards you build, a situation will arise where the receiver chip, i.e. the one on the right in figure 12.12, contains PFETs on the fast corner and NFETs on the slow corner.

What this means is that this receiving chip will struggle to recognize 0s. And of every few hundred boards built, some will also have senders with PFETs on the fast corner and NFETs on the slow corner. Such boards may fail stringent testing but may pass a light standard of testing where only a small number of tests are run. In such boards, when line 3 pulls high, it will cause cross coupling effects that try to pull lines 2 and 4 up a little. Again, we should note that normally this would not be a problem, but in the specific cases we talked about, with weak NFETs and strong PFETs, this cross coupling will fight the sender's pull-down current and will act to confuse the receiver, and will occasionally succeed. This again is really a design issue: instead of using only the chip specs on the sender and receiver chips, if the designer also adds a little margin in the calculation to account for the effect of cross coupling, then the problem goes away. Keep in mind that with profit margins as slim as they are, trying to enforce quality control through testing and binning alone will invariably fail, because no manufacturer can afford really stringent testing.
Figure 12.12: Cross coupling between PCB traces.
12.1.9 Antenna effect
The worst thing you can have in a PCB design is a trace that is connected to something on one end but is not connected to anything on the other end. If this kind of trace is long enough, it can pick up electromagnetic energy just like an antenna. This means that the voltage on the traces connected to this stub will fluctuate, and this fluctuation can cause a high to appear as a low or a low to appear as a high. Usually this kind of trace occurs when a board designer changes his or her mind about where to place some component and forgets to remove the old trace.
12.1.10 Ground bounce
Ground bounce is a phenomenon that has killed many a good design and left the designers and testers alike scratching their heads, wondering what the heck is going on. Usually in situations like this the designers are frantically looking up the specs for the chips they used, trying to see if the current drive was sufficient to drive the length of the transmission line used. But the answer with ground bounce has nothing at all to do with the chip specs and has everything to do with the PCB, as shown in figure 12.13. It looks normal; you think the supply and ground lines are a little long, but the trace resistances are low, so no big deal. The picture becomes clearer when you look at figure 12.14, which is what the circuit sees when the output of the chip turns on and is trying to drive a large current.
Figure 12.13: Circuit that misbehaves due to ground bounce.
Figure 12.14: The effective circuit seen by the chip.
The long ground line is actually an inductor due to its self-inductance, but normally you won't see this inductance Lg, because the chip is drawing a steady current, and as long as the current stays approximately the same there is no voltage drop across Lg. But the instant the output driver is turned on, the current required by the chip from the power supply increases quite a bit, and the current through Lg cannot change instantaneously, so a voltage develops across Lg. The actual voltage V1 seen by the chip is therefore less than the supply voltage by the amount of voltage dropped across Lg. Since the effective supply voltage V1 is lower than normal, the output drivers cannot drive a sufficient current. This phenomenon of effective reduction of the supply voltage V1 is called ground bounce, because the ground appears to bounce whenever the output is turned on. The solution to ground bounce is quite simple and is shown in figure 12.15: all that needs to be done is to buffer the supply with a small capacitor. Remember that the voltage across a capacitor cannot change instantaneously, so when the output drivers turn on, the surge in supply current required by the chip is supplied by the buffer capacitor, and once the drivers turn off this energy is replenished from the supply line. So effectively the buffer capacitor isolates the chip from the ground inductance and the circuit works fine.
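A back-of-the-envelope sketch of the effect is V_drop ≈ Lg · dI/dt; the inductance, current step, and edge time below are assumed values used only to show the scale of the problem and a matching buffer capacitor size.

    # Back-of-the-envelope sketch of ground bounce: V_drop ~ Lg * dI/dt.
    # All numbers are illustrative assumptions.
    Lg = 20e-9        # self-inductance of a long ground trace, assumed 20 nH
    dI = 50e-3        # extra current drawn when the output driver turns on, 50 mA
    dt = 1e-9         # current edge time, 1 ns

    v_drop = Lg * dI / dt
    print(f"ground bounce ~ {v_drop:.2f} V")   # ~1 V lost from the effective supply

    # Sizing a buffer capacitor so the supply droops less than ~0.1 V
    # while it alone supplies the surge: C >= I * dt / dV.
    C_min = dI * dt / 0.1
    print(f"buffer capacitor >= {C_min * 1e9:.1f} nF")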
Figure 12.15: The solution to ground bounce.
12.1.11 Ringing
Ringing is not really a technical term, rather a colloquial term describing an effect that is very commonly encountered when transferring a signal from a source to a destination via a transmission line. It can cause a perfectly normal signal to be read incorrectly. In figure 12.16 you see a driving circuit, i.e. it is sending the signal, and the receiver circuit, which has a load capacitance. If the resistance of the transmission line is not sufficient to damp the driving current, it will cause overshoots and undershoots at the load capacitance at the input of the receiver, as shown in figure 12.17. This type of circuit is called under-damped.
Figure 12.16: Driving point inductance and load capacitance.
What happens is that as the current charges the capacitor, it is also storing energy in the inductor, and even when the capacitor is fully charged the inductance forces the current to continue flowing as the energy in the inductor dissipates, and this extra current flow causes the voltage at the capacitor to go beyond what it otherwise would.
Figure 12.17: Overshoot (left) and Undershoot (right).
The problem, as seen in figure 12.17, is that as energy is swapped back and forth between the inductor and the capacitor, the voltage at Vout crosses the Hi and the Lo signal levels more than once, and so it confuses the receiver; whether the correct logic level of 1 or 0 is read really depends on when the signal at Vout is sampled by the receiving circuit. These ripples are called ringing. Proper termination will remove this problem.
12.2 Spread spectrum technology
Believe it or not, PCBs generate a certain amount of radio energy, which is regulated by the FCC. The reason is that clock speeds, and therefore signal speeds, are getting so high that the overtones approach the RF frequencies used. For the most part these high frequencies are generated by the dV/dt and the dI/dt at the signal edges. As shown in figure 12.18, if the sudden surge of current at the onset of the signal going high or going low is reduced, the amount of high frequency energy generated is reduced. But SST is more than just that. In SST the shape of the signal edge is varied even between different pins of the same chip, especially if there are many signals going in and out. The reason is that even if you slow down these current surges, the high frequency energy generated by the signals of different pins may all have its energy at the same frequencies, and so it starts to add up; but if you vary the rate of current change to be different for different signals from the chip, then their high frequency energies don't add up and are spread over the high frequency spectrum, and hence the name spread spectrum technology. So the name of the game in SST is to vary the output driver currents so that the rate of current change is more varied across the outputs of each chip.
Figure 12.18: Reducing the rate of voltage and current change.
12.3 Input/Output or IO circuits
This chapter would not be complete without a discussion of an IO circuit. Figure 12.19 shows a simple Input/Output or IO circuit. It is called an IO circuit because it can either read in from the pin or output to the pin. In fact it is usually a tri-state device, because it can be set to a third state wherein it neither reads in nor writes out but instead presents a high impedance looking into the chip. Table 12.1 shows the values of C1 and C2 required to control the state of the IO circuit.
C1 C1b C2 C2b State
0 1 1 0 Output
1 0 0 1 Input
1 0 1 0 High impedance
Table 12.1: States of the IO circuit.
Figure 12.19: A simple IO circuit.
In figure 12.19, the boxes marked PG1, PG1A, PG2 and PG2A are called pass-gates and are discussed below. For the IO circuit itself, when the signal pair C1 and C1b turn on PG1 and PG1A, and at the same time the signal pair C2 and C2b turn off PG2 and PG2A, the effect is to allow the signal inside the chip to propagate out to the pin through the output driver circuit. But when the signal pair C1 and C1b turn off PG1 and PG1A, and at the same time the signal pair C2 and C2b turn on PG2 and PG2A, the effect is to cause the signal at the pin to propagate in through the input buffer and proceed into the chip, i.e. the signal is an input. If you turn off PG1, PG1A, PG2 and PG2A all at the same time, then the pin is isolated from the signal inside the chip, signals cannot propagate either in or out, and the isolation is high impedance. Table 12.2 shows the purpose of the four pass-gates used in figure 12.19.
Mode    pass-gate   Bad effects if permanently on or removed
Input   PG1         The driver inverter chain will amplify the signal on S1 and so S4 will move up and down, causing noise on S6. A lot of power is wasted.
Input   PG1A        Since the output driver chain is so strong, the signal it is driving at S4 will completely overcome S6, so inputs will never make it into the chip.
Output  PG2         If this pass-gate is removed it will create some noise because the buffer will follow S6.
Output  PG2A        Similar to removing PG1A, removing PG2A will cause the signal to become clamped. Here signal S1 gets clamped to the value of S3, i.e. the signal driven by the input buffer.
Table 12.2: The purpose of each pass-gate.
Chapter 13
Automatic Test Equipment
Only a small fraction of all chips are tested. Typically a lot contains 25 wafers and each wafer contains 10 - 20 chips, so testing one chip per lot would be normal. But a high end chip such as a microprocessor may have 400 pins, so the number of possible input combinations is incredibly large. So again it is not possible to test all of a chip's functionality.
Figure 13.1: An ATE setup.
A typical ATE setup is shown in figure 13.1. The DUT is mounted on the DUT board, which is mounted over the pin driver boards, and this whole unit is called the test head. The test head is connected to a cabinet full of tester boards by 50 Ω or 75 Ω cables. There is a lot of data being transmitted from the test head to the tester boards and vice versa.

Suppose there are 100 pin driver boards in a test head. Assume each board is collecting information for four pins of the DUT, with each cycle divided into 5 slices. Assume that the DUT is being tested at 4 GHz. This means the test head is collecting 100 × 4 × 5 × 4 × 10^9 = 8 × 10^12 pieces of information per second.
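The same arithmetic in a couple of lines, purely to confirm the count:

    # Quick check of the data-rate arithmetic from the text.
    boards, pins_per_board, slices_per_cycle, cycles_per_second = 100, 4, 5, 4e9
    rate = boards * pins_per_board * slices_per_cycle * cycles_per_second
    print(f"{rate:.0e} pieces of information per second")   # 8e+12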
Because such a large amount of information needs to be processed, you cannot connect the test head directly to the computer; you need a cabinet full of tester boards to interact with the test head. As a general rule, the chips used tend to get cheaper as you move away from the DUT toward the computer.
13.1 DUT board
The Device Under Test is mounted on a board that is usually circular, as shown in figure 13.2. The DUT is placed in a chip socket in the center of the board and the connections to the pins radiate outward to the edge of the board. The socket used is usually specially created for each chip and is designed for good contact without the need for solder. Often there may be a chip cover which exerts force on the chip, pressing it against the connections in the socket.
Figure 13.2: The DUT board of a tester.
The DUT board needs to be connected to as many as a hundred pin driver boards. These pin driver boards may need to be changed occasionally, so at the edge of the DUT board are coaxial connectors, as shown in figure 13.3. When the DUT board is placed over the 400 connectors from the pin driver boards and pressed down, the two sets of connectors mate and a connection is made.
Figure 13.3: The pin driver boards connecting to the DUT board.
From the connector on the pin driver board to the inside of the chip, the equivalent circuit may appear as shown in figure 13.4. Ideally you want the PCB trace on the DUT board to be as short as possible.
Figure 13.4: Trace from pin driver board.
13.2 Main computer
The tester boards, the pin driver boards and the DUT board perform specific logic and do not use microprocessors. So a computer is the starting point for converting the test requirements into the instructions sent to the testing boards. The computer does this part; it does not actually participate in the testing, but it directs it.

From the point of view of the tester, the behavior of the DUT's pin is usually looked at from an event driven stand-point. The reason is that when a pin malfunctions its behavior is really complex, so the tests that are run usually pinpoint specific questions to be answered. So the testing is split up on two fronts: one is time, i.e. when a test is run, and the other is IO behavior, i.e. what test is run.
Figure 13.5: How a cycle is divided.
In the example in figure 13.5 the output pin is shown going high for some period of time and then going back to low. The cycle is divided into 5 time-slices as shown in figure 13.5, with the cycle starting at T1 and ending at the next T1. The interval from T1 to T2 is the time during which you expect that the output will not change, i.e. T2 is the earliest that the chip spec says the output can change. Between T2 and T3 you expect the output to go high. Between T3 and T4 you expect the output to stay high. Between T4 and T5 you expect the output to pull low, and from T5 to the next T1 you again expect no change to the output. This is shown in table 13.1.
time-slice   start-time   end-time   action at DUT pin
1            T1           T2         high impedance
2            T2           T3         0 → 1 transition
3            T3           T4         hold at 1
4            T4           T5         1 → 0 transition
5            T5           T1         high impedance
Table 13.1: Setting up a test.
13.3 Tester boards
The main function of the tester boards is to isolate the controlling computer, which can only handle a small amount of data, from the test head, which is generating large amounts of data.

The computer used is usually a fast 64-bit workstation. In this computer there may be placed one or more cards with a dedicated mode of communication with the cabinet full of tester boards. So the side of the tester board that interacts with the computer may have chips that handle this communication. Between this side of the tester board and the other side, which communicates with the pin driver boards, all the logical processing needs to be done.

The type of information that needs to be sent to the pin driver boards must include the timing information for each edge, the voltage that needs to be applied at each timing edge, whether the pin is expected to be an input or an output, the drive current if it is an input, the load impedance if it is an output, the result i.e. whether the pin passed or failed, and a failure code if applicable.
13.4 Pin driver boards
Figure 13.6: Tester circuit leading to DUT pin.
The logic chips shown in figure 13.6 handle the interaction between the test head and the tester boards, and they output the binary bits that are used by the timing chips. They also output the control bits for the pin driver chip, i.e. its operating mode and everything except the timing edges.
13.4.1 Timing generation chip
The timing chip is the one in the center of figure 13.6. The timing generation is divided into two cascaded sections, the coarse delay and the fine delay. Suppose you wish the tester to be able to test circuits at frequencies of 1 GHz and above. Then each of the timing edges may be anywhere from 0 to 1 ns from the start of the cycle.

Suppose you need the timing accuracy to be 1 ps. This will require a 10 bit timing instruction. The coarse delay will then be designed to output edges at 0 ps, 128 ps, 256 ps, 384 ps, 512 ps, 640 ps, 768 ps and 896 ps. The fine delay will be capable of outputting delays of 0 - 127 ps in units of 1 ps.
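A small sketch of how a 10-bit timing instruction splits into the coarse and fine parts described above (the bit layout shown is the one implied by the text: top 3 bits select the 128 ps coarse edge, lower 7 bits give the 1 ps fine delay):

    # Sketch: decompose a 10-bit timing instruction into coarse and fine delays,
    # following the split described in the text (top 3 bits coarse, lower 7 fine).
    def decode_delay(code):
        assert 0 <= code < 1024, "10-bit instruction"
        coarse = (code >> 7) * 128    # 0, 128, 256, ... 896 ps
        fine = code & 0x7F            # 0 - 127 ps in 1 ps steps
        return coarse, fine, coarse + fine

    for code in (0, 129, 1023):
        coarse, fine, total = decode_delay(code)
        print(f"code {code:4d}: coarse {coarse:3d} ps + fine {fine:3d} ps = {total:4d} ps")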
Figure 13.7: The fine delay.
The coarse delay is generated by a Delay Locked Loop as shown in figure 10.15. Essentially it divides a 976.56 MHz clock cycle into 8 parts. The selection of which edge to use is made by the top three bits, i.e. bits 8, 9 and 10.
Figure 13.8: The actual ramp and the cross-over point.
The fine part of the delay, on the other hand, is generated differently. It is purely analog in nature. It is usually based on what is known as a ramp circuit. Ramp circuits are derived from a larger family of circuits known as multi-vibrators. Multi-vibrators date back to when digital circuits were just being born. They are divided into bi-stable and astable families. The first of the two is stable in either of two states, while the latter is stable in neither state and constantly switches states, giving rise to a square wave output. It is the bi-stable family that we will look at, as shown in figure 13.7.
So the way the fine delay works is that you first apply the reset signal R. This drives the output low and the circuit is ready to be triggered. Meanwhile you apply the lower 7 bits of the delay instruction to the input of the DAC, so the output of the DAC is the reference voltage at which the delay is set. Now the output signal from the coarse circuit is connected to the trigger input T of the fine circuit, so when the coarse signal arrives it triggers the fine delay and the output of the ramp starts to rise, as shown in figure 13.8. When the ramp voltage becomes higher than the voltage from the output of the DAC, the operational amplifier which is comparing the two voltages changes state and its output goes from low to high.

So as you can see, by working together the coarse circuit and the fine circuit are able to convert a 10-bit instruction into a delay of 0 to 1024 ps, and they do this perhaps 4 billion times per second.
Figure 13.9: Integral and differential non-linearity.
The two metrics that are usually used to determine the quality of the timing chips are INL and DNL, meaning integral non-linearity and differential non-linearity. We talked about how the timing chip generates a delay based on the digital input you supply it. So let us suppose that we supply a sequence from zero to the maximum digital value allowed and plot the delay that we get, as shown in figure 13.9.

In this figure assume that each step is supposed to be 4 ps. At zero, you were supposed to get a zero delay but instead you got 1 ps. So the DNL here is 1 ps. The INL is also 1 ps. At the next step you were supposed to get 4 ps, but instead you got 6 ps. The DNL is again 1 ps, but the INL is now 2 ps. So as you can see, the INL is the integral of the timing error up to that point, whereas the DNL is the error for each step. Notice that the INL in this figure increases up to 3 ps and then reduces back to zero. As a general rule people are more interested in the INL, because it actually represents the error you are going to get in your testing. Ideally both INL and DNL should be zero, but this rarely happens.
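A small sketch of how INL and DNL would be computed from measured delays; the first two measured values follow the 4 ps-per-step example in the text, and the rest of the ramp is made up for illustration.

    # Sketch: compute DNL and INL from measured delays against a 4 ps ideal step.
    # The first two measured points follow the example in the text (1 ps, 6 ps);
    # the remaining points are made up for illustration.
    ideal_step = 4.0
    measured = [1.0, 6.0, 11.0, 13.0, 16.0]

    inl = [m - i * ideal_step for i, m in enumerate(measured)]
    dnl = [(measured[i] - measured[i - 1]) - ideal_step for i in range(1, len(measured))]

    print("INL per code:", inl)   # error accumulated up to each code
    print("DNL per step:", dnl)   # error of each individual step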
13.4.2 Pin driver chip
The pin driver chip is more like a linear amplifier than an IO circuit. Its block diagram looks as shown in figure 13.10. The timing signals from the timing chips come in as differential signals. The bits setting the drive current and the output voltage come in as digital signals from the logic chip at the beginning of the pin driver boards.

The output driver is usually a push-pull stage. The pin driver chip requires separate Vcc and Vee supplies to allow a fast slew rate all the way down to 0 V and up to the supply voltage. For timing accuracy as well as drive current the pin driver chips are usually bipolar.

The bits that control the output voltage are used to control the upper and lower supply voltages that the push-pull stage actually sees, and the bits that control the slew rate simply set the drive current of the current sources driving the push-pull stage, so that the differential timing signals switch the push-pull stage from one set of control bits to the next. A pin driver used in the industry is described in [112].
Figure 13.10: A pin driver block diagram.
Chapter 14
MMICs
Monolithic microwave integrated circuits are usually expensive to build because on the one hand the devices need to be extremely fast, meaning the feature size has to be as small as possible, but on the other hand waveguides are fairly large and occupy a large area. For this reason most RF circuits are built using discrete elements placed on a high quality printed circuit board which also holds the waveguides. But as frequencies rise, and with the increase in the digital portions of the circuit, more and more functionality can be placed on a single chip, and reducing the complexity of the assembly offsets the increased cost of the chip, so MMICs are becoming larger.
14.1 Lumped and distributed elements
In physical reality all elements are distributed elements. But at low frequencies many devices can be modeled as ideal elements such as resistance, inductance and capacitance, or combinations of them, with almost no loss in accuracy. Then you can call them lumped elements.

At high frequencies, however, most devices show their distributed properties. Usually it is just a case of parasitics revealing themselves: for example, a 1 µF capacitor at 1 kHz may appear as an RC network at 1 GHz. Bond wires connecting a chip pad to the chip pin exhibit their self inductance as the frequency is raised.

Sometimes parasitics have even more impact: for example, two bond wires which are shorts at 1 kHz exhibit their mutual inductance at 1 GHz, and I/O energy may be coupled from one pin into another pin to which it is not even physically connected.

But in this chapter we are not looking at the effect of parasitics. We are looking at structures which are designed to be operated at microwave frequencies, structures whose physical dimensions are only several wavelengths at the operating frequency. When analyzing lumped circuits we were interested in current flow as a function of time, but here we are interested in the propagation of electromagnetic waves in microwave structures.
14.2 Maxwell's equations
The four Maxwell's equations are:

∇ × H = J + ∂D/∂t      (14.1)

∇ × E = -∂B/∂t      (14.2)

∇ · D = ρ      (14.3)

∇ · B = 0      (14.4)

Equation 14.1 is Ampère's law, equation 14.2 is Faraday's law and equation 14.3 is Gauss's law.
14.3 Transmission lines
The equation for the impedance looking into a transmission line of characteristic impedance Z0, of length l and terminated by a load ZL, is given by

Z = Z0 ( ZL + Z0 tanh( (α + jβ) l ) ) / ( Z0 + ZL tanh( (α + jβ) l ) )      (14.5)

α is the loss coefficient, so that e^(αl) gives the fraction transmitted, and β is the phase constant, so that e^(jβl) gives the phase change over a distance l. α is usually negative, but in the field of fiber optics there are special sections of optical cable which amplify the signal as it goes through them, which is a better alternative to using repeaters.
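A small sketch of equation 14.5 using complex arithmetic; the line parameters chosen below (a lossless 50 Ω quarter-wave line with a 100 Ω load) are illustrative assumptions.

    import cmath, math

    # Sketch of equation 14.5: input impedance of a terminated transmission line.
    # The example values (lossless 50-ohm line, quarter wavelength, 100-ohm load)
    # are illustrative assumptions.
    def input_impedance(z0, zl, alpha, beta, length):
        gamma_l = (alpha + 1j * beta) * length
        return z0 * (zl + z0 * cmath.tanh(gamma_l)) / (z0 + zl * cmath.tanh(gamma_l))

    wavelength = 0.1                    # meters, assumed
    beta = 2 * math.pi / wavelength     # phase constant
    z = input_impedance(z0=50, zl=100, alpha=0, beta=beta, length=wavelength / 4)
    print(f"Zin = {z.real:.1f} {z.imag:+.1f}j ohm")  # a quarter-wave line transforms 100 ohm to ~25 ohm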
Figure 14.1 shows a junction between two transmission lines of different characteristic impedance Z1 and Z2. The reflection coefficient Γ for a wave on Z1 incident on the junction is given by equation 14.6. The transmission coefficient T onto Z2 is given by equation 14.7. T and Γ are defined for voltage, so from power conservation you get equation 14.8.
Figure 14.1: Reflection and transmission at a junction.
Γ = ( Z2 - Z1 ) / ( Z2 + Z1 )      (14.6)

T = 1 + Γ = 2 Z2 / ( Z2 + Z1 )      (14.7)

|Γ|² + (Z1/Z2) |T|² = 1      (14.8)
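A quick numeric check of equations 14.6 to 14.8; the 50 Ω to 75 Ω junction is an assumed example.

    # Quick check of equations 14.6-14.8 at a junction between two lines.
    # The 50-ohm to 75-ohm junction is an assumed example.
    z1, z2 = 50.0, 75.0
    gamma = (z2 - z1) / (z2 + z1)        # reflection coefficient (14.6)
    t = 1 + gamma                        # transmission coefficient (14.7)
    power_sum = gamma**2 + (z1 / z2) * t**2
    print(f"gamma = {gamma:.3f}, T = {t:.3f}, power check = {power_sum:.6f}")  # check ~ 1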
Nowadays the design of integrated waveguides is usually done using a partial differential equation (pde) solver. The reason is partly that computers have become so much more accessible, but also because the analytical solutions often require certain simplifying assumptions. A pde solver can solve any structure with equal accuracy, and the accuracy is often higher than an analytical approach can obtain. In a numerical solution you can also apply local variation in parameters, for example using a dielectric whose permittivity varies. But equations such as those in [113] were derived at a time when computers were not easily available, and so they are very important.
Figure 14.2 shows a microstrip line. It is comprised of a signal line of width W and thickness t separated from a ground plane by a dielectric layer of height h. Most of the electric field goes through the dielectric under the signal line, but some field lines go through the air, so the permittivity is an effective value between that of air and that of the dielectric material. The equations for the microstrip line characteristic impedance are 14.9, 14.10 and 14.11, from [114].
W/h = (8/X) √( (X/11)(7 + 4/εr) + (1/0.81)(1 + 1/εr) )      (14.9)

X = exp( Z0 √(εr + 1) / 42.4 ) - 1      (14.10)

Z0 = ( 42.4 / √(εr + 1) ) ln( 1 + (4h/W) [ ((14 + 8/εr)/11)(4h/W) + √( ((14 + 8/εr)/11)² (4h/W)² + ((1 + 1/εr)/2) π² ) ] )      (14.11)
Figure 14.2: A microstrip line.
Figure 14.3 shows a stripline. It is different from a microstrip because it has two ground planes instead of one, and the field is contained in the dielectric. The characteristic impedance is given by equation 14.12, from [113], [115].

Z0 = ( 30 / √εr ) ln( 1 + (8h/(πW)) [ (16h/(πW)) + √( (16h/(πW))² + 6.27 ) ] )      (14.12)
Figure 14.3: A stripline.

Figure 14.4 shows a slotline. It is comprised of a slot of width W in a single metalization layer over a dielectric of height h; there is no ground plane on the other side of the dielectric. The electric field between the metal on either side of the slot has three paths to follow: through the dielectric, through the air of the slot, and through the air above the metalization. The characteristic impedance is given in [116] and [117]; however, the closed form solution is obtained in a piecewise manner, so there is a different equation for different ranges of W/h and εr.
Figure 14.4: A slotline.
Figure 14.5 shows a coplanar waveguide. It has a center strip which carries the signal and ground planes on either side, all of which is placed above a dielectric of height h.

Figure 14.6 shows a coplanar strip. It has two strips, which carry the signal and ground, placed on a dielectric of height h.

14.4 N-port circuits

The h, y and z parameters are based on figure 14.7. The y parameters are based on admittance because they all have the units of I/V, the z parameters have the units of impedance, and the h parameters are hybrid parameters mixing the two.
Figure 14.5: A coplanar waveguide.
Figure 14.6: A coplanar strip.
Figure 14.7: Two port network.
14.4.1 h-Parameters, y-Parameters & z-Parameters
V1 = h11 I1 + h12 V2      (14.13)

I2 = h21 I1 + h22 V2      (14.14)
So if you wish to obtain h11 you can short port 2, so that V2 = 0. Similarly, to obtain h12, measure the voltage V1 with the input open-circuited. You then repeat these two measurements with the ports reversed to get h22 and h21. However, you can also extract the h parameters without short circuiting or open circuiting either port, by simply fitting to multiple data points. The y-parameters are given by equations 14.15 and 14.16 and the z-parameters are given by equations 14.17 and 14.18.
I1 = y11 V1 + y12 V2      (14.15)

I2 = y21 V1 + y22 V2      (14.16)

V1 = z11 I1 + z12 I2      (14.17)

V2 = z21 I1 + z22 I2      (14.18)
14.4.2 S-Parameters
Figure 14.8: S-parameter measurement.
Unlike the h, y and z parameters, the S parameters do not require the use of short circuits or open circuits. Short circuits may not be possible to implement for certain circuits because they may cause the circuit to become unstable.

The s parameters are thought of more in terms of power transmission and reflection, as in figure 14.8. At high frequencies you can use devices such as circulators to separate the input from the output, so it is more convenient to apply a sinusoidal radio frequency input and measure the amplitude and phase of the signal going in and coming out at either port. The signal coming out could be either just reflected back or transmitted through from the other side.
[ b1 ]   [ S11  S12  S13 ] [ a1 ]
[ b2 ] = [ S21  S22  S23 ] [ a2 ]      (14.19)
[ b3 ]   [ S31  S32  S33 ] [ a3 ]
14.5 Balun
Figure 14.9: A rat race coupler.
A rat race coupler and a balun both use quarter wavelength sections. But a balun [118] has only one input and one output and is used to isolate a balanced impedance from an unbalanced impedance, which is where the word balun comes from. One example is isolating an antenna from the receiver connected to it.

A rat race coupler is shown in figure 14.9. If the impedance of the line providing the input is Z0, then the impedance of the coupler ring should be Z0√2. There is a three-quarter wavelength section between ports 1 and 2, and a quarter wavelength between 1 and 4, between 2 and 3, and between 3 and 4. The input at 1 is output at 2 and 4 but not at 3. The outputs at 2 and 4 are 180° out of phase.

A balun is shown in figure 14.10. It is a modification of the rat race coupler. Its advantage is that it has a larger bandwidth than the rat race coupler. The starting values used for t1 and t2 are λ/4 and λ/2, but they can be adjusted to improve the bandwidth. Port 3 of the rat race is open circuited. The input is at 1 and the outputs are 2 and 4.
14.6 Circulators
A circulator [119] uses a ferromagnetic material to give a different impedance for a signal traveling in one direction than for a signal traveling in the opposite direction. The structure of a circulator is shown in figure 14.11.

It is a T junction, except that above and below the T are placed magnetized ferrite slabs of a suitable magnetic orientation. The magnetic field due to the ferrite slabs is directed into the face of the junction. For the input mode where the magnetic field vector lies in the plane of the ferrite slabs, a resonance is set up [119] in the center disk.
Figure 14.10: A balun.
Figure 14.11: The structure of a circulator.
The standing wave pattern within the disk is rotated by 30° at the design frequency, so that a signal entering at 1 is output at 2 but is null at 3. Similarly, a signal entering at 2 is output at 3 and is null at 1.
Circulators are very useful in isolating signals going into a port from the signals coming out of
the port. So if port 2 is connected to the input of a following section, port 1 can be connected to a
signal source and port 3 can be connected to a detector to measure the signal reflected back from
the input of the following section.
14.7 Impedance transformers and filters

Γa = ( Z2 - Z1 ) / ( Z1 + Z2 )      (14.20)

An impedance transformer allows a signal on a line of impedance Z1 to be transferred onto a line of impedance Z2 with minimum reflection. Without an impedance transformer the reflection coefficient is Γa. The transformer is asymmetric, and the key is that looking into the left of the transformer the input appears to be of impedance Z1, whereas looking into the right of the transformer the input appears to be of impedance Z2. Of course the catch is that it only does this at a given frequency, and there is plenty of reflection as you move further away from the design frequency.
Figure 14.12: A tapered line.
Often a tapered transmission line can be used to connect two transmission lines of impedance Z1 and Z2. There are many options, such as a linear change, an exponential change, etc.; the most popular is the Klopfenstein taper [120], [121] shown in figure 14.12. The design equations are 14.21, 14.22 and 14.23.
A = acosh( Γa / Γreq )      (14.21)

φ(x, A) = ∫ from 0 to x of [ I1( A √(1 - y²) ) / ( A √(1 - y²) ) ] dy      (14.22)

ln( Z(x) / Z1 ) = (1/2) ln( Z2 / Z1 ) + Γreq A² φ( 2x/L - 1, A )      (14.23)
The simplest and most used impedance transformer is the quarter wavelength transformer, as shown in figure 14.13 [122]. The design is simple because the impedances of the sections either increase or decrease monotonically, and they are selected such that if you have n sections, the Γ at each of the n + 1 steps is approximately the same; a sketch of this selection is given after the figures below. But if you are designing a filter using quarter wavelength sections it will appear more as in figure 14.14.
Figure 14.13: A quarter wavelength impedance transformer.
Figure 14.14: A sample line and stub filter.
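A small sketch of the selection rule for the quarter wavelength transformer: a geometric progression of impedances gives approximately the same small Γ at every one of the n + 1 steps. The end impedances and section count below are assumed examples.

    # Sketch: multi-section quarter-wave transformer with a geometric impedance
    # progression, which makes the reflection at each of the n+1 steps equal.
    # The end impedances (50 and 100 ohm) and n = 3 sections are assumed examples.
    z_start, z_end, n = 50.0, 100.0, 3
    ratio = (z_end / z_start) ** (1 / (n + 1))
    impedances = [z_start * ratio**k for k in range(n + 2)]  # line, 3 sections, line

    steps = [(impedances[k + 1] - impedances[k]) / (impedances[k + 1] + impedances[k])
             for k in range(n + 1)]
    print("section impedances:", [f"{z:.1f}" for z in impedances])
    print("per-step gamma:    ", [f"{g:.4f}" for g in steps])  # all equal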
The approach of using quarter wavelength transmission line sections like lumped elements in designing a filter was proposed by [123], using the complex frequency variable S such that a high pass lumped filter response in the frequency range 0 to ∞ transforms to the range -ω0 to ω0, repeating every 2ω0.

S = jΩ = j tan( πω / (2ω0) )      (14.24)
A quarter wavelength section with an open circuit as its load is a capacitance, whereas if the load is a short circuit it is an inductance. As in figure 14.14, by using shorted and open sections in series and in parallel, you can construct a filter with your desired frequency response [124], [125].

Initially the polynomial representation of the filter function needs to be obtained, and then the elements need to be extracted. Each quarter wavelength section is represented by the matrix in equation 14.25. There are three types of elements used, namely the shorted and open quarter wavelength sections, the unit elements, which are quarter wavelength sections connected at both ends, and the redundant elements, which are used to physically separate the shorted and open sections.
[ b1 ]    1    [ -Δs   s11 ] [ a2 ]
[ a1 ] = ---   [ -s22    1 ] [ b2 ]      (14.25)
         s21

where Δs = s11 s22 - s12 s21.
If you use n u.e.s (unit elements) and m distributed Ls and Cs in the filter, then you can use a standard representation such as a Butterworth or Chebyshev representation. For example, the ratio of reflectance to transmittance for a Butterworth high pass would be represented as in equation 14.26, and a Chebyshev low pass would be represented as in equation 14.27, where Sc is the cutoff frequency, Tm(x) = cos(m acos x) and Um(x) = sin(m acos x).
|Γ|² / |t|² = (Sc/S)^(2m) [ √(1 - Sc²) / ( Sc √(1 - S²) ) ]^(2n)      (14.26)

|Γ|² / |t|² = ε² [ Tm(S/Sc) Tn( √(1 - Sc²) / ( Sc √(1 - S²) ) ) - Um(S/Sc) Un( √(1 - Sc²) / ( Sc √(1 - S²) ) ) ]²      (14.27)

|Γ|² + |t|² = 1      (14.28)
The elements are extracted from Zin using Richards' theorem [123]; for example, removing a u.e. gives the remaining impedance as in equation 14.29, and this process needs to be continued until all elements are extracted. You may have to use the Kuroda identities, which are listed in [125].

Z'in(S) = Zin(1) ( S Zin(1) - Zin(S) ) / ( S Zin(S) - Zin(1) )      (14.29)
Figure 14.15: A coupled line filter.
An alternative way to make a filter is to use coupled lines [126], as shown in figure 14.15. Here too the coupled sections are a quarter wavelength long at ω0. In either the microstrip or the stripline there is a center signal line and the ground plane(s). If a second signal line is placed in close proximity to the first signal line, as much as 100% of the power can be transferred from the first to the second and back.

Figure 14.16: Each section of the coupled line filter.
Chapter 15
Transducers
Einstein received the Nobel prize for his work on the photoelectric effect and his equation E = hν, where h is Planck's constant, E is the energy of the photon and ν is the frequency of the light emitted.
15.1 Direct gap
Some materials absorb and emit light more easily than others because they are of a class called direct gap semiconductors. If you draw the E-k plot for Si, you will find that the lowest level in the conduction band does not lie at the same wave number as the highest level in the valence band, as shown in figure 15.1.
Figure 15.1: The forbidden gap of silicon.
The lower band is the valence band and the upper band is the conduction band. The free electrons occupy the lowest levels available in the conduction band, and one says electrons sink, whereas holes occupy the highest available levels in the valence band because holes float (because the electrons in the valence band sink).

In figure 15.1 the direct gap of Si is from A to B and is larger than the indirect gap from A to C. However, in GaAs the smallest gap is the direct gap. In GaAs, if a photon were to supply enough energy, it could be absorbed [21] and a valence band electron could jump from A to B. Also, if you pump electrons into the conduction band and ensure that there are holes in the valence band, in GaAs a conduction band electron could jump to the valence band and release a photon of that energy gap; but this is unlikely to happen in Si, because if you have electrons at B they will simply slide down the conduction band to C. For this reason it is difficult to get Si to emit light. On the other hand, if you supply photons of energy equal to the gap from A to B, then Si can absorb light, and for this reason there are plenty of manufacturers making light detectors in Si.
15.2 Semiconductor lasers
The acronym LASER stands for Light Amplification by Stimulated Emission of Radiation. There are basically two types of semiconductor lasers, edge emitting and surface emitting. Edge emitting came first because it uses a much simpler structure.
15.2.1 Edge emitting
The structure of an edge emitting laser is that of a P-i-N diode, as shown in figure 15.2.
Figure 15.2: Edge emitting laser.
The P layer on top is connected to the positive terminal of the battery and the N layer on the bottom is connected to the negative terminal. Holes enter from the top, electrons enter from the bottom, and in the intrinsic layer there are no dopant ions, so the state of lowest energy is for all the holes to combine with all the electrons (due to current continuity they are equal in number) and emit the bandgap energy as photons. The two edges are created by simply cleaving the semiconductor: by scribing it and breaking it you get a clean edge. The index of refraction of GaAs is about 3.6 and that of air is 1, so the power reflection from the cleaved surface is

Γ² = ( (3.6 - 1) / (3.6 + 1) )² = 0.32      (15.1)
A 32% reflection is not very large, but it is sufficient to induce lasing if the diode is pumped hard enough. All you need is that the gain in photons going from one cleaved surface to the other cleaved surface, multiplied by the reflectance, is greater than 1, which will ensure a positive feedback loop and therefore lasing. So in this case the photon gain has to be more than 1/0.32 = 3.2, so the reflected photon needs to induce the stimulated emission of 2.2 photons, so that when they reach the other cleaved surface 2.2 photons are transmitted out through the mirror and one photon gets reflected back. So you have the dependencies

E = hν      (15.2)

λ = c / (nν)      (15.3)

L = mλ      (15.4)

G(ν) ∝ e^(gL)      (15.5)

G Γ² > 1      (15.6)
E = Eg + ΔE      (15.7)

The first equation gives you the frequency, the second gives you the wavelength, the third gives you the different lengths L you can choose from, the fourth gives you the photon gain G you can expect from that length at that frequency, the fifth just requires that the feedback be greater than 1, and the sixth is just the ΔE that you need to factor in, because when you pump the diode hard, the average electron is higher than the bottom of the conduction band and the average hole is lower than the top of the valence band, so your frequency will be slightly higher than that given by the band gap energy.
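A tiny numeric sketch tying a few of these relations together; the GaAs refractive index follows the text, while the 1.42 eV bandgap and the 300 µm cavity length are assumed textbook-style values, not figures from this chapter.

    # Tiny sketch of the edge-emitting laser relations in the text.
    # n (GaAs refractive index ~3.6) follows the text; the 1.42 eV bandgap and
    # the 300 um cavity length are assumed textbook-style values.
    n = 3.6
    reflectance = ((n - 1) / (n + 1)) ** 2          # eq 15.1, ~0.32
    gain_needed = 1 / reflectance                   # G * reflectance > 1  (eq 15.6)

    h, c, q = 6.626e-34, 3e8, 1.602e-19
    E_photon = 1.42 * q                             # assumed bandgap energy, joules
    nu = E_photon / h                               # eq 15.2
    lam_medium = c / (n * nu)                       # eq 15.3, wavelength inside GaAs

    L = 300e-6                                      # assumed cavity length
    m = round(L / lam_medium)                       # eq 15.4, longitudinal mode number
    print(f"reflectance = {reflectance:.2f}, single-pass gain needed > {gain_needed:.1f}")
    print(f"wavelength in GaAs = {lam_medium*1e9:.0f} nm, mode number m ~ {m}")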
In order to output only a single frequency, you need to make sure that the loop gain is greater than 1 at only a single frequency, and this gets more difficult as the length L is increased, because m gets larger and you may have appreciable gain at a longer wavelength such that (m - 1)λ = L, or at a shorter wavelength such that (m + 1)λ = L. On the other hand, if you make L too short, you will have to pump the diode very hard indeed, which creates other problems.

So to summarize, in order to make the laser function as you wish, you have to properly estimate how hard you need to pump it, how narrow the intrinsic layer needs to be, how long the laser has to be, the operating temperature of the laser (because it causes expansion and has other effects), and so on. In addition, the way lasers are often used is to abut the edge of the laser to the end of an optical fiber and use an index matching glue to connect them; since glass has a refractive index of 1.5, this will almost certainly affect the lasing, because it drops the reflectance of the cleaved edge of the laser below 32%.
15.2.2 Surface emitting
Nowadays surface emitting lasers are much more popular than edge emitting lasers; you can make an array of them on a single chip. Unlike edge emitting lasers, Vertical Cavity Surface Emitting Lasers or VCSELs do not use a cleaved edge as a mirror. Instead they use epitaxially grown multi-layer mirrors called Bragg reflectors, as shown in figure 15.3.
Figure 15.3: Surface emitting laser.
We are accustomed to light reflection by a glass mirror, but there is another kind of reflection that is more distributed and based on interference of the wave, given by Bragg's law:

λ/n = 2 d sin(θ)      (15.8)

Here n is the effective refractive index of the Bragg structure and d is the spacing of the structure, i.e. the minimum repeatable unit of high and low refractive index. Figure 15.4 shows Bragg reflection of light.
Figure 15.4: Bragg reflection.
The effective refractive index n is the algebraic mean over the distance traveled. So if the Bragg structure is 40% at index 1.5 and 60% at index 2.25, then the effective index is 1.95. In addition, if the Bragg condition is met there is reflection, but the amount of light reflected depends on the abruptness and index difference between the layers, so different Bragg structures with the same periodicity and the same effective index will reflect different amounts of light. Ultimately, if the number of repeatable units in the Bragg structure is increased indefinitely, all the light will be reflected.
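A two-line check of the effective-index example above, plus the wavelength that the Bragg condition (equation 15.8 at normal incidence, sin θ = 1) would select for an assumed period d.

    # Check of the weighted effective index example (40% at 1.5, 60% at 2.25)
    # and the wavelength selected by eq 15.8 at normal incidence, for an assumed
    # Bragg period d of 130 nm.
    n_eff = 0.4 * 1.5 + 0.6 * 2.25
    d = 130e-9                        # assumed Bragg period, meters
    lam = 2 * d * n_eff               # lambda = 2 d n_eff  (eq 15.8 with sin(theta) = 1)
    print(f"effective index = {n_eff:.2f}, reflected wavelength ~ {lam*1e9:.0f} nm")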
In this type of laser the gain is obtained in the intrinsic layer sandwiched between the upper p-type region and the lower n-type region, in exactly the same way as in the edge emitting laser diode. The difference is that the light reflection is vertical, not horizontal. A VCSEL has better frequency selectivity than an edge emitting laser because here even the mirror helps in the frequency selection via the Bragg condition. Unlike the case of the cleaved edge acting as a mirror, in this case you can obtain the reflectance you wish for by increasing the number of layers in the Bragg stack. In addition, you can have more layers at the bottom so that you don't waste lasing by reabsorption in the substrate.

In the early days people did everything they could to avoid passing the current through the mirrors. But in this case it was difficult to flood the intrinsic region between the Bragg mirrors with the holes and electrons which are needed to cause photon gain. When they did pass current through the mirrors, the mirrors would heat up [127] and expand, causing frequency distortion and excessive power loss. Experimental studies [128], [129] showed that modulation doping could alleviate the problem of thermionic barriers. My paper [37] showed how to reduce the thermionic barriers in p-type and n-type Bragg structures to a theoretical minimum.
15.2.3 Bulk vs. distributed gain
The light emitted by a laser is said to be coherent, meaning that the plane waves representing the photons are in phase. Stimulated emission is in phase with the wave that stimulated it. At the mirrors the phase has to be zero. It is sufficient to provide hole-electron pairs at the points of maximum amplitude of the waves, because this is where most of the stimulated emission occurs. In fact, in an edge emitting laser, as you move across the laser [130] you will find regions where the available hole-electron pairs are depleted by the stimulated emission and the lasing is limited by the diffusion of holes and electrons from surrounding areas into these regions; this is called "hole burning".

If you are building a VCSEL, you can easily place the intrinsic slices where the wave has maximum amplitude and thereby increase the efficiency of the lasing. This type of structure is called a distributed gain structure.
15.3 Junction detectors
The opposite of emitting light is detecting light. The simplest detector structure is a reverse biased diode [131], as shown in figure 15.5.
Figure 15.5: Diode detector.
When you reverse bias a diode, the holes are pulled away from the junction toward the top, whereas the electrons are pulled away from the junction toward the bottom, and you have a depletion region where the junction used to be. There is a high field in this depletion region because all the voltage is dropped across it.

Now if you shine light on this region at an energy higher than the band gap, it will be absorbed and will result in a hole-electron pair, which will immediately separate in the field and register as a current flow proportional to the amount of light absorbed. Due to the Franz-Keldysh effect [132], [133], the energy gap in the presence of a high electric field is reduced, so even photons of energy slightly lower than Eg may be absorbed, as shown in figure 15.6.

Figure 15.6: The Franz-Keldysh effect.
When you reverse bias a diode, the depletion width increases rapidly at first, but the rate of increase falls off quickly, and if you reverse bias the junction too much you will simply cause a reverse breakdown of the diode. For this reason the junctions in high quality detectors are often designed with very thin interlaced fingers, as shown in figure 15.7.

Figure 15.7: Interlaced junction increases area.

Avalanche photodiodes [134] are different from a normal photodiode due to the high reverse field that they have to tolerate. Unless the diode is carefully constructed, discontinuities in the
semiconductor or surface states can cause a breakdown before the avalanche electric field is reached. But if the electric field is higher than that required to cause avalanche ionization, then the carriers generated due to photon absorption accelerate in the field and cause the generation of more carriers, so there can be a gain of as much as a few thousand.

Phototransistors [135] are different because the field is just the normal field, but due to the transistor structure the carriers generated in the base-emitter junction cause a correspondingly amplified collector current to flow.
15.4 Accelerometers
Accelerometers are based on the piezoresistive effect [136]. A typical structure is shown in figure 15.8. There is usually a cavity created by etching, and in that cavity is a mass of any shape attached only at one end and free to flex within the cavity. It can be just a cantilever, and to avoid orientation problems it can be a combination of two or three cantilevers oriented orthogonally to each other.

Figure 15.8: A typical accelerometer.
From Newton's second law F = ma, so the torsion applied to the cantilevered section is a linear function of the acceleration. Due to the piezoresistive effect, the resistance of any loop which has paths going through the cantilever will change with the stress, i.e. with the acceleration, and thus the acceleration can be detected.

There are many other options, such as the change in capacitance between a surface section of the mass and the wall of the cavity, or perhaps optical effects such as interference. But the piezoresistive effect is the easiest to work with. For example, a Wheatstone bridge configuration could be used to detect the imbalance due to the change in resistance.
Chapter 16
Technology CAD
Until about 1985 TCAD was not even possible. So devices were analyzed analytically, and the most elegant approximations were based on series expansions such as the Taylor and Maclaurin series. After numerical analysis became possible, it started with one-dimensional (1D) analysis, then 2D analysis and 1D transient analysis, then 3D, and so on. For most of the history of numerical analysis computing power was a limited resource, so parsimony in computing was a virtue.
To start learning numerical techniques, [137] is easy to read. For probability theory [138] is a good place to start, [12] is a good collection of formulae, [139] covers finite elements, and if you are interested in a C implementation then see [140]. Semiconductor equations use exponentials a lot and are said to be "stiff", meaning that small changes in one variable cause large changes in another. For this reason finite differences are often more popular than finite elements.

Nowadays most engineers do not write partial differential equation solvers, because there are so many commercial packages available which can be incorporated into your simulation flow, so that you only provide the data and the parameters of the solution and the solvers do the job better than you could implement yourself. Device simulation often yields sparse matrices [141], and there are commercial packages that can solve them, or that do the least squares fitting that is often needed, such as the Levenberg-Marquardt algorithm [142], [143]. But it is still worthwhile to understand the basics of the numerical solution of equations, for the same reason we learn to do mathematics manually even after the advent of pocket calculators.
16.1 Basic numerical techniques
Figure 16.1: Discrete data.
A lot of the difference equations are based on the Taylor series, given by equation 16.1.

f(x) = f(a) + f'(a)(x - a) + f''(a)(x - a)²/2! + ...      (16.1)
16.1.1 Differentiation
If you use only the first two terms of equation 16.1 then you get the first derivative as

f'(a) = ( f(x) - f(a) ) / ( x - a )      (16.2)

If the spacing between the samples is Δx, then you can write the two equations 16.3 and 16.4 based on the Taylor series, and when you add them you get equation 16.5, which is the second derivative in central difference form, i.e. the second derivative at n based on points evenly spread before and after n.

f_{n+1} = f_n + f'_n Δx + f''_n Δx²/2      (16.3)

f_{n-1} = f_n - f'_n Δx + f''_n Δx²/2      (16.4)

f''_n = ( f_{n-1} - 2 f_n + f_{n+1} ) / Δx²      (16.5)
So in this manner you can obtain finite difference representations for any order of derivative, either centered about n or using different combinations of points ahead of or behind n. This can be done even if the points are not evenly spaced, as shown in figure 16.2. So equations 16.3 and 16.4 become equations 16.6 and 16.7.

Figure 16.2: Unevenly spaced points.
f_{n+1} = f_n + f'_n b + f''_n b²/2      (16.6)

f_{n-1} = f_n - f'_n a + f''_n a²/2      (16.7)

Multiplying equation 16.6 by a/b and then adding it to equation 16.7, you get equation 16.8, which is the second derivative at n. The equations for the different finite differences are in many textbooks such as [137].

f''_n = ( f_{n-1} - (1 + a/b) f_n + (a/b) f_{n+1} ) / ( ab/2 + a²/2 )      (16.8)
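A small sketch of equations 16.5 and 16.8, checked on f(x) = x³, whose second derivative is 6x; the sample points are assumed for illustration.

    # Sketch of the central-difference second derivative, even (eq 16.5) and
    # uneven (eq 16.8) spacing, checked on f(x) = x^3 where f''(x) = 6x.
    def f(x):
        return x**3

    # Evenly spaced points around x = 2 (eq 16.5).
    x, dx = 2.0, 0.1
    d2_even = (f(x - dx) - 2 * f(x) + f(x + dx)) / dx**2

    # Unevenly spaced points: a behind, b ahead of x (eq 16.8).
    a, b = 0.1, 0.2
    d2_uneven = (f(x - a) - (1 + a / b) * f(x) + (a / b) * f(x + b)) / (a * b / 2 + a**2 / 2)

    print(f"even spacing:   {d2_even:.4f}  (exact value 12)")
    print(f"uneven spacing: {d2_uneven:.4f}  (exact value 12)")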
Figure 16.3: Simple integration.
16.1.2 Integration
The simplest integration is to take the value at n as representing the interval from midway between n-1 and n to midway between n and n+1, as shown in figure 16.3. Simpson's rule is a better method, given by equation 16.9, and the 2 point, 4 point and 5 point formulas are in equations 16.10, 16.11, 16.12 [137]. There are many references which explain why, in general, integration generates less error than differentiation.
∫_{x_0}^{x_2} f(x) dx = (Δx/3)(f_0 + 4 f_1 + f_2)    (16.9)

∫_{x_0}^{x_1} f(x) dx = (Δx/2)(f_0 + f_1)    (16.10)

∫_{x_0}^{x_3} f(x) dx = (3Δx/8)(f_0 + 3 f_1 + 3 f_2 + f_3)    (16.11)

∫_{x_0}^{x_4} f(x) dx = (2Δx/45)(7 f_0 + 32 f_1 + 12 f_2 + 32 f_3 + 7 f_4)    (16.12)
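As an illustration of equations 16.9 and 16.10 applied panel by panel across a whole interval, here is a short Python sketch; the integrand sin(x) on [0, π] is an assumed example whose exact integral is 2.

    import math

    # Composite trapezoidal rule (eq. 16.10) and composite Simpson rule (eq. 16.9).
    def trapezoid(f, x0, x1, n):
        dx = (x1 - x0) / n
        s = 0.5 * (f(x0) + f(x1)) + sum(f(x0 + i * dx) for i in range(1, n))
        return s * dx

    def simpson(f, x0, x1, n):
        if n % 2:
            n += 1                      # Simpson's rule needs an even number of panels
        dx = (x1 - x0) / n
        s = f(x0) + f(x1)
        s += sum((4 if i % 2 else 2) * f(x0 + i * dx) for i in range(1, n))
        return s * dx / 3.0

    print(trapezoid(math.sin, 0.0, math.pi, 100))   # about 1.9997
    print(simpson(math.sin, 0.0, math.pi, 100))     # accurate to about 8 digits

With the same 100 samples Simpson's rule is already accurate to roughly eight digits, which illustrates why the higher order formulas are preferred.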
16.1.3 Interpolation
Suppose you need to know the value of a function f at x for the one dimensional case, or at (x, y) for the two dimensional case, as shown in figure 16.4. The equation you use has to be based on the equation that describes f(x) and f(x, y). The reason is shown in table 16.1.

Figure 16.4: Interpolation to find f(x) and f(x, y) (the 1D case uses f(x_1) and f(x_2); the 2D case uses the four corner values f(x_1, y_1), f(x_1, y_2), f(x_2, y_1) and f(x_2, y_2)).
In table 16.1 there are three functions shown, namely the sine wave, the exponential and a cubic. In all three cases the value of x is midway between x_1 and x_2, but the value of f(x) is not midway between f(x_1) and f(x_2). Normally if you need to interpolate you don't know what the shape of the function is.

Equation    x_1    x     x_2    f(x_1)    f(x)     f(x_2)
sin(x)      30°    45°   60°    0.5       0.707    0.866
e^x         1      2     3      2.718     7.389    20.085
x^3         1      2     3      1         8        27

Table 16.1: Interpolation.
The Taylor series works regardless of what relationship creates the data. It is based on the idea that you can fit a polynomial dependence to the data and then the prediction is based on that polynomial. So if you keep increasing the order of the interpolation scheme, i.e. if you utilize the higher order derivatives, then you will get progressively better results.
In order to use the higher order derivatives you need to use more points, and then you are making the assumption that the relationship creating the data points is the same across those points. The first order or linear interpolation is given in equation 16.13 for one dimension. You can find tables of coefficients in [137] for many different types of interpolation such as Lagrange, Stirling, Bessel, Everett, Steffensen and Newton.
f(x) = ((x_2 - x) f(x_1) - (x_1 - x) f(x_2)) / (x_2 - x_1)    (16.13)
For the two dimensional case you use the two dimensional Taylor series as in equation 16.14. Then substitute for the partial derivatives using the points (x_2, y_1) and (x_1, y_2). Then repeat using (x_2, y_2) as the starting point and average. So then you have used all four points to estimate the value at (x, y).

f(x, y) = f(x_1, y_1) + (x - x_1) (∂f/∂x)|_(x_1, y_1) + (y - y_1) (∂f/∂y)|_(x_1, y_1) + ...    (16.14)
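The sketch below (an illustration of mine, not the book's code) implements the linear interpolation of equation 16.13 and builds a simple four-point two-dimensional estimate by interpolating first in x and then in y; the cubic test values come from table 16.1.

    # 1D linear interpolation (eq. 16.13) and a 4-point 2D estimate.
    def lerp(x, x1, x2, f1, f2):
        """Linear interpolation between (x1, f1) and (x2, f2), eq. 16.13."""
        return ((x2 - x) * f1 - (x1 - x) * f2) / (x2 - x1)

    def estimate_2d(x, y, x1, x2, y1, y2, f11, f21, f12, f22):
        """Estimate f(x, y) from the four corner values."""
        along_y1 = lerp(x, x1, x2, f11, f21)   # interpolate in x at y = y1
        along_y2 = lerp(x, x1, x2, f12, f22)   # interpolate in x at y = y2
        return lerp(y, y1, y2, along_y1, along_y2)

    # Cubic row of table 16.1: linear interpolation predicts 14, the true value is 8.
    print(lerp(2.0, 1.0, 3.0, 1.0, 27.0))   # 14.0

The 14 versus 8 discrepancy is exactly the point made by table 16.1: a low order scheme is only as good as its assumed polynomial shape.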
16.2 Grid selection
Figure 16.5 shows a uniform grid. Here all the points are evenly spaced along the x and y axes. If there is only one Δx and one Δy, each has to be as small as the finest requirement along the x and y axes. But if the grid does not have to be uniform you can make it as in figure 16.6.

Figure 16.5: A uniform rectangular grid.
Figure 16.6: A rectangular grid.
In figure 16.7 is a different approach, which is the adaptive grid [144]. You start by setting up a coarse grid containing triangles, and then you refine it by dividing those triangles into sub-triangles [145], for example by connecting the mid points of the sides in areas where a high accuracy is needed, as shown in figure 16.8. When refining the grid you don't want to make the grid too fine, because otherwise you might get terms in the matrix which are almost zero.

Figure 16.7: A 2 dimensional adaptive grid.

Figure 16.8: Refining a grid.
The use of an adaptive grid can reduce the matrix size by a large factor. Suppose you start with a 100 × 100 uniform rectangular grid. Now if you need a 10× finer grid for 6 grid spacings along the x axis and 4 grid spacings along the y axis, then the size of the matrix increases to 154 × 136 = 20,944 as opposed to 10,000. But in the case of a triangular adaptive grid you can get increased accuracy in a local area without affecting the entire column and entire row as in the case of a rectangular grid. In the case of the 3D solution the difference between a rectangular grid and an adaptive triangular grid is much larger than in the 2D case.
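As a toy illustration of the midpoint refinement described above (an assumption-laden sketch, not code from [144] or [145]), the function below splits one triangle into four sub-triangles by joining the midpoints of its sides; a real adaptive mesher would apply it only to triangles flagged by an error estimate.

    # One step of adaptive refinement: split a triangle into four sub-triangles.
    def midpoint(p, q):
        return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

    def refine(triangle):
        a, b, c = triangle
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

    coarse = [((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))]
    # Refine only where higher accuracy is needed (here, every triangle).
    fine = [t for tri in coarse for t in refine(tri)]
    print(len(fine))   # 4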
16.3 Device simulation
2D device simulation [144], [146] was first done in the early 1980s using the computing facilities available then. Computers are a lot faster nowadays, so finer grids and even 3D simulations [147] are feasible. To get the general principles, a 1D simulation is enough to understand, and the best example is a compound semiconductor such as Al_x Ga_{1-x} As [148].
The main equations to solve are Poisson's equation and the continuity equations for holes and electrons. These are written as equations 16.15 and 16.17 for electrons and 16.16 and 16.18 for holes, where χ_e is the electron affinity.

d/dx [ ε d/dx ( (E_c + χ_e)/q ) ] = q (N_d^+ - n + p - N_a^-)    (16.15)

d/dx [ ε d/dx ( (E_v + E_g + χ_e)/q ) ] = q (N_d^+ - n + p - N_a^-)    (16.16)

dn/dt = (1/q) dJ_n/dx - R + G    (16.17)

dp/dt = -(1/q) dJ_p/dx - R + G    (16.18)
As the fraction x varies so too will the material properties [36], [130]; for example the relative permittivity, band gap and electron affinity are

ε(x) = 13.18 - 3.12x    (16.19)

E_g(x) = 1.424 + 1.247x                if x < 0.45
E_g(x) = 1.9 + 0.125x + 0.143x^2       if x > 0.45    (16.20)

χ_e(x) = 4.07 - 1.1x                   if x < 0.45
χ_e(x) = 3.64 - 0.14x                  if x > 0.45    (16.21)
For the Fermi-Dirac statistics you can use the Joyce-Dixon approximation [16] of equations 16.22 and 16.23. You would need the N_c and N_v, which are also available in [130]. The net electric field is simply (1/q) dE_f/dx.

(E_f - E_c)/kT = ln(n/N_c) + n/(2√2 N_c)    (16.22)

(E_v - E_f)/kT = ln(p/N_v) + p/(2√2 N_v)    (16.23)
For the recombination terms you can use models fitted to measured data [149], [150]. Shockley-Read-Hall recombination is modeled as in equations 3.23, 3.24. Auger recombination is modeled as in equation 3.25.
Once the grid has been set up, each point is filled in with its material properties and the simulation can begin. Initially the holes and electrons are located where they are generated, i.e. they are located at the ionized dopants, and all areas are charge neutral. In order to obtain the steady state a pseudo time step is used to compute the carrier movement using the continuity equations. It is a pseudo time step because you are using it to obtain the steady state rather than performing a transient analysis.
There are two choices of iteration, namely the Gummel [151] and the Newton. The Gummel is easier to implement but you will need a large number of iterations to achieve convergence. The Newton is more difficult to implement and each iteration requires more computational resources, but fewer iterations are required.
In the Gummel iteration you first solve Poisson's equation to obtain the potential for the given values of n and p at the different nodes. Then you solve the continuity equations to get the new values of n and p at the different nodes. So in each iteration each equation is solved separately, and you iterate until you reach convergence.
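In outline, the Gummel loop has the structure sketched below; solve_poisson and solve_continuity are hypothetical placeholders standing in for the discretized Poisson and continuity solvers described above, so only the decoupled iteration structure is shown.

    # Structural sketch of a Gummel iteration (placeholder callbacks, not a real solver).
    def gummel(V, n, p, solve_poisson, solve_continuity, tol=1e-6, max_iter=500):
        for _ in range(max_iter):
            V_new = solve_poisson(n, p)               # potential for the current n, p
            n_new, p_new = solve_continuity(V_new)    # carriers for the new potential
            change = max(abs(a - b) for a, b in zip(V_new, V))
            V, n, p = V_new, n_new, p_new
            if change < tol:                          # stop when V stops moving
                break
        return V, n, p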
The Newton iteration is the one that is normally used in device simulation. In each iteration a system of equations is solved to obtain the new values of the potential, the n and the p. The Newton-Raphson method [137] to iteratively solve for the root of f(x) = 0 is equation 16.24, where the derivative f'(z) at the guess value z is used with the value of f(z) in the k-th iteration to obtain the new value of z. So in order to create the system of equations you first need to calculate the partial derivative of each quantity with respect to the others, as in equation 16.25, where F_V, F_n and F_p are the functions for V, n and p.

z_{k+1} = z_k - f(z_k) / f'(z_k)    (16.24)

[ ∂F_V/∂V  ∂F_V/∂n  ∂F_V/∂p ] [ δV ]     [ F_V ]
[ ∂F_n/∂V  ∂F_n/∂n  ∂F_n/∂p ] [ δn ]  = -[ F_n ]    (16.25)
[ ∂F_p/∂V  ∂F_p/∂n  ∂F_p/∂p ] [ δp ]     [ F_p ]

V_{k+1} = V_k + δV,    n_{k+1} = n_k + δn,    p_{k+1} = p_k + δp    (16.26)
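Each Newton iteration therefore amounts to assembling the Jacobian of equation 16.25 and performing one linear solve. A minimal NumPy sketch is shown below; residual and jacobian are hypothetical placeholders for the assembled F_V, F_n, F_p and their partial derivatives on the chosen grid.

    import numpy as np

    # One coupled Newton step (eqs. 16.24 to 16.26) with placeholder callbacks.
    def newton_step(V, n, p, residual, jacobian):
        x = np.concatenate([V, n, p])        # stack the unknowns at all nodes
        F = residual(x)                      # F_V, F_n, F_p evaluated at x
        J = jacobian(x)                      # matrix of partial derivatives
        delta = np.linalg.solve(J, -F)       # solve J * delta = -F
        x = x + delta                        # the update of eq. 16.26
        m = len(V)
        return x[:m], x[m:2 * m], x[2 * m:]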
16.4 Fabrication process simulation
Process simulation starts with defining the process. A typical process may contain about 140 steps or so. Although all of the wafer is affected by all of the steps, different regions of the wafer will be masked during different steps. Although there may be 140 processing steps, there may be only 10 different types of steps. For example the implantation step may be repeated more than twenty times, and similarly for the oxidation steps etc.
So the process definition starts by listing the steps in chronological order and assigning their detailed parameters; for example an implantation step would require you to specify the dopant, the energy, the dose and the angle, and similarly for all the other steps. Then for each device or structure that you plan to simulate, you need to tell the simulator what the masking will look like as the device proceeds through the fabrication simulation.
The equation used to model an implantation step is given by equation 16.27, where R_p is the projected range for that energy, Φ is the dose and σ is the straggle.

N(z) = (Φ / (σ√(2π))) exp( -(z - R_p)^2 / (2σ^2) )    (16.27)
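A few lines are enough to evaluate equation 16.27; the range, straggle and dose used below are illustrative assumptions only, roughly in the range of a medium energy boron implant.

    import math

    # Implanted Gaussian profile, eq. 16.27 (depths in microns, dose in cm^-2).
    def implant_profile(z_um, dose_cm2, Rp_um, straggle_um):
        z, Rp, s = z_um * 1e-4, Rp_um * 1e-4, straggle_um * 1e-4   # convert to cm
        prefactor = dose_cm2 / (s * math.sqrt(2.0 * math.pi))
        return prefactor * math.exp(-(z - Rp) ** 2 / (2.0 * s ** 2))

    # The peak concentration occurs at z = Rp.
    print("%.3e cm^-3" % implant_profile(0.1, 1e13, 0.1, 0.03))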
The equation used to model diffusion is given by equation 16.28, where D is the diffusion coefficient. There are explicit diffusion steps where you apply a gel containing the impurity on an exposed silicon surface and then heat it as a way to dope the silicon. But even dopants that were implanted in a previous step will diffuse any time heat is applied; for example both oxidation and annealing are high temperature steps. The effect is particularly noticeable for the finely tailored drain engineering implants.

∂n/∂t = D ∇^2 n    (16.28)
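A minimal explicit finite-difference sketch of equation 16.28 in one dimension is shown below; D, dx and dt are assumed values chosen only so that the explicit scheme is stable (D dt / dx^2 ≤ 0.5).

    # Explicit 1D step for dn/dt = D d2n/dx2 (eq. 16.28), end points held fixed.
    def diffuse(n, D, dx, dt, steps):
        n = list(n)
        r = D * dt / dx ** 2        # must be <= 0.5 for stability
        for _ in range(steps):
            new = n[:]
            for i in range(1, len(n) - 1):
                new[i] = n[i] + r * (n[i - 1] - 2.0 * n[i] + n[i + 1])
            n = new
        return n

    profile = [0.0] * 21
    profile[10] = 1.0               # a spike of dopant in the middle
    print(diffuse(profile, D=1.0, dx=1.0, dt=0.4, steps=50)[8:13])

After 50 steps the spike has spread into a smooth, roughly Gaussian bump, which is exactly the smearing of implanted profiles during the later thermal steps.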
Apart from deciding what steps are performed in what sequence and what the masking is for the different steps, the simulation of the process is no different from the device simulation. It is probably easier because the equations are a lot less complex. The most difficult part of process simulation is deciding what coefficients to use and how they change when the different steps are mixed with each other.
For example if you implant into a clean silicon surface you may get a different profile than if you implant into silicon that you just implanted into with a different dose and a different species. Similarly the diffusion coefficients of impurities when they are the only dopant in clean silicon may be different than when other impurities are also present.
So just as in the case of the device simulator, the purpose of process simulation is mainly to understand the relative effect of a change in a process recipe rather than to determine the exact outcome of a given recipe, and so a mistake that happens all too often when the simulator is calibrated is to treat the calibration parameters as fudge factors.
16.5 Monte Carlo analysis
This is the most effective method of analyzing the yield of a circuit. Monte Carlo analysis is used primarily when the number of independent variables is large and their interdependence is characterized by intractable equations. Because equations used in semiconductor devices often involve exponentials, substituting variables with distribution functions can lead to equations which have nested exponentials that are not easy to solve analytically.
Monte Carlo analysis has some advantages, such as that it is easy to fully utilize the computing capability of a massively parallel system. It has some disadvantages in that each subsequent significant digit takes progressively longer to obtain, so that after the initial convergence any further improvement is minimal and unreliable.
The way you would do MC analysis is to first identify the portions of the circuit that cause a loss in yield. Then you need to set up a simulation with a pass or fail condition. For example, for a digital circuit you may set up the test simulation so that the inputs arrive at time zero and the output has to change within a specified time, such as within 100 ps.

Figure 16.9: Randomly selecting a value x from a random number n_1.
Now select the variables which you are going to vary. They have to be independent of each other, otherwise the analysis would be incorrect. For example you could vary T_ox, V_th, L_diff and the contact resistance R_c.
Now you select a set of (T_ox, V_th, L_diff, R_c) by the use of 4 random numbers n_1, n_2, n_3, n_4 lying between 0 and 1, obtained from 4 different pseudo-random number sequences using 4 different seed values. The values x are obtained from the random numbers by satisfying equations 16.29 and 16.30, as shown in figure 16.9.

n_1 = (1/√(2π)) ∫_{-∞}^{x} e^{-t^2/2} dt    (16.29)

V_th = V_th,mean + x σ_Vth    (16.30)
You now repeat the simulations many thousand times and count the number of successes and
failures. The yield is the fraction of successes out of the total number of tries. It is critical to use a
pseudo-random number sequence which has a "good spectral response" meaning that there is no
discernible pattern to the sequence.
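A compact sketch of this flow is given below. The normal spreads on (T_ox, V_th, L_diff, R_c) and the toy delay model used as the pass/fail test are hypothetical; in a real analysis each trial would run a circuit simulation with the sampled values.

    import random

    # Monte Carlo yield estimate with a toy pass/fail test (100 ps timing budget).
    def passes(tox, vth, ldiff, rc):
        delay = (80e-12 * (tox / 2.0e-9) * (ldiff / 50e-9)
                 * (1 + 3 * (vth - 0.40)) * (1 + rc / 500.0))   # toy delay model
        return delay < 100e-12

    def mc_yield(trials=10000, seed=1):
        rng = random.Random(seed)
        ok = 0
        for _ in range(trials):
            tox = rng.gauss(2.0e-9, 0.1e-9)     # oxide thickness (m)
            vth = rng.gauss(0.40, 0.03)         # threshold voltage (V)
            ldiff = rng.gauss(50e-9, 5e-9)      # diffusion length (m)
            rc = rng.gauss(100.0, 20.0)         # contact resistance (ohm)
            ok += passes(tox, vth, ldiff, rc)
        return ok / trials

    print(mc_yield())   # fraction of passing trials, i.e. the estimated yield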
Chapter 17
Power electronics
High power devices are primarily devices with a very large area. By distributing the current over a large area, the velocity of the carriers is kept to a reasonable number similar to that in a device operating at lower currents, so the current density is about the same as in any device. The other issue is the heat generated due to the flow of current, so the casing may be metal and the chip is in good thermal contact with the casing.
High voltage is a different issue from high power and is usually achieved using longer space charge regions. Here the goal is to keep the electric field at the same level as it is in a normal device; for example a 0.13 μm FET operating at 1.8 V will have an electric field in the channel of about 140 kV/cm, so high voltage devices would also use similar field strengths. The doping density will also be lower so that at junctions the depletion width can be large, but even so the electric field will be the highest at the junction. The long space charge region has the effect of making the device slower due to a longer transit time for the carriers.
17.1 Alternating current
Nikola Tesla pioneered alternating current power systems, which made it possible to transmit electricity over very large distances over the power lines. The power lines that you see usually transmit electricity at about 66,000 volts. The reason is shown in figure 17.1.
Figure 17.1: Using direct current (supply V_sup, line resistance R_t, load R_L).
Let us suppose that the supply voltage in figure 17.1 is 120 volts DC. If you have an appliance that requires 1 A of current, its resistance is 120 Ω. Let us now suppose that 1 mile of power line (both ways) is 4 Ω. So if you are 100 miles from the power station, R_t = 400 Ω. So if the voltage at the supply station is 120 V, what the load actually sees is 0.23 A. Even if the load resistance is reduced almost to zero, the maximum current supplied by the station is 0.3 A. Now consider the situation for alternating current as shown in figure 17.2.
Figure 17.2: Using alternating current (the load R_L sits behind a step-down transformer).
In this figure the load is on the other side of a step-down transformer from the power station. On the primary side we use a supply voltage of 66,000 volts AC. The current in the primary depends on the load on the secondary. So if you have the same requirement of 1 A on the secondary, that works out to 1 × √2 = 1.414 A peak, and it only requires about 0.0026 A peak at 66,000 V on the primary side, which means that the voltage dropped across the transmission line is about 0.0026 × 400 ≈ 1 V, a very, very small part of 66,000 V indeed!
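The arithmetic above can be checked in a few lines; the numbers are the ones used in the text (a 120 V, 1 A appliance, a 400 Ω line and a 66,000 V primary), with peak values used on the AC side to match the text.

    import math

    P_load = 120.0        # W: the 120 V, 1 A appliance
    R_line = 400.0        # ohms of line resistance for 100 miles, both ways

    # Direct current at 120 V: the line and the load form a voltage divider.
    I_dc = 120.0 / (R_line + 120.0)
    print("DC case: %.2f A reaches the load" % I_dc)             # about 0.23 A

    # Alternating current with the line at 66,000 V.
    I_peak = math.sqrt(2.0) * P_load / 66000.0                   # about 0.0026 A
    print("AC case: %.4f A peak, %.2f V dropped on the line"
          % (I_peak, I_peak * R_line))                           # about 1 V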
17.2 Transformers
When you pass current through a coil in any medium, the magnetic flux caused by elements of that coil is seen by any conductor in the vicinity and is given by the Biot-Savart law. If there is only one coil then that coil has a self-inductance. However if there are two coils then the changing magnetic flux caused by one coil causes a current to flow in the other coil, due to Faraday's law. Such a device is a transformer.
Figure 17.3: A transformer.
If the coil is wrapped around a closed loop of ferromagnetic material such as iron, as shown in figure 17.3, then the magnetic flux is almost completely contained in the iron loop. So the coil on one side is the input and causes the flux to flow, and the coil on the other side is the output and develops an electromotive force or emf which, when connected to the output circuit, will drive a current.
If the number of complete turns on the input side is n_1 and the number of complete turns on the output side is n_2, then the ratio of the output voltage to the input voltage is given by V_2/V_1 = n_2/n_1.
17.3 Rectification
The biggest single usage of diodes is in rectification, as shown in figure 17.4. When the upper pin on the left drives positive, so too does the upper pin on the right. But when the upper pin on the left drives negative, the upper pin on the right will still drive positive. Since the upper pin on the right is always positive, rectification is said to have occurred.
Figure 17.4: Rectification.
The rectification is followed by a low pass filter to convert the rippled output into a constant value. It does not have to be an RC section as shown; it can be an active regulation circuit that uses timed switching to keep the output voltage and current constant.
17.4 DC to AC conversion
A fail-safe power supply stores about half an hour of energy in a battery, and if the power goes off it outputs AC without a glitch. In order to do this it has to convert a DC voltage into 60 Hz AC. The way this is done is by switching the DC on and off to produce a series of positive and negative pulses as shown in figure 17.5.
Figure 17.5: DC to AC conversion.
If this is followed by a low pass filter, the pulses shown as solid lines will smear into the dotted line, which looks like a sine wave. Many reactive oscillatory circuits can then be used to further convert the dotted curve into a real sine wave.
17.5 DC to DC conversion
Figure 17.6: Increasing the voltage.
DC to DC converters are most often used in circuits where you wish to allow the customer to use a single battery cell of 1.5 V but the circuit needs a 3 V supply to function. The circuit used is of the type shown in figure 17.6.
The switch shown in figure 17.6 is turned on and off at perhaps 50 kHz. When the switch is turned on, the inductor current rises. When the switch is opened, this current is forced into the capacitor until it starts to reverse direction, at which point the diode stops the current flow. If the LC oscillation frequency is 50 kHz as well, then the voltage across the capacitor will have a peak voltage of twice the input voltage. By using a different LC combination or a different switching frequency, any voltage between the supply voltage and twice the supply voltage can be generated. Similarly, any voltage between 0 and the supply voltage can be generated by using the configuration in figure 17.7.
Figure 17.7: Decreasing the voltage.
The oscillation frequency of the LC pair is given by 1/(2π√(LC)), so by using a high frequency of 50 kHz or so, the inductance and capacitance used can be quite small. This type of circuit can only be integrated onto a chip for small power supplies, because otherwise you would require a large capacitance. The inductor and the capacitor used may be off chip.
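For instance, taking the resonance condition f = 1/(2π√(LC)) at 50 kHz with an assumed 100 µH inductor, the capacitor works out to roughly 0.1 µF, which shows why such small components suffice.

    import math

    # Size the capacitor for a 50 kHz LC resonance; the 100 uH inductor is assumed.
    f = 50e3                                   # Hz
    L = 100e-6                                 # H
    C = 1.0 / ((2.0 * math.pi * f) ** 2 * L)
    print("C = %.0f nF" % (C * 1e9))           # about 101 nF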
17.6 Silicon Controlled Rectifier
Figure 17.8: The structure of the SCR (a four-layer n-p-n-p stack).
SCRs are also called thyristors because they are similar to transistors but behave like thyratrons. Thyratrons have a structure similar to a triode, except that the tube is not evacuated but filled with hydrogen at about 1 mtorr. When the triode current flows, the hydrogen is ionized and a high current can flow. Thyratrons and thyristors are used in power circuits.
An SCR has four layers and is formed by adding a fourth implant to the BJT as shown in figure 17.8. It is called a rectifier because once it is turned on it will conduct current in a single direction, much like a diode. Unlike a diode, it needs a turn-on signal.
The way an SCR is understood is usually by the use of the equivalent circuit shown in figure 17.9. You have a p-n-p transistor back to back with an n-p-n transistor. I_1 is the base current of the n-p-n transistor, causing a collector current I_2 to flow, and this is the base current for the upper p-n-p transistor.
Figure 17.9: An SCR is two back to back BJTs (currents I_1 and I_2).
So once the current flow is started by turning on the n-p-n transistor, the gating signal is no longer needed, because the collector current of the one transistor forms the base current of the other and the feedback loop keeps the current flowing.
17.7 Power BJTs
Figure 17.10: A power BJT (interdigitated base and emitter fingers, ballast resistors, and a collector region forming the back plane).
Power BJTs are usually laid out in an interdigitated manner [152] as shown in figures 17.10 and 17.11. The lowest layer is the collector, which forms the back plane of the device. The dotted layer is the base region, which is contacted by contacts from the interdigitated base fingers on the right.

Figure 17.11: A power BJT (base, emitter and collector terminals).

The emitter fingers come in from the left. A few fingers are connected together and to the emitter contact through a ballast resistor [153]. The reason that the resistor has to be integrated into the transistor structure is that otherwise it could happen that most of the current flows through a small portion of the transistor, leading to a burn out. The current through a BJT increases with temperature, so if any region gets more current it will heat up and then the current increases again, and so forth. So the ballast resistors cause the current to flow evenly through the transistor and avoid localized heating.
Bibliography
[1] R. C. Miller et al. IEEE Journal of Quantum Electronics, 1965.
[2] W. T. Read. A Proposed High-Frequency, Negative-Resistance Diode. Bell System Technical
Journal, 1958.
[3] B. C. Loach et al. Avalanche transit-time microwave oscillators and amplifiers. IEEE Trans-
actions on Electron Devices, 1966.
[4] D. Scharfetter and H. K. Gummel. Large-signal analysis of a silicon Read diode oscillator.
IEEE Transactions on Electron Devices, 1969.
[5] J. B. Gunn. Microwave Oscillations of Current in III-V Semiconductors. Solid State Communi-
cations, 1963.
[6] B. K. Ridley and T. B. Watkins. The Possibility of Negative Resistance Effects in Semicon-
ductors. Proceedings of the Physical Society, 1961.
[7] C. Hilsum. Transferred Electron Amplifiers and Oscillators. Proceedings of the Institute of
Radio Engineers, 1962.
[8] J. S. Blakemore. Major properties of GaAs. Journal of Applied Physics, 1982.
[9] Shyh Wang. Fundamentals of semiconductor theory and device physics. Prentice Hall, 1989.
[10] Herbert Kroemer. Quantum Mechanics. Prentice Hall, 1994.
[11] Ramamurti Shankar. Principles of Quantum Mechanics. Plenum, 1994.
[12] Murray R. Spiegel. Mathematical handbook. McGraw-Hill, 1968.
[13] F. Bloch. Zeitschrift fur Physik, 1928.
[14] R. de L. Kronig and W. G. Penney. Proc. R. Soc. London, 1930.
[15] G. Dresselhaus et al. Physical Review, 1955.
[16] W. B. Joyce and R. W. Dixon. Analytic approximations for the Fermi energy of an ideal Fermi
gas. Applied Physics Letters, 1977.
[17] W. Shockley et al. Physical Review, 1950.
[18] J. R. Haynes and W. Shockley. Physical Review, 1951.
[19] W. Shockley and W. T. Read. Physical Review, 1952.
[20] R. N. Hall. Physical Review, 1952.
[21] H. C. Casey et al. Journal of Applied Physics, 1976.
[22] E. M. Conwell. Properties of silicon and germanium. Proceedings of the Institute of Radio
Engineers, 1958.
[23] S. M. Sze et al. Resistivity, Mobility, and impurity levels in GaAs at 300 K. Solid-State
Electronics, 1968.
[24] B. I. Halperin et al. Physical Review, 1966.
[25] J. R. Brews. A Charge-Sheet Model of the MOSFET. Solid-State Electronics, 1978.
[26] V. G. Reddi et al. Source to drain resistance beyond pinchoff in metal-oxide-semiconductor
transistor. IEEE Transactions on Electron Devices, 1969.
[27] D. M. Caughey et al. Carrier mobilities in silicon empirically related to doping and field.
Proceedings of IEEE, 1967.
[28] J. A. Cooper et al. Measurement of high field drift velocity of electrons in the inversion layer in silicon. IEEE Electron Device Letters, 1983.
[29] L. D. Yau. A simple theory to predict the threshold voltage in short-channel IGFETs. Solid-
State Electronics, 1974.
[30] F. C. Hsu et al. IEEE Transactions on Electron Devices, 1983.
[31] H. C. Pao et al. Effects of Diffusion Current on Characteristics of Metal-Oxide Insulator-
Semiconductor Transistors. Solid-State Electronics, 1966.
[32] R. F. Pierret et al. Simplified long-channel MOSFET theory. Solid-State Electronics, 1983.
[33] W. M. Werner. The Work Function Difference of the MOS-System with Aluminum Field Plates and Polysilicon Field Plates. Solid-State Electronics, 1974.
[34] L. A. Akers et al. A model of a narrow-width MOSFET including tapered oxide and doping
encroachment. IEEE Transactions on Electron Devices, 1981.
[35] L. A. Akers. The inverse narrow-width effect. IEEE Electron Device Letters, 1986.
[36] S. Adachi. GaAs, AlAs, and Al_x Ga_{1-x} As: Material parameters for use in research and device applications. Journal of Applied Physics, 1985.
[37] Sitaramarao S. Yechuri et al. Design of Flat-Band AlGaAs Heterojunction Bragg Reflectors. IEEE Transactions on Electron Devices, 1996.
[38] R. Bashir et al. Atomic force microscopy studies of self assembled Si_{1-x}Ge_x islands produced by controlled relaxation of strained films. Journal of Vacuum Science Technology, 2001.
[39] P. P. Debye et al. Physical Review, 1954.
[40] A. S. Grove. Redistribution of acceptor and donor impurities during thermal oxidation of Si. Journal of Applied Physics, 1964.
[41] P. Burggraaf. Wafer steppers and lens options. Semiconductor International, 1986.
[42] P. R. Gray and R. G. Meyer. Analysis and design of analog integrated circuits. John Wiley, 1993.
[43] Nobuhiko Mutoh et al. New empirical relation for MOSFET 1/f noise unified over linear
and saturation regions. Solid-State Electronics, 1988.
[44] Fernando Colombani et al. Extraction of microwave noise parameters of FET devices. IEEE
MTT-S Digest, 1990.
[45] Alfy Riddle. Extraction of FET model noise-parameters from measurement. IEEE MTT-S
Digest, 1991.
[46] Alain Cappy et al. High-frequency FET noise performance: A new approach. IEEE Transac-
tions on Electron Devices, 1989.
[47] Sam Pritchett et al. Improved FET noise model extraction method for statistical model de-
velopment. IEEE MTT-S Digest, 1995.
[48] G. A. Lang et al. Chemical polishing of silicon with anhydrous hydrogen chloride. RCA
Review, 1963.
[49] R. Nuttall. The dependence on deposition conditions of the dopant concentration of epitax-
ial layers. Journal of the Electrochemical Society, 1964.
[50] W. H. Shepherd. Doping of epitaxial silicon. Journal of the Electrochemical Society, 1968.
[51] D. J. Sykes. Recent advances in negative and positive photoresist technology. Solid State
Technology, 1973.
[52] L. N. Lie et al. High pressure oxidation of silicon in dry oxygen. Journal of the Electrochemical
Society, 1982.
[53] E. A. Irene et al. Silicon oxidation studies: The role of H2O. Journal of the Electrochemical
Society, 1977.
[54] R. R. Razouk et al. Kinetics of high pressure oxidation of silicon in pyrogenic steam. Journal
of the Electrochemical Society, 1981.
[55] J. Klerer. On the mechanism of the deposition of Silica by Pyrolytic decomposition of Silanes.
Journal of the Electrochemical Society, 1965.
[56] N. Goldsmith et al. The deposition of vitreous silicon dioxide films from silane. RCA Review,
1967.
[57] M. Miyake et al. Incidence angle dependence of planar channeling in boron ion implantation
in silicon. Journal of the Electrochemical Society, 1983.
[58] J. Narayan et al. Characteristics of ion-implantation damage and annealing phenomena in
semiconductors. Journal of the Electrochemical Society, 1984.
[59] C. Murray. Wet etching update. Semiconductor International, 1986.
[60] D. R. Turner. On the mechanism of chemically etching germanium and silicon. Journal of the
Electrochemical Society, 1960.
[61] J. Kleinberg et al. Inorganic Chemistry. Heath and Co., 1960.
[62] W. R. Runyan and K. E. Bean. Semiconductor integrated circuit processing technology. Addison-
Wesley, 1990.
[63] D. L. Flamm et al. Basic chemistry and mechanisms of plasma etching. Journal of Vacuum
Science Technology, 1983.
[64] D. F. Downey et al. Introduction to reactive ion beam etching. Solid State Technology, 1981.
[65] V. Hoffman. High rate magnetron sputtering for metallizing semiconductor devices. Solid
State Technology, 1976.
[66] G. Harbeke et al. Growth and physical properties of LPCVD polycrystalline silicon films.
Journal of the Electrochemical Society, 1984.
[67] T. Chung. Study of aluminum fusion into silicon. Journal of the Electrochemical Society, 1962.
[68] G. L. Schnable et al. Aluminum metallization - advantages and limitations for integrated circuit applications. Proceedings of IEEE, 1969.
[69] A. J. Learn. Evolution and current status of aluminum metallization. Journal of the Electro-
chemical Society, 1976.
[70] P. Burggraaf. Silicide technology spotlight. Semiconductor International, 1985.
[71] A. E. Morgan et al. Characterization of a self-aligned cobalt silicide process. Journal of the
Electrochemical Society, 1987.
[72] K. Venkat et al. Timing verification of dynamic circuits. IEEE Journal of Solid-State Circuits,
1996.
[73] R. J. Widlar. Some circuit design techniques for linear integrated circuits. IEEE Transactions
on Circuit Theory, 1965.
[74] R. J. Widlar. New Developments in IC Voltage Regulators. IEEE Journal of Solid-State Circuits,
1971.
[75] J. S. Brugler. Silicon transistor biasing for linear collector current temperature dependence.
IEEE Journal of Solid-State Circuits, 1967.
[76] Y. P. Tsividis et al. A CMOS voltage reference. IEEE Journal of Solid-State Circuits, 1978.
[77] K. R. Lakshmikumar et al. Characterization and modeling of mismatch in MOS transistors
for precision analog design. IEEE Journal of Solid-State Circuits, 1986.
[78] M. J. M. Pelgrom et al. Matching properties of MOS transistors for precision analog design.
IEEE Journal of Solid-State Circuits, 1989.
[79] S. J. Lovett et al. Optimizing MOS Transistor Mismatch. IEEE Journal of Solid-State Circuits,
1998.
[80] B.E. Boser. The design of sigma-delta modulation analog-to-digital converters. IEEE Journal
of Solid-State Circuits, 1988.
[81] G. F. Landsburg. A charge balancing monolithic a/d converter. IEEE Journal of Solid-State
Circuits, 1977.
[82] S. R. Norsworthy et al. Delta-Sigma Data Converters: Theory, Design and Simulation. John
Wiley, 1996.
[83] R. Schreier et al. Delta-sigma modulators employing continuous time circuitry. IEEE Trans-
actions on Circuits and Systems, 1996.
[84] J. A. Cherry et al. Continuous-time Delta-Sigma Modulators for High-speed A/D conversion: The-
ory, Practice and Fundamental Performance limits. Kluwer Academic, 1999.
[85] A. P. Chandrakasan et al. Low-power CMOS digital design. IEEE Journal of Solid-State
Circuits, 1992.
[86] K. Fukahori. A high precision micropower operational amplifier. IEEE Journal of Solid-State
Circuits, 1979.
[87] G. W. Taylor. Subthreshold conduction in MOSFETs. IEEE Transactions on Electron Devices,
1978.
[88] J. Ramirez-Angulo et al. Characterization, evaluation and comparison of laser trimmed film
resistors. IEEE Journal of Solid-State Circuits, 1987.
[89] J. A. Babcock et al. Precision electrical trimming of very low TCR Poly-SiGe resistors. IEEE
Electron Device Letters, 2000.
[90] J. Deverell. Pipeline iterative arithmetic array. IEEE Transactions, 1975.
[91] H. H. Guild. Fully iterative fast array for binary multiplication and addition. Electronics
Letters, 1969.
[92] J. V. McCanny et al. Completely iterative, pipelined multiplier array suitable for VLSI. Pro-
ceedings of IEE, 1982.
[93] R. F. Lyon. Twos complement pipeline multipliers. IEEE Transactions, 1976.
[94] H. T. Kung. Why systolic architectures? Computer Magazine, 1982.
[95] A. Robertson. A new class of digital division methods. IRE Transactions Electronic Computers,
1958.
[96] Tocher. Techniques of multiplication and division for automatic binary computers. Quarterly
J. Mech. and Applied Math, 1958.
[97] N. Kurd et al. Multi-GHz Clocking Scheme for Intel Pentium 4 Microprocessor. IEEE Inter-
national Solid-State Circuits Conference, 2001.
[98] T. Xanthopoulos et al. The Design and Analysis of the Clock Distribution Network for a 1.2
GHz Alpha Microprocessor. IEEE International Solid-State Circuits Conference, 2001.
[99] Floyd M. Gardner. Phaselock techniques. John Wiley & Sons, 1979.
[100] Katsuhiko Ogata. Modern control engineering. Prentice Hall, 1997.
[101] Ronald N. Bracewell. The Fourier transform and its applications. McGraw-Hill, 1986.
[102] James W. Cooley and John W. Tukey. An algorithm for the machine calculation of complex
fourier series. Mathematics of Computation, 1965.
[103] G. C. Danielson and C. Lanczos. Some improvements in practical fourier analysis and their
application to x-ray scattering from liquids. J. Franklin Inst., 1942.
[104] D. A. Huffman. A method for construction of minimum redundancy codes. Proceedings of
the Institute of Radio Engineers, 1952.
[105] J. A. Miller. Maximally flat nonrecursive filters. Electronics Letters, 1972.
[106] B. C. Jinaga et al. Coefficients of maximally flat nonrecursive digital filters. Signal Processing,
1984.
[107] C. E. Willert and M. Gharib. Digital particle image velocimetry. Experiments in Fluids, 1991.
[108] I. S. Reed and G. Solomon. Polynomial codes over certain finite fields. Journal of the society
for industrial and applied mathematics, 1960.
[109] N. Zierler. Linear recurring sequences. Journal of the society for industrial and applied mathe-
matics, 1959.
[110] P. Elias. Coding for noisy channels. IRE Convention Record, 1955.
[111] A. J. Viterbi. Error bounds for convolutional codes and an asymptotically optimum decod-
ing algorithm. IEEE Transactions on Information Theory, 1967.
[112] T. Jung et al. A 500 MHz ATE pin driver. Bipolar/BiCMOS Circuits and Technology Meeting, 1992.
[113] S. B. Cohn. Characteristic impedance of the shielded-strip transmission line. IRE Transactions
on MTT, 1954.
[114] H. A. Wheeler. Transmission-line properties of a strip on a dielectric sheet on a plane. IEEE
Transactions on Microwave Theory and Techniques, 1977.
[115] H. A. Wheeler. Transmission line properties of a stripline between parallel planes. IEEE
Transactions on Microwave Theory and Techniques, 1978.
[116] S. B. Cohn. Slotline on a Dielectric Substrate. IEEE Transactions, 1969.
[117] K. C. Gupta et al. Microstrip lines and slotlines. Artech House, 1996.
[118] H. Bex. New broadband balun. Electronics Letters, 1975.
[119] C. E. Fay et al. Operation of the Ferrite Junction Circulator. IEEE Transactions on Microwave
Theory and Techniques, 1965.
[120] R. W. Klopfenstein. A transmission line taper of improved design. Proceedings of the Institute
of Radio Engineers, 1965.
[121] R. E. Collin. The optimum tapered transmission line matching section. Proceedings of the
Institute of Radio Engineers, 1965.
[122] S. B. Cohn. Optimum design of stepped transmission line transformers. IEEE Transactions
on Microwave Theory and Techniques, 1955.
[123] P. I. Richard. Resistor-transmission-line circuits. Proceedings of the Institute of Radio Engineers,
1948.
[124] M. C. Horton et al. General Theory and Design of Optimum Quarter-Wave TEM Filters.
IEEE Transactions on Microwave Theory and Techniques, 1965.
[125] L. Young. Microwave Filters 1965. IEEE Transactions on Microwave Theory and Techniques,
1965.
[126] B. J. Minnis. Printed circuit coupled-line filters for bandwidths up to and greater than an
octave. IEEE Transactions on Microwave Theory and Techniques, 1981.
[127] R. F. Kopf et al. N- and p-type dopant profiles in distributed Bragg reflector structures and
their effect on resistance. Applied Physics Letters, 1992.
[128] E. F. Schubert et al. Elimination of heterojunction band discontinuities by modulation dop-
ing. Applied Physics Letters, 1992.
[129] F. Capasso et al. AlGaAs/GaAs staircase avalanche photodiodes with high and extremely
uniform avalanche gain. IEEE IEDM, 1988.
[130] H. C. Casey et al. Heterostructure Lasers. Academic, 1978.
[131] O. K. Kim et al. A low dark-current, planar InGaAs p-i-n photodiode. IEEE Journal of
Quantum Electronics, 1985.
[132] W. Franz. Z. Naturforsch., 1958.
[133] L. V. Keldysh. Sov. Phys. JETP, 1958.
[134] N. Susa et al. IEEE Journal of Quantum Electronics, 1981.
[135] J. C. Campbell et al. Journal of Applied Physics, 1982.
[136] R. W. Keyes. The effects of elastic deformation on the electrical conductivity of semiconduc-
tors. Solid State Physics, 1960.
[137] F. B. Hildebrand. Introduction to numerical analysis. Dover, 1974.
[138] Athanasios Papoulis. Probability, random variables, and stochastic processes. McGraw-Hill,
1991.
[139] R. K. Livesley. Finite elements: an introduction for engineers. Cambridge University Press,
1983.
[140] W. H. Press et al. Numerical recipes in C: The art of scientific computing. Cambridge University
Press, 1992.
[141] I. A. Duff. A survey of sparse matrix research. Proceedings of IEEE, 1977.
[142] K. Levenberg. Quarterly Applied Math, 1944.
[143] D. W. Marquardt. Journal of the society for industrial and applied mathematics, 1963.
[144] C. S. Rafferty et al. Iterative methods in semiconductor device simulation. IEEE Transactions
on Electron Devices, 1985.
[145] R. E. Bank et al. An adaptive, multi-level method for elliptic boundary value problems.
Computing, 1981.
[146] S. Selberherr et al. MINIMOS - A Two-Dimensional MOS analyzer. IEEE Transactions on
Electron Devices, 1980.
[147] A. Yoshii et al. A three-dimensional analysis of semiconductor devices. IEEE Transactions on
Electron Devices, 1982.
[148] M. K. Lundstrom et al. Numerical analysis of heterostructure semiconductor devices. IEEE
Transactions on Electron Devices, 1983.
[149] G. B. Lush. A study of minority carrier lifetime versus doping concentration in n-type GaAs
grown by metalorganic chemical vapor deposition. Journal of Applied Physics, 1992.
[150] G. Bemski. Recombination in semiconductors. Proceedings of IEEE, 1958.
[151] H. K. Gummel. A self-consistent iterative scheme for one-dimensional steady state transistor
calculations. IEEE Transactions on Electron Devices, 1964.
[152] R. Allison. Silicon bipolar microwave power transistor. IEEE Transactions on Microwave
Theory and Techniques, 1979.
[153] R. P. Arnold et al. A quantitative study of emitter ballasting. IEEE Transactions on Electron
Devices, 1974.
Index
1-bit DAC, 96
Accelerometers, 173
Aliasing, 125
Ampere's law, 158
Balun, 162
Band diagrams, 20
band-gap reference, 85
binary DAC, 94
Bloch theorem, 16
Bootstrapping, 84
Bragg reflectors, 170
Burn in, 54
cascoded current source, 80
Circulators, 164
clock skewing, 108
Clock trees, 109
common centroid, 87
convolutional encoding, 132
coplanar
strip, 161
waveguide, 161
Coupled line filters, 167
critical path, 106
cross-correlation, 130
cyclotron resonance, 17
Davisson-Germer experiment, 14
de Broglie wavelength, 14
De Morgan's theorem, 69
depletion approximation, 23
DFT, 124
DLL, 120
Domino logic, 74
driving point impedance, 77
DUT, 151
Early voltage, 27
Edge emitting laser, 169
ESD, 141
Etching, 61
Faraday's law, 158
Fermi-Dirac distribution, 18
FFT, 124
Finite state machines, 72
FIR filter, 129
Flip-flops, 70
Free electron theory, 15
full-adder, 101
Gain bandwidth product, 85
Gauss's law, 158
grid
uniform, 177
adaptive, 178
Ground bounce, 145
Guild multiplication array, 102
Gunn diode, 11
hang states, 74
Haynes-Shockley experiment, 19
Huffman coding, 127
Implantation, 60
Junction detectors, 172
Karnaugh maps, 71
Kirchoff's laws, 3
Klopfenstein taper, 165
Klystron tube, 8
Kronig Penney model, 16
Level shifting, 82
Line and stub filters, 166
Line terminations, 142
Liquid Czochralski pull, 56
Maximum power transfer theorem, 5
Mesh equations, 4
microstrip, 159
Miller's theorem, 84
Node equations, 4
Noise, 51
normal distribution, 182
Norton's theorem, 5
Nyquist theorem, 125
Ohm's law, 2
Over-sampling, 126
Parallel A/D, 92
pass gate, 70
phase-frequency detector, 116
Pin driver, 155
Pipelining, 107
Power BJTs, 187
probe card, 42
probe station, 42
process independent resistance, 79
Production monitors, 50
quarter wavelength transformer, 165
R-2R ladder, 94
Ramp circuit, 154
random numbers, 182
Read diode, 10
Reed-Solomon codes, 131
reflection coefficient, 158
Ring oscillator, 112
Ringing, 146
Schroedinger's wave equation, 14
Serial A/D, 92
Shift register, 103
Sigma-Delta ADC, 94
siliciding, 63
Silicon Controlled Rectifier, 186
slotline, 159
Small signal equivalent, 29
Sputtering, 63
Striping, 50
stripline, 159
Sub-diffusion, 35
Sub-threshold swing, 33
Successive approximation, 92
Surface emitting laser, 170
Thevenin's theorem, 5
Timing chip, 153
transmission coefcient, 158
truth table, 69
Vacuum
diode, 7
triode, 8
VCO, 114
Viterbi decoding, 133
Widlar current source, 79
Window function, 126