Embedded Systems Design
VOLUME 22, NUMBER 8, SEPTEMBER 2009
The Official Publication of The Embedded Systems Conferences and Embedded.com

SENSORS RAISE your system's IQ 14
Cell processor revisited 22
Is virtualization right for your project? 28
Ganssle on memory 37
COLUMNS

programming pointers 11
Allocating and deallocating arrays, in detail
BY DAN SAKS
Deallocating array objects, not just array storage, requires a little advance planning.

break points 37
Thanks for the memories
BY JACK G. GANSSLE
From oral tradition and clay tablets all the way to modern memory devices, Jack Ganssle traces the history of memory.

14 Cover Feature: Getting in touch with capacitance sensor algorithms
BY JOHN CAREY
Learn about the role of capacitance measurement algorithms in multi-touch sensing user interfaces.

22 Gaming the system: high-end networking on the Cell processor
BY JIM TURLEY
Complex multicore processors are upon us. With the right open-source tools and a commercial RTOS, they don't have to be monsters to program.

28 Is virtualization right for your application?
BY CASEY WELTZIN
Here's a brief "no-bull" tutorial on how virtualization actually works and where it is most useful.

33 ESC paper: Seamless integration of multicore embedded systems
BY G. DE SIMONE, P. PIERANI, AND M. QUAGLIANI
Gradually introduce performance improvements while preserving an established functional baseline in an embedded system with demanding requirements.

DEPARTMENTS

#include 5
3G is passé; 4G is the real deal
BY RICHARD NASS
The handset vendors claim that 4G is a real panacea for end users. But do end users even know what 4G is?

parity bit 6

marketplace 38

advertising index 38

IN PERSON
ESC Boston, September 21–24, 2009, www.embedded.com/esc/boston
ESC UK, October 6–8, 2009, www.embedded.co.uk
ESC Grenoble, December 1–3, 2009, www.design-reuse.com/ipesc09/
ESC Silicon Valley, April 26–29, 2010, www.embedded.com/esc/sv

ONLINE
www.embedded.com

EMBEDDED SYSTEMS DESIGN (ISSN 1558-2493 print; ISSN 1558-2507 PDF-electronic) is published monthly with the exception of a combined July/August issue by TechInsights, 600 Harrison Street, 5th floor, San Francisco, CA 94107, (415) 947-6000. Please direct advertising and editorial inquiries to this address. SUBSCRIPTION RATE for the United States is $55 for 12 issues. Canadian/Mexican orders must be accompanied by payment in U.S. funds with additional postage of $6 per year. All other foreign subscriptions must be prepaid in U.S. funds with additional postage of $15 per year for surface mail and $40 per year for airmail. POSTMASTER: Send all changes to EMBEDDED SYSTEMS DESIGN, P.O. Box 3404, Northbrook, IL 60065-9468. For customer service, telephone toll-free (877) 676-9745. Please allow four to six weeks for change of address to take effect. Periodicals postage paid at San Francisco, CA and additional mailing offices. EMBEDDED SYSTEMS DESIGN is a registered trademark owned by the parent company, TechInsights. All material published in EMBEDDED SYSTEMS DESIGN is copyright © 2009 by TechInsights. All rights reserved. Reproduction of material appearing in EMBEDDED SYSTEMS DESIGN is forbidden without permission.
EMBEDDED SYSTEMS DESIGN

Director of Content/Media, TechInsights
Richard Nass
(201) 288-1904
rnass@techinsights.com

Managing Editor
Susan Rambo
srambo@techinsights.com

Contributing Editors
Michael Barr, John Canosa, Jack W. Crenshaw, Jack G. Ganssle, Dan Saks, Larry Mittag

Art Director
Debee Rommel
drommel@techinsights.com

European Correspondent
Colin Holland
colin.holland@techinsights.com

Embedded.com Site Editor
Bernard Cole
bccole@acm.org

Production Director
Donna Ambrosino
dambrosino@ubm-us.com

Subscription Customer Service
P.O. Box 2165, Skokie, IL 60076
(800) 577-5356 (toll free)
Fax: (847) 763-9606
embeddedsystemsdesign@halldata.com
www.customerserviceesp.com

Article Reprints, E-prints, and Permissions
Mike O'Brien
Wright's Reprints
(877) 652-5295 (toll free)
(281) 419-5725 ext.117
Fax: (281) 419-5712
www.wrightsreprints.com/reprints/index.cfm?magid=2210

Publisher
David Blaza
(415) 947-6929
dblaza@techinsights.com

Editorial Review Board
Michael Barr, Jack W. Crenshaw, Jack G. Ganssle, Bill Gatliff, Nigel Jones, Niall Murphy, Dan Saks, Miro Samek

Corporate: TechInsights
Paul Miller, Chief Executive Officer
Aharon Shamash, Chief Financial Officer
Felicia Hamerman, Group Marketing Director
Randall Freeborn, Chief Human Resources Officer
Harry Page, Senior VP Professional Services, Semiconductor Insights, Portelligent

Corporate: UBM LLC
Marie Myers, Senior Vice President, Manufacturing
Pat Nohilly, Senior Vice President, Strategic Development and Business Administration

#include
BY Richard Nass
3G is passé; 4G is the real deal

Now that we're all pretty sure we understand what 3G stands for (or represents), it's time to move on to 4G. To review, the "real" definition for 3G is 384 kbits/s. That's the speed for a mobile device. That high rate results in a host of applications/features that are more like what a consumer would equate with 3G. The more widely recognized definition by consumers is that the bandwidth is high enough to permit voice calls, video calls, and wireless data, all in a mobile environment.

For a long time, I would tell people who may not have understood the technical definition for 3G that it was really the melding of cellular and WiFi. That's obviously not technically correct, but it lets people understand that a 3G handset is one that allows you to access the Internet as well as make phone calls.

It's somewhat ironic that 4G can be described by a similar definition, with respect to features and applications. In addition to those previously mentioned, it could include the addition of a WiFi/WiMax interface for voice, video, or data. It also includes a more comprehensive security solution.

Some of the objectives that the 4G Working Group has defined for its communication standard include a nominal 100-Mbit/s data rate while the client (handset) physically moves at high speeds relative to the base station, and 1 Gbit/s while the client and station are in relatively fixed positions; a smooth handoff across heterogeneous networks (wouldn't that be nice); seamless connectivity and global roaming across multiple networks (again, sounds good, but is quite difficult to implement); and support for next-generation multimedia applications, such as HDTV video content and mobile TV.

What got me thinking about this 3G-4G stuff was a couple of visits I've made recently with two of the cellular power-amplifier (PA) vendors. One, Anadigics, is recognized by some to be the leader in PA technology. They offer a really low-power device in a small package. The second is a vendor with a soon-to-be-announced part that, if it meets the claims the company is making, is further ahead than anything I've come across. If it's true, you'll see higher integration within your handset pretty soon.

Richard Nass
rnass@techinsights.com

Richard Nass is the director of content/media at TechInsights. You may reach him at rnass@techinsights.com.

www.embedded.com | embedded systems design | SEPTEMBER 2009 5


parity bit
Real programmers code in (fill in blank)

Michael Barr raises a good point ("Real men program in C," July/August 2009, p.9, www.embedded.com/218600142). Teaching Java/Python/Ruby/TCL, etc., does not teach computer engineering. They may be useful for many tasks, but they let the student avoid really understanding what the computer is doing. They teach how to get something working quickly by trading off speed and size.

In this day and age, when nearly every technical professional can program (astronomers/astrophysicists, structural engineers, civil engineers, mathematicians, chemists, biologists, cryptographers, and so on), computer professionals need to do better than hack out code in "some high-level language" to earn their daily bread. As practitioners of a profession that's all about the tool (computer), computer professionals need to be experts in the tool and able to write the foundation that makes efficient use possible for others. They also need to be able to go as deep as necessary when things go badly with all those nice abstractions.

So, if you aren't a deep expert in computers, how are you in any way more than just an assistant to those professionals who have a deep knowledge of their own expert field, plus can do your job too? IMHO, if you can't explain in detail what happens from printf("Hello, world\n"), all the way down to "fetch, decode, execute," you either need to find it out, or accept the fact that your job may vaporize as soon as someone figures out how to encapsulate your limited knowledge into a GUI builder or wizard.

Remember all those folks with degrees in English and art getting $150,000 a year in 1998 doing html coding? Remember how they talked about the "old" economy and how they were part of the "new wave of computer scientists"? Remember how they got canned in 2001 as soon as decent web-design software came out and the economy took a dip? Get ready to update your resume.

I phone screened 110 "Senior Embedded Software Engineers" to get five to six who could competently answer "A 'C' Test: The 0x10 Best Questions for Would-be Embedded Programmers." Note that it shouldn't be possible to graduate from a reputable CS curriculum without being able to answer these questions.
Luke Teyssier
Posted on Embedded.com's forum

Don't be too quick to dismiss very high-level languages for embedded work. I design instruments for NASA, and our group codes the firmware in a mixture of C and assembly. But when I'm designing a board, I have to write code in Visual Basic to glue my various EDA tools together, I build very complex models in several different environments, we distribute the test data through Java applets on the team web page, and we analyze the data in either IDL or Igor. Embedded isn't just about the microcontroller. When hiring a software guy, I score about 30% on their C skills, 40% on being able to pick and use the best language to solve today's crisis, and the rest divided between their bit-flipping and communication skills.
Dave Hamara
Posted on Embedded.com's forum

From a somewhat specialized perspective (EDA), it is difficult to see C/C++ replaced soon for heavy-lifting applications. There is unquestionably high popularity for languages a level above C (I myself am a Perl fan and would hate to be forced into C unless absolutely necessary). But I observe there is considerable fragmentation above the C/C++ level: Perl, Ruby, Python, Java. Java is perhaps the most established of these meta-languages and is already making inroads in handset applications, and I could foresee a day when JREs are successfully implemented directly in hardware (attempts have already been made, but I am unaware of any being wild successes). While each of these meta-languages has value, I think a dominant standard has to emerge, with equivalent performance across a broad range of applications, for C/C++ to be superseded.
Bernard Murphy
Posted on Embedded.com's forum

I think part of the reason a younger generation of software engineers is not attracted to embedded software (and hence C) is the lack of money/prestige. I have been stunned to see some job postings lately that require years of embedded software expertise/medical device control expertise/etc. and the pay rate is $30/hr! If a software

engineer coming out of school can go to Google et al. and get a hefty salary, a very cool reputation, and stock options, why are they going to go into a more meticulously demanding job that requires embedded design skills? Companies that need embedded software engineers need to value them appropriately and make these jobs more attractive.
Susan McCord
Posted on Embedded.com's forum

We might want to consider adding a new term to the embedded universe, something like "Complex Embedded," to describe systems or applications that require (or use) processors or other sufficiently complex logic that require or are best programmed by higher-level, perhaps OO, languages. For example, my Atmel 8-bit designs are all done in C, not because it's the only language available for the Atmel AVRs (assembler and Forth are others of which I'm aware) but because it's readily available, has a high-enough level of abstraction to make many functions easy to write, and offers enough hardware manipulation to handle the basic peripherals of the device. When I wrote a Windows GUI-based configuration application for one of our electronic advance ignition systems, I used C#, because it was a good fit for that kind of system. I think we've reached the point where we need to start differentiating the complexity/memory/CPU aspects of embedded systems before asking the question about what language we use in writing the software. New terms, anyone?
David Telling
Posted on Embedded.com's forum

People don't use C because it's a good language; people use C because it is the least evil option. For embedded real-time systems, you really only have the choices listed: C, C++, or assembler. Assembler will of course be used to some extent "inline", but to write whole programs in it is not an option if you want portable code. At the escalating pace at which the silicon industry plops out new processors, one can almost be certain that the program will live longer than the processor. Portable code is a must in the embedded world; therefore, we need high-level languages. Private encapsulation is another good reason why.

I can come up with a few advantages of C++ over C in embedded systems, namely stronger type enforcement, exception handling, standardized inline asm, and standardized inlining (C does not have this because nobody but the ISO C committee programs in C99).

But the disadvantages weigh heavier in my opinion.
Daniel Lundin
Posted on Embedded.com's forum
(Daniel Lundin's comments continue online in the forum at the bottom of the article at www.embedded.com/218600142.)

Real men only program in two languages, and they both begin with A: Assembler and Ada. Why don't you see more non-military systems developed in those languages? 'Cause there aren't enough real men out there to go around.
William Thomas
Posted on Embedded.com's forum

Maybe that should be "Real old men program in C." Anybody suffering from the delusion that C is the only viable "high-level" language for embedded development should have a good hard look at Oberon-07 (www.inf.ethz.ch/personal/wirth/Articles/Oberon.html). It is as powerful as C, if not more so, but is much less prone to human error.
Chris Burrows
Development Manager, CFB Software
Posted on Embedded.com's forum

OS for 8-bit MCU
The article (Dave Armour, "FLIRTing with 8-bit MCU OSes," July/August 2009, p.14, www.embedded.com/218600135) is very insightful and intuitive. Speaking of small-footprint RTOSes with round-robin scheduling, have you checked out µC/OS-III? Its press release (www.micrium.com/news/2009-03-24_Micrium-Expands-RTOS-Family.html) claims that the new kernel would support tasks of equal priority and, moreover, that the system promises near-zero interrupt disable time. Surprising!!

Apart from the article, what also caught my eye is the short biography at the end of it saying "recently laid off firmware engineer with 15+ years of experience with lots of time in hand." Just goes to show the kind of talent that is sitting idle...scary!!!
Saurabh Gandhi
Embedded Software Engineer
Posted on Embedded.com's forum

I think that the brief critique of the article "Build the Super Simple Tasker" (www.embedded.com/190302110) is not fair. All preemptive, priority-based kernels work the way described in the SST article; that is, higher-priority tasks always have precedence over lower-priority tasks in accessing the CPU. Starvation is only a problem when high-priority tasks take up too much of the CPU, but this means that the system is overloaded. Also, it is not true that the SST kernel requires state machines. In fact, the example code provided with the SST article (ftp://ftp.embedded.com/pub/2006/07samek/sst.zip) did not use state machines at all. The article merely mentions state machines because they are a natural fit for a kernel of this type.
Miro Samek
(Read the rest of Miro Samek's comments and the author's response on Embedded.com at www.embedded.com/forums.)

programming pointers
By Dan Saks
Allocating and deallocating arrays, in detail

Dan Saks is president of Saks & Associates, a C/C++ training and consulting company. For more information about Dan Saks, visit his website at www.dansaks.com. Dan also welcomes your feedback: e-mail him at dan@dansaks.com.

Last year and early this year, I wrote a couple of articles on dynamic allocation in C and C++ emphasizing the distinction between allocating objects and allocating raw (uninitialized) storage.1,2 When a program allocates an object, it not only allocates storage for the object, but also initializes that storage with a value appropriate for the type of object that will occupy that storage. When a program just allocates storage, it leaves that storage uninitialized.

I followed those articles with one that explained the distinction between deallocating objects and deallocating storage.3 When a program deallocates an object, it releases not only the storage occupied by the object, but also any other resources the object was using.

In each of those articles, I sketched out how C++ compilers translate new-expressions into more primitive operations. I also showed how to write C code that emulates the behavior of new-expressions. However, I stalled out when I got to array delete-expressions. I also left some details out of the code for both the C++ implementation and the C emulation of array new-expressions. This month, I'll fill in most of the missing pieces.

A RECAP
New-expressions in C++ allocate objects. Each new-expression is conceptually, if not actually, a two-step process: (1) allocate storage for an object, and (2) initialize it. For objects of class types, initializing an object usually involves calling a constructor.

For example, suppose class widget is defined as:

    class widget
    {
    public:
        widget();   // a constructor
        ~widget();  // a destructor
        // ...
    };

Then a new-expression such as:

    pw = new widget ();

translates more-or-less into something like:

    pw = static_cast<widget *>
        (operator new(sizeof(widget)));
    pw->widget();

The first statement acquires storage for a widget object by calling operator new, and converts the address of that storage from type void * to type widget *. The second statement initializes the storage by applying widget's default constructor. (That second statement, an explicit constructor call, is not something you can actually write in C++.)

Delete-expressions in C++ deallocate objects. Each delete-expression is also a two-step process: (1) release resources that the object was using, and (2) deallocate the storage for the object. For objects of class types, releasing resources involves calling a destructor.

If pw is a pointer to an object of class type widget, a delete-expression such as delete pw; translates more-or-less into something like:

    if (pw != NULL)
    {
        pw->~widget();
        operator delete(pw);
    }

A delete-expression applied to a null pointer does nothing. If the pointer is non-null, the delete-expression applies the destructor to the soon-to-be-deleted object, and then deallocates the object's storage by passing the object's address to operator delete.

In contrast to a new-expression, a call to the Standard C malloc function, as in:

    pw = (widget *)malloc(sizeof(widget));

merely allocates storage for a widget, leaving the storage uninitialized. In contrast to a delete-expression, a call to the Standard C free function, as in:

    free(pw);

merely deallocates the storage for a widget, without any regard for any additional resources that the widget may have been using.

Although C doesn't have classes with constructors and destructors, you can emulate them by using structs and functions.4,5 For example, you can implement a C++ widget class as a C struct:

    typedef struct widget widget;

    struct widget
    {
        // widget data members go here
    };

(The typedef immediately before the struct definition elevates the name widget from a mere tag to a full-fledged type name.)6

You can also implement each widget class member function in C++ as a non-member function in C whose first parameter is a pointer to the widget to be manipulated, possibly along with other parameters. For example, you might declare the C implementation of the widget default constructor and destructor as:

    void widget_construct(widget *pw);
    void widget_destroy(widget *pw);

You can closely approximate the behavior of a C++ new-expression as an inline C function:

    inline
    widget *new_widget()
    {
        widget *pw = (widget *)malloc(sizeof(widget));
        if (pw != NULL)
            widget_construct(pw);
        return pw;
    }

Thereafter, you can construct a dynamically-allocated widget with a default initial value using just:

    pw = new_widget();

which is a pretty good approximation for the C++ new-expression:

    pw = new widget;

Similarly, you can mimic the behavior of a C++ delete-expression by using another inline function:

    inline
    void delete_widget(widget *pw)
    {
        if (pw != NULL)
        {
            widget_destroy(pw);
            free(pw);
        }
    }

Then, if pw points to a dynamically-allocated widget, you can delete it by calling:

    delete_widget(pw);

which is a pretty fair approximation for the C++ delete-expression:

    delete pw;

ALLOCATING AND DEALLOCATING ARRAYS
A C++ array new-expression as in:

    pw = new widget [10];

allocates an array of 10 properly initialized widgets. As with other new-expressions, an array new-expression is still a two-step process: (1) allocate storage, and (2) initialize it. However, with an array new-expression the second step is a loop, which applies the default widget constructor to each array element in ascending order by element address.

An array delete-expression such as:

    delete [] pw;

is also a two-step process: (1) apply the destructor to each array element, and (2) deallocate the storage for the array. In this case, it's the first step that's the loop: a loop that applies the destructor to each array element in reverse order.

You can emulate the behavior of a C++ array new-expression as a C function:

12 SEPTEMBER 2009 | embedded systems design | www.embedded.com


    widget *new_widget_array(size_t n)
    {
        widget *pw = (widget *)malloc(n * sizeof(widget));
        if (pw != NULL)
        {
            widget *p;
            for (p = pw; p != pw + n; ++p)
                widget_construct(p);
        }
        return pw;
    }

Thereafter, you can dynamically allocate an array of properly initialized widgets using just:

    pw = new_widget_array(n);

which is a pretty good approximation for the C++ new-expression:

    pw = new widget [n];

You can mimic the behavior of an array delete-expression as a function declared as:

    void delete_widget_array(widget *pw);

but the implementation isn’t as straightforward as it is for new_widget_array. Whereas the array dimension appears explicitly in an array new-expression, it never appears in an array delete-expression nor in the declaration of delete_widget_array. Each array delete-expression specifies the address of (the initial element of) the array, but not the array’s dimension. The delete-expression and delete_widget_array have to obtain the array dimension another way.

Figure 1: Storage layout for a dynamically allocated array. The array new-expression places the array dimension in a size_t location just before the first array element, and returns the address of the initial (0th) widget element as the address of the array; the remaining widgets follow it.

A common technique for passing the array dimension to the delete-expression is for the array new-expression to stash the array dimension in a location just before the array itself, as illustrated in Figure 1. Inasmuch as that additional location stores an array dimension, it should be declared as type size_t, the same as the parameter to new_widget_array.
Previously, new_widget_array allocated space for the array using:

    widget *pw = (widget *)malloc(n * sizeof(widget));

In the new version, it allocates space for the array plus storage for the array dimension using:

    size_t size = sizeof(size_t) + n * sizeof(widget);
    size_t *ps = (size_t *)malloc(size);

If ps is non-null (malloc returns a pointer to the requested storage), then new_widget_array can proceed to place the array dimension at the beginning of that storage:

    *ps = n;

and then compute the address of the first element in the array itself:

    ++ps;

The allocated array is an array of widgets, but ps is a pointer to a size_t, so new_widget_array needs a cast to obtain a pointer it can use to access the array elements:

    pw = (widget *)ps;

Altogether, the new version of new_widget_array looks like:

    widget *new_widget_array(size_t n)
    {
        widget *pw = NULL;
        size_t size = sizeof(size_t) + n * sizeof(widget);
        size_t *ps = (size_t *)malloc(size);
        if (ps != NULL)
        {
            widget *p;
            *ps = n;
            pw = (widget *)++ps;
            for (p = pw; p != pw + n; ++p)
                widget_construct(p);
        }
        return pw;
    }

CONTINUED ON PAGE 38



cover feature

Learn about the role of capacitance measurement algorithms in multi-touch sensing user interfaces.

Getting in touch
with capacitance
sensor algorithms

BY JOHN CAREY

Increasingly, embedded applications must interact directly with their environment and their end users. Consider the best new touchscreen phones, in which the user interface is a large capacitive sensing screen that differentiates a flick from a tap and tracks the motion of your finger but doesn’t track your ear.

Sensors are at the heart of these systems. They sense the environment and user behavior, enabling the product to respond in an intuitive but reliable way. However, the sensor films themselves aren’t intelligent. They don’t even collect data. They only sense. They aren’t capable of differentiating between useful and useless data or discriminating between the quality of different types of inputs.
Truth be told, these sensor films hardly sense at all. They really just project an electric field created by an intelligent capacitive sensing chip. This type of capacitive sensing is known as projected capacitive technology, and it’s used in the most advanced capacitive touchscreen solutions. Figure 1 shows an example of how a projected capacitive touchscreen works.
This is not to say that the sensors themselves are not complex. On the contrary, a capacitive touchscreen sensor consists of a large array of indium tin oxide (ITO) conductors on one or more layers of glass or polyethylene terephthalate (PET) plastic. Figure 2 presents an example of a touchscreen sensor construction.
The good optical clarity and low resistivity of ITO make it the perfect conductor for creating a touchscreen. When the ITO sensor is connected to a capacitive sensing chip with a suitably high signal-to-noise ratio (SNR), it can accurately sense minute changes in capacitance. A finger’s presence, for instance, is on the order of a picofarad (10^-12 farads). It is typically accompanied by background capacitances of tens of nanofarads (10^-9 farads). This situation makes the sensing environment challenging and mandates an exceptionally high SNR. Charge transfer technology is well suited to high-SNR capacitive sensing systems. It allows the capacitive system to sense minute changes in capacitance, even from a finger as it approaches the phone before it touches it, or from the touch of a fingernail.
Charge transfer technology enables high SNR by using a pair of sensing electrodes for each capacitive channel. One is a transmit electrode into which a charge consisting of logic pulses is driven in burst mode. The receive electrode couples to the transmit electrode via the overlying panel dielectric. When a finger touches the panel, the field coupling is reduced and touch is detected.
Most charge signal acquisition techniques leave the charge lines hot (sensitive to touch) during signal conversion. The current on the sensor edge wiring can then be included as part of the position calculation, introducing positional inaccuracy into the measurement. The contribution of the edge wiring increases with the length of the routing between the sensor and the driver chip and becomes seriously
problematic if the distance exceeds a few centimeters.

Figure 1: Projected capacitive touchscreen. Drive pulses from a drive buffer energize the drive electrode; field coupling through the dielectric front panel links it to the receive electrode, which gathers the collected charge.

The charge transfer technique holds the receive lines at zero potential during the charge acquisition process and solves this problem, effectively restricting the transfer of charges to those between the transmitter X and receiver Y electrodes at the point of interest in the main sensor area.
This “charge-transfer” signal-acquisition technique uses individual resistive one-dimensional stripes to create a touchscreen. These stripes can be read either in parallel or sequentially, since the connections to these stripes are independent of one another. There is an interpolated coupling between adjacent lumped electrode elements and an object such as a finger.
The charge-transfer technique restricts signal acquisition to the immediate vicinity where a row and column electrode couple to each other. This localized coupling means that all other parts of the row and column are largely not touch sensitive at the time the signal is acquired, literally enabling true, unlimited multi-touch capability.1 Figure 3 shows an example of a charge transfer.

MUTUAL VS. SELF CAPACITANCE
There are two approaches to determining finger position with a projected capacitive touchscreen: measuring self capacitance and measuring mutual capacitance. Touchscreen solutions that measure self capacitance measure an entire row or column for capacitive change.
Self capacitance works OK for single-touch systems, but with multi-touch systems there is no way to resolve the positional ambiguity that results from more than one simultaneous touch on different parts of the screen.
For example, if a user touches the capacitive grid at locations X1,Y1 and X2,Y2, the energized lines simply tell the chip that lines X1, X2, Y1, and Y2 have all been touched. The chip doesn’t know which combination of them is correct: it could just as well report X1,Y2 and X2,Y1 as the touch locations. This problem is known as ghosting.
Another problem with self-capacitance touchscreens is the snapping effect. It happens when tracking two touches moving towards a shared row or column electrode; the reported coordinates tend to “snap” to that electrode, causing a strong nonlinearity and poor user feel.
In contrast, mutual capacitance measurement uses an orthogonal matrix of transmit and receive electrodes arranged as an array of multiple smaller touch nodes created by the geometry of the electrode structure.
In a mutual-capacitance system, each touch is uniquely detected as an XY coordinate pair, whereas in a self-capacitance system, the X and Y coordinates of a touch are detected independently.
If two touches are present in a mutual-capacitance system, they are detected as (X1,Y1) and (X2,Y2), whereas in a self-capacitance system they are detected as (X1,X2,Y1,Y2), leaving two potential combinations of coordinates. The self-capacitance ghosting problem grows combinatorially and becomes impossible to solve as you transition to three or more touches.
A mutual capacitive array is interpreted as a complete touch surface that maintains the ability to resolve multiple touch points within each individual “small” screen. Because the capacitive coupling at each point in the matrix can be measured independently, there is no ambiguity in the reported coordinates for multiple touches. It is then technically possible to have unlimited touch recognition. Figure 4 compares mutual and self capacitance.
Charge-transfer signal acquisition, combined with a mutual-capacitance measurement technique, provides a superior SNR and better tolerance to parasitic capacitance, allowing weak signals, such as capacitance conducted through a fingernail, coin, or stylus, to be processed.

Figure 2: Touchscreen sensor construction. Layer 1 ITO (Y) and Layer 2 ITO (X) are separated by panel, adhesive, and substrate layers, each labeled with its relative permittivity (εr) and thickness (T).

SENSOR RESOLUTION
Sensor resolution, the ability to resolve a passive conductive stylus, can be directly linked to the electrode pattern or ITO sensor design. A high-resolution pattern can be formed by having an array of vertical transmit bars separated by a dielectric from a second layer, which
contains a horizontal array of receiving lines.

Figure 3: Charge transfer. Drive pulses pass through a drive buffer onto drive electrode X; the cross-coupling capacitance to receive electrode Y transfers charge, via transfer/reset switches, into a sample capacitor, which is measured with a slope resistor and slope drive, a voltage comparator, and a timer capture register (all referenced to Vss).

At each location where the bars cross, a parallel plate capacitor or sensing electrode is formed.2 To maximize the resolution and SNR of this pattern (to detect a 2-mm passive stylus tip, for instance), it is important to optimize the density of electrodes. Adding more capacitive channels in each axis for a given screen size can have a beneficial effect, even though the sensor is more complex to manufacture.
More channels will result in a higher SNR. The optimum row and column pitch of the electrodes should approximate the tip-to-tip distance between thumb and forefinger when pinched together, divided by two (about 5 mm or less). This means that a 4.3-inch screen in a 16:9 aspect ratio should ideally have 19 rows by 11 columns, totaling 209 mutual capacitance electrodes.3
Increasing the electrode density also allows a more qualitative interpretation of the data. For example, a 200+ channel matrix makes it possible to process the “size” and “shape” of the touch, allowing the end user to draw a picture or execute a signature. With a sufficiently high refresh rate (200 Hz), the technology can even support full-speed signature and handwriting recognition with a stylus as small as 2 mm.
While high sensitivity and resolution vastly improve the screen’s flexibility, they also introduce the problem of selectivity. The entire surface of the touchscreen measures any small change in capacitance resulting from any charged object (finger, ear, face) that is even near the surface of the sensor, whether intentional or unintentional.
The challenge is to collect the data, discard useless data, and utilize useful data in a selective and accurate way. Introducing selectivity and accuracy involves arranging and measuring the change in capacitance in a meaningful way, while also acquiring enough data and applying appropriate algorithms to allow for qualitative differentiation.

MULTI-TOUCH: HOW MANY IS TOO MANY?
Two touches allow objects to be stretched, squeezed, and rotated. One might wonder: what is the utility of processing five or 10 touches simultaneously, when you can barely fit three fingers on the phone?
The answer goes back to the notion of making the sensor selective: able to interpret the quality or size of a touch and to suppress accidental or unwanted inputs. A true multi-touch technology allows intended touches to be identified and interpreted (such as flick vs. tap). It also allows unintended touches to be identified and rejected.

Figure 4: Mutual vs. self capacitance, for two touches on a 4 x 4 grid (X0 to X3, Y0 to Y3). Self capacitance (4 + 4 = 8 sensors) finds X1, X2, Y0, and Y3 active and X0, X3, Y1, and Y2 inactive, so it must conclude that all of (X2,Y0), (X2,Y3), (X1,Y0), and (X1,Y3) are touched. Mutual capacitance (4 x 4 = 16 sensors) measures X2*Y0 = 1 and X1*Y3 = 1 and concludes that exactly (X2,Y0) and (X1,Y3) are touched.

Figure 5: Drift compensation, showing the signal, reference, threshold, hysteresis, and output.

On its own, a capacitive touchscreen sensor has no idea what is touching it or why. It cannot distinguish between a finger, ear, face, elbow, or butterfly. Thus, it’s possible for the end user to issue accidental commands to the phone just by using it: gripping its edges or pressing it to his or her ear or face.
Some of the “extra” touch points in a true multi-touch solution can be assigned to unintended touches. Suppressed touches must be tracked and stay suppressed even if they stray into the active region.
This means the controller must be able to uniquely and unambiguously resolve, classify, and track many touches at once. This enables a user to comfortably hold a small product with some amount of finger/screen overlap, while also allowing the touchscreen to operate normally.
Face and grip suppression algorithms can identify and reject unintentional input from the user’s face or ear or from fingers gripping the edges of the phone. Grip suppression is quite tricky, as it isn’t as simple as just ignoring presses at the edges.
In actuality, the chip has to be smart enough to determine the touches from a grip at the edges, track them as they move, and ensure that they don’t trigger false presses. All the while, the chip needs to keep the whole screen active for multi-touch support, even as the correct touches move to the edges where the grip occurs.
Gesture processing algorithms calculate and interpret the XY coordinates of a stream of physically present data from each of the 10 unique touch points to execute gesture commands such as tap, drag, drop, zoom, rotate, or flick, based on the speed of the gesture and the XY positions in the data stream.
Another potential use of the underlying many-touch data is to recognize shapes on the touch surface. This allows all kinds of potentially useful interface enhancements. Basic shape recognition for a nose, cheek, or even an ear allows further suppression of real-world situations that would otherwise falsely trigger the touchscreen.
As more touches can be uniquely identified and reported to the host, applications will start making use of multiple-touch data.



NOISE AND SYSTEM ISSUES
As noted earlier, capacitive touchscreen controllers measure very small changes in the row-to-column coupling capacitance. The way the controller performs the measurement has a strong influence on the susceptibility of the controller to external noise.
One common noise generator encountered with touchscreens is the LCD itself. It often has voltage transients measured as several volts, with rise/fall times measured in microseconds. Using the right type of capacitive-to-digital conversion and noise suppression algorithms, it’s possible to reject most of the noise at the source.
Another approach is to use a sensor electrode pattern that uses two ITO layers but is self-shielding from behind. This approach saves cost by eliminating the need for an extra shield layer of ITO while still rendering the sensor immune to the noisy LCD surface.
The second most problematic noise source is found with “floating” power supplies that often capacitively couple several hundred volts of distorted 50/60-Hz waveform, relative to earth, into the entire touchscreen device. When a user touches the device, the sensor effectively becomes part of a capacitive voltage divider, contaminating the measurements with huge amounts of low-frequency noise. Again, clever chip design and noise suppression algorithms can eliminate this effect.

CALIBRATING TOUCH
Although many touchscreen solutions require the factory and/or the end user to calibrate them before use, solutions are available with self-calibration algorithms that allow the chip to operate independently of any user or manufacturing calibration.
Signal drift can occur because of changes in the unknown electrode capacitance and the Cs sampling capacitors over time, often resulting from changes in temperature or humidity, and can cause false detections, non-detections, and sensitivity shifts.
Drift compensation algorithms compensate by applying a slew-rate limited change to the reference level; the threshold and hysteresis values are slaved to this reference. Once an object is sensed, the drift compensation mechanism stops, since the signal is legitimately high and therefore should not cause the reference level to change.4,5,6 Figure 5 illustrates drift compensation.

ITO SENSORS ARE THE HEART
Capacitive touchscreens will likely make their way into products not even imagined today. Consumers are demanding that user interfaces interact with them in an intuitive, effortless, and reliable way.
ITO sensors are at the heart of the solution. Exploiting these exceptionally sensitive sensors requires thoughtful construction, fast performance, and innovative algorithms, embodied in highly intelligent capacitive touchscreen controller chips that can deliver high channel density, a high SNR, and a multi-touch capability. ■

John Carey, director of Marketing Touch Technology at Atmel Corp., has a master’s in electrical engineering from California State University, as well as a bachelor’s in electrical engineering from Arizona State University. You may reach him at John.Carey@atmel.com.

ENDNOTES:
1. “Charge Transfer Capacitive Position Sensor.” U.S. Pat. No. 7,148,704, December 12, 2006.
2. “Capacative Position Sensor.” U.S. Patent Application 20080278178, November 13, 2008.
3. “Hybrid Capacitive Screen Element.” U.S. Patent Application 20070247443, October 25, 2007.
4. “Hybrid Capacitive Screen Element.” U.S. Patent Application 20070247443, October 25, 2007.
5. “Capacitive Position Sensor.” U.S. Pat. No. 6,288,707, September 11, 2001.
6. “Adjacent Key Suppression.” U.S. Pat. No. 6,993,607, January 31, 2006.



feature

Complex multicore processors are upon us. With the right open-source tools and
a commercial RTOS, they don’t have to be monsters to program.

Gaming the system—


high-end networking
on the Cell processor

BY JIM TURLEY

Turn the clock back 10 years. Windows Vista was still on the horizon; Intel’s processors were single-core Pentium designs; Chrysler and General Motors were still going concerns; and Google was just a moderately popular search engine. It was also the time that hints of a rumored “super chip” first began swirling around the technology press. Could Sony, IBM, and Toshiba really be collaborating on a massively parallel supercomputer microprocessor? Was it for real? What would it be used for?
Over the intervening years, I covered Cell (as it came to be called) in the pages of Embedded Systems Design (“A glimpse inside the Cell processor,” June 2006, www.embedded.com/188103194) and elsewhere and decided it was one of the most ambitious chips I’d ever seen. Since then, it’s only gotten more interesting.
Quick: what do video games and advanced network routers have in common?
Quite a lot, as it happens. Both
need to shovel large amounts of data from Point A to Point B while keeping up with real-time performance deadlines. Drop a data packet and you’re dead, just like in first-person shooters. In networks, there are no do-overs.
Both types of systems also consume high-end processors and high-speed memories at a surprising rate. While PC performance has leveled off, game consoles and network boxes are still shooting up the performance charts. PC users may swoon over dual-core processors and multitasking operating systems, but any kid with a PlayStation 3 can beat that with a 9-processor smackdown. Bring it on, Poindexter.
That’s why video games and network systems both lead the way in interesting combinations of killer hardware and real-time software. Their demand for performance is essentially infinite: if you build a faster box, it will find a ready market. And the volumes are large enough to justify serious engineering expense and software-development time. That means developers in both of these markets need to make their systems (a) faster than last year’s box; (b) cheap enough to build in high volume; and (c) reliable enough that you don’t lose money on yield, recalls, or repair. In short, it’s just like every other embedded-system development, but with higher stakes. And cooler prototypes.
This price/performance equation also explains why IBM, Sony, and Toshiba each spent hundreds of millions of dollars and collaborated for years to create Cell: a wickedly fast and astonishingly complex multiprocessor chip for . . . video games.
Actually, Cell is for more than just games. Its creators wouldn’t have spent all that time and money concocting a chip that only Sony could use. Instead, Cell was devised to handle all sorts of data-intensive broadband media, including network packets, video streams, and massive floating-point calculations. The heart of the PlayStation 3 is also becoming the heart of a number of less flashy but equally demanding embedded systems.

A NINE-HEADED BEAST
At first blush, Cell looks like a big PowerPC chip. Coming in part from IBM, that’s understandable. But the 3.2-GHz PowerPC processor core is the least interesting part of Cell, and accounts for barely 10% of the chip’s silicon and a fraction of its massive computing power. In contrast, half the chip is dedicated to eight identical and oddly named Synergistic Processor Elements, or SPEs (see Figure 1).
No mere coprocessors these: each SPE is a serious high-end CPU in its own right, on par with the PowerPC “master” processor. The SPEs don’t execute the PowerPC instruction set; they have their own SPE instruction set that’s unique to Cell. Each SPE is a massive 128-bit single-instruction, multiple-data (SIMD) vector-processing machine, meaning it executes one instruction at a time but can “broadside” multiple math operations across wide data values. This is typical of vector and media processors, which often have to iteratively add, multiply, or rotate several integer or floating-point values. Rather than execute four separate single-precision floating-point instructions in a row, for example, an SPE can execute a single four-wide operation on all the values at once.

Figure 1: Block diagram of Cell Broadband Engine chip. The central 3.2-GHz PowerPC processor core is surrounded by eight identical SPE (synergistic processor element) vector processors, which provide most of the chip’s power. The PPE (with 32-KB L1 instruction/data caches and a 512-KB L2 cache) and SPE0 through SPE7 (each with a 256-KB local store and a DMA engine) share the element interconnect bus (EIB): four ring buses carrying up to 96 bytes per clock, two in each direction, for a total of 76.8 GB/s. The memory interface controller (MIC) connects to XDR memory at 25.6 GB/s, and the I/O interface provides FlexIO0 and FlexIO1.

Each SPE also has its own 256K block of private RAM for executing code or storing local data. This is part bonus and part necessity; with eight high-speed processors all running at once (not to mention the ninth PowerPC processor), there’s no good way to feed them all from shared memory. Instead, each SPE fetches and executes code from its own local memory, staying off the internal and external buses as much as possible. Each SPE can fill or purge its local store to/from off-chip memory any time it wants to. As long as each SPE’s program fits into 256K of code and data space, life is good, but therein lies the challenge.

IT’S FREE, BUT IT’S NOT CHEAP
If you’re a numbers geek, Cell’s performance numbers are impressive. Adding it all up, the chip has a theoretical peak computational arsenal of just over 230 billion floating-point operations per second. That’s mainframe or supercomputer territory. Indeed, IBM’s first uses of the chip were in its System z9 mainframes and the $133 million Roadrunner supercomputer.
Surely a chip this awesome can run a heavyweight operating system? You betcha; in fact, it can run several. There’s no rule that says the PowerPC needs to run the same RTOS as the SPEs, or that the SPEs all need to run the same software as each other. With nine fully capable processors, an ambitious (or masochistic) programmer could conceivably partition the Cell processor into nine different software environments.
As fun as that sounds, layering on the heavy-duty software may not always be the best solution. In embedded applications, agility and elegance are usually preferable to fully optioned-out alternatives. Every line of code burns CPU cycles, so less (code) is more (performance).
In at least one case, Sony engineers came to the same conclusion. In a paper presented at an IEEE conference on interconnects, a team of Sony programmers described an interesting embedded use of the Cell processor.1 It seems the company wanted to add high-speed Ethernet to one of its upcoming consumer-electronics products. The developers wanted to get as close to 10 Gbit/sec “wire speed” as possible, while still using a TCP/IP stack for compatibility and overall ease of development.
They’d initially planned to run Linux on Cell’s central PowerPC processor for all the usual reasons: Linux is royalty-free, it’s customizable, it supports networking, and it’s already been ported to PowerPC. PowerPC chips are common in networking, so the whole process seemed pretty straightforward.
Then the project hit its head on the low doorway of reality. First off, the Sony team discovered that the PowerPC wasn’t fast enough to handle 10-Gbps Ethernet traffic after all. At least, not while running Linux. So they decided to shift the networking stack onto one of the chip’s eight SPEs. Clearly an SPE would be up to the task. If one SPE can crunch a billion floating-point operations per second, surely it can handle a TCP/IP stack.
Actually, no. And therein lay the second problem. It so happens that Linux’s TCP/IP stack is much too big to fit in an SPE’s 256K local store. That’s a real setback when you’re designing for maximum performance at minimum cost. Performance would suffer because the Linux code (running on the PowerPC) would need to load and unload portions of the driver, performing processor-mediated DMA transfers like an awkward software cache. And cost would suffer because Sony’s engineers would have to include more off-chip memory to hold all that code.
Instead, the Sony team abandoned Linux in favor of a royalty-free commercial real-time operating system and TCP/IP stack running on the SPE. Specifically, they chose ThreadX and its companion networking layer, NetX, both from Express Logic in San Diego. Like Linux, ThreadX is royalty-free, but unlike Linux, it fits easily within Cell’s 256K memory constraint. In fact, the network stack and the RTOS together used just 80K of memory, less than a quarter of what Linux would have required. Granted, NetX doesn’t support all the elaborate network functions that a Linux stack would, but for most embedded systems, that’s exactly the right tradeoff. The overall software structure is illustrated in Figure 2.
The only downside was porting. NetX wasn’t available for the SPE instruction set, but then again, neither was anything else. Sony’s engineers had already braced themselves for the task of porting the Linux network stack. This would be a breeze by comparison. Both Linux and NetX were available as C source, so a quick recompile got them most of the way there. For the time being, Sony chose not to take advantage of the SPE’s SIMD architecture, sticking with the standard scalar code. After some performance tweaking (detailed below), they came away quite happy with the results.
The Linux distribution for Cell treats the SPEs as virtualized resources, which means you can create more SPE threads than there are SPEs. That’s a nice feature, but it also means an SPE thread might be swapped out while it’s running. Such is the nature of context switching. To prevent that from happening, the Sony team “pinned” the network stack to one SPE, effectively prohibiting Linux from swapping it out and dedicating that SPE exclusively to network processing.
Given the high packet rates Sony was hoping for, frequent interrupts

Sony’s engineers decided to run Linux on the PowerPC processor core and use the ThreadX RTOS on the SPE
to handle networking. The combination of the two operating systems and processor architectures provided
better performance, and used less memory, than using a single operating system.

User space

Kernel space Linux user process


application
(on SPE or PPE)

Ring buffers

Service application program Communication


Library or SPENET
(on SPE or PPE)

NetX TCP/P protocol stack XDR Memory

Communicate
Allocate via device file
RC-101 10-Gigabit Ethernet driver

SPENET kernel module


ThreadX real-time OS
Allocate

SPE side components PPE side components

Figure 2

www.embedded.com | embedded systems design | SEPTEMBER 2009 25


feature
turned from being a necessity to being one of the oldest and simplest of driver head, often through the use of zero-
a problem. In their experience, most methods. copy buffers. Larger stacks might have
network stacks are interrupt-driven, es- trouble squeezing into limited memory

!
pecially from the hardware interface space, as was the case here.
when it needs servicing. As data rates Given the high packet There’s never been a case where
climb, these interrupts (and their atten- rates Sony was hoping more code makes a processor run
dant context switching) become so fre- faster. Like racecars, high-performance
quent that the overhead overwhelms
the actual task. The faster it works, the
slower it goes. ! for, frequent interrupts
turned from being a
programs are tuned for light weight
and efficiency, with no extraneous fea-
tures. “Simplify and add lightness” is a

!
To fix this, the team decided to necessity to being a common phrase among racecar design-
switch from an interrupt-driven to a ers, and a similar philosophy prevails
software-polled design. They kept the problem. among embedded designers. “It can’t
same hardware components but just break if it’s not on the car,” might be
tweaked the network driver to poll the another useful axiom.
chip at regular timer-tick intervals. The WHEN LESS IS MORE Although richly appointed, fully
resulting efficiency was dramatic. Be- Sony was optimizing for speed, was featured network stacks are fine when
cause the SPE is dedicated to network- limited by hardware constraints, and you’re designing for “big iron,” they
ing and doesn’t have a plethora of sys- needed only a certain well-defined fea- only make sense if you need the full
tem-level functions to look after, it can ture set. Sound like a typical embedded repertoire of network operations. Oth-
focus on the task at hand without fear application? In situations like this, a erwise, all those other features are dead
of being interrupted and missing a smaller software TCP/IP stack with a weight. Most TCP/IP stacks assume the
packet. So paradoxically, this modern reduced feature set can process packets presence of, and rely on, an operating
high-speed network interface relies on and decode headers with minimal over- system scheduler to manage threads for
packet processing, buffer management,
and communication with the host
processor. That’s only natural, because

Your solution
network management is generally an
add-on to some existing operating sys-
tem. Whether it’s Unix, Linux, Vx-
Works, Windows, or whatever, the op-

is here. erating system came first and the


networking features were grafted onto
it. That wasn’t really the case here, even
though Sony was using Linux in the
Save time – and money – with embedded
system (and for that matter, on the
software solutions built to run right out of
same chip).
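The following minimal C sketch makes the polled approach described above concrete. It shows a timer-tick receive poll running on a dedicated core. It illustrates the idea under stated assumptions and is not Sony’s driver. The descriptor layout, `RX_DESC_DONE`, `rx_ring`, `rx_poll`, and the `count_frame` sink are all hypothetical stand-ins for the real 10-Gigabit Ethernet hardware and the NetX input path. Each call drains at most a fixed budget of completed descriptors, so there is no interrupt handler and nothing for a scheduler to arbitrate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag the NIC sets in a descriptor once a frame has landed. */
#define RX_DESC_DONE 0x1u

/* Hypothetical receive descriptor; real hardware layouts differ. */
typedef struct {
    volatile uint32_t status;   /* written by the NIC, cleared by software */
    uint32_t length;            /* frame length in bytes */
    uint8_t *buffer;            /* frame data */
} rx_desc;

typedef struct {
    rx_desc *desc;              /* descriptor array shared with the NIC */
    size_t size;                /* number of descriptors in the ring */
    size_t next;                /* next descriptor software will examine */
} rx_ring;

/* Hands one frame to the protocol stack (hypothetical hook). */
typedef void (*deliver_fn)(const uint8_t *frame, uint32_t len, void *ctx);

/* Example sink standing in for the TCP/IP stack: it just counts traffic. */
typedef struct { size_t frames; size_t bytes; } rx_stats;

void count_frame(const uint8_t *frame, uint32_t len, void *ctx)
{
    (void)frame;
    rx_stats *s = ctx;
    s->frames++;
    s->bytes += len;
}

/* Called once per timer tick. Drains at most `budget` completed frames and
 * returns how many it processed. No interrupts are enabled or serviced.
 * A real driver would also need the platform's memory barriers or cache
 * maintenance between reading `status` and reading the frame data. */
size_t rx_poll(rx_ring *ring, size_t budget, deliver_fn deliver, void *ctx)
{
    assert(ring != NULL && ring->size > 0 && deliver != NULL);
    size_t done = 0;
    while (done < budget) {
        rx_desc *d = &ring->desc[ring->next];
        if ((d->status & RX_DESC_DONE) == 0)
            break;                              /* nothing more has arrived */
        deliver(d->buffer, d->length, ctx);
        d->status = 0;                          /* return descriptor to the NIC */
        ring->next = (ring->next + 1) % ring->size;
        done++;
    }
    return done;
}
```

A core dedicated to networking would call `rx_poll` once per timer tick from a simple endless loop. The budget bounds the work done in any one tick, so transmit and timer processing still get their turn without any preemption.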

In SoC or multicore architectures, small and fast is usually the way to go. Sometimes that’s by necessity; sometimes it’s by design. Multicore processors, in particular, often can’t afford the overhead of simultaneous accesses to shared memory. The common memory is likely on a shared bus that’s slower than local resources, and there’s probably arbitration overhead among multiple requests as well. A designer’s best bet is to use the common memory only infrequently, and rely on local memory for repetitive use in tightly coded algorithms that execute in minimum cycles.
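The “fetch once, work locally” pattern can be shown in a short C sketch. It assumes a core with a small private memory, such as an SPE’s 256K local store. It uses `memcpy` purely as a stand-in for the DMA transfer a real SPE program would issue. `LOCAL_BUF_SIZE` and `staged_checksum` are hypothetical names. The packet crosses the shared bus once. The repetitive inner loop, here the standard RFC 1071 Internet checksum, then touches only the local copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOCAL_BUF_SIZE 2048u   /* hypothetical size of the local staging area */

/* Copy a packet out of shared memory once, then checksum it locally.
 * memcpy stands in for a DMA transfer into the core's private memory.
 * The static buffer makes this non-reentrant: fine on a dedicated core. */
uint16_t staged_checksum(const uint8_t *shared_pkt, size_t len)
{
    static uint8_t local[LOCAL_BUF_SIZE];    /* the "local store" copy */
    assert(len <= LOCAL_BUF_SIZE);

    memcpy(local, shared_pkt, len);          /* single bulk access to shared memory */

    /* Internet checksum (RFC 1071), computed on the local copy only:
     * sum big-endian 16-bit words, fold the carries, then complement. */
    uint32_t sum = 0;
    size_t i = 0;
    for (; i + 1 < len; i += 2)
        sum += (uint32_t)((local[i] << 8) | local[i + 1]);
    if (i < len)                             /* odd trailing byte, padded with zero */
        sum += (uint32_t)(local[i] << 8);
    while (sum >> 16)                        /* fold carries back into 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

On an SPE, the copy would be an asynchronous DMA. A double-buffered version could fetch the next packet while the current one is being checksummed, keeping the shared bus off the critical path entirely.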


Photo: Cell processor, courtesy of IBM.

In Sony’s case, they chose to run Linux on the main PowerPC processor, so whatever ran on the SPEs would have to communicate with Linux, but it didn’t have to be Linux. In such situations, a smaller and simpler network stack running on an SPE can use very simple IPC mechanisms to keep the control processor (the PowerPC) in sync with the packet processing. So there are two issues: operating system services to support the stack, and IPC services to communicate with whatever operating system is running on the control processor (Linux in Sony’s case).

As a side note, the same minimalist philosophy also applies to graphics. Although Sony isn’t saying much about it, it’s safe to assume that this particular project includes some interesting graphics with some sophisticated number-crunching. Cell is the ideal chip for this, after all. But this tends to rule out the “traditional” Linux approach, for all the same reasons we just explored. One or more SPEs running specialized code will probably do a far better job than dumping the task on the central PowerPC processor. More and more, it looks like the right way to exploit Cell’s massive processing power is to divide and conquer, not to throw everything at the chip and let it power its way through.

TWEAKING FOR FUN AND PROFIT

Once Sony’s engineers decided to delegate all the networking functions to a single SPE (as opposed to running some or all of it on the parent PowerPC processor), they could make some other optimizations, too. For example, once you know your processor will handle network traffic and only network traffic, you no longer have to cooperate with unknown, non-network tasks. You can ignore interrupts, turn off task preemption, and eliminate mutexes and other exclusion processing. The Sony team did all of these.

They also aligned the TCP/IP stack’s data structures. Like most RISC processors, the SPE has no instructions for accessing misaligned data; it assumes all operands are aligned on natural boundaries, which in this case is 128 bits. Accessing misaligned operands requires multiple instructions, so code size can bloat pretty quickly when you’re accessing lots of misaligned data structures, exactly as a network stack does. A quick application of the gcc alignment attributes shrank the code size by 21%, at a slight cost in data size.

Together, those two changes improved performance by 13% and 22%, respectively. Tweaking branch prediction (a trivial compiler switch) gained them another 22% improvement in speed.

When all was said and done, Sony saw sustained transfer rates of 8.5 Gbit/sec for TCP packets, and even faster (close to wire speed) for the simpler UDP packets. Impressively, this was all done with off-the-shelf, scalar, 32-bit commercial code that was never designed for the Cell processor. Looking forward to taking advantage of Cell’s 128-bit SIMD operations, the Sony team feels there’s a lot more performance and code efficiency still to be found in this combination of hardware and software.

So as frightening as Cell might appear, it’s actually an easy beast to tame. Between its open-source development tools, commercial and open-source RTOS options, and fearsome hardware resources, Cell is proving itself as an unusually accommodating platform for networking, encryption, media processing, gaming, floating-point acceleration, and even supercomputing. Complex multicore processors are upon us; it’s good to see that they don’t have to be monsters to program. ■

Jim Turley is an acknowledged authority on microprocessor chips, embedded systems, semiconductors, and intellectual property licensing. He is the author of seven books, was past editor of the Microprocessor Report, was editor in chief of Embedded Systems Design magazine, and is currently publisher of Silicon Insider. Contact him at http://jimturley.com.

Endnotes:
1. “Network Processing on an SPE Core in Cell Broadband Engine,” 16th IEEE Symposium on High Performance Interconnects, 2008; www.hoti.org/archive/2008papers/2008_S4_3.pdf


A brief “no-bull” tutorial on how virtualization actually works and where it is most useful.

Is virtualization right for your application?

BY CASEY WELTZIN

Any electrical or computer engineer who reads industry publications today has likely seen the word “virtualization” enough times to know that it’s a hot topic. In fact, a quick search of embedded.com yields over 200 results for the buzzword. How real is this trend? Does virtualization provide real savings? This article will attempt to answer these questions by outlining the major engineering use cases for virtualization, discussing how it impacts performance, and addressing a topic that is critical in engineering designs: hardware I/O. The goal is to allow you to make a straightforward assessment of virtualization technology for your application, independent of the hype.

To fully understand the performance implications of using virtualization in your own designs, it helps to know the basic principles behind virtualization hardware and software.

The basic goal of virtualization is to run multiple operating systems in parallel on one computer such that no individual operating system affects the others in any way. In simple terms, any individual operating system (called a virtual machine) cannot be allowed to affect shared system resources except in very special circumstances. For example, envision several operating systems inadvertently accessing the same memory location at the same time. This situation would be a nightmare to debug, as any operating system could overwrite the stored value at any time.

The key component required to make virtualization work is a piece of software called a virtual machine monitor (VMM), also known as a hypervisor. See Figure 1 for a conceptual representation of how this software fits into a virtualized system. The VMM’s mission is straightforward: prevent individual operating systems (virtual machines, or VMs) from altering shared system state, thereby making sure that conflicts do not occur. In more exact terms, a VMM cannot allow individual VMs to independently execute “privileged instructions,” such as accessing memory or I/O devices, that could potentially conflict with other VMs accessing the same resource.

Figure 1: Bare-metal virtualization software enables engineers to run multiple operating systems in parallel on one set of computing hardware. [Diagram: OS 1, OS 2, ..., OS n run on the virtualization software (VMM), which runs on the CPUs, I/O, and memory.]

Note that there are two basic categories of VMM software: hosted and bare-metal. Hosted VMM solutions (such as VMware Workstation) rely on a host operating system for scheduling and I/O access, so they are generally not a good fit for deployed engineering applications. Therefore, we will focus on bare-metal VMM software for the remainder of this piece.

To effectively manage shared system resources, VMM software must be called whenever an operating system wishes to execute a privileged instruction. For instance, the VMM should intervene when a particular operating system attempts to write to an Ethernet port that is shared by several OSes. In practice, there are three different ways of making sure that this happens:

• Binary translation: software is used to translate pieces of code on the fly and call the VMM when necessary.
• Hardware assist: virtualization features built into a processor (such as Intel VT or AMD-V) are used to call the VMM automatically when an operating system attempts to execute privileged instructions.
• Paravirtualization: operating system code is explicitly modified (if source code is available) to call the VMM.

Regardless of which of the methods above is used to call the VMM, the performance implications of virtualization can be compared to context switches in multithreaded applications. Every time an operating system must call virtual machine monitor software, the state of that operating system needs to be saved in a memory structure. Likewise, the saved state must be restored when control is transitioned back to the operating system. This can be a very time-consuming operation, and therefore it’s advantageous to avoid VMM intervention at all costs.

In practice, VMM intervention can be kept to a minimum with some clever design techniques. First of all, by using a multicore processor in your virtualized system and dedicating individual processor cores to individual operating systems, you can avoid using the VMM for operating system scheduling. Instead, each operating system can run independently as long as privileged instructions are not executed. Next, by partitioning memory between operating systems and configuring individual operating systems to access a certain block of physical memory, you can reduce or eliminate address translation overhead. Finally, by assigning I/O devices to individual operating systems and carefully routing interrupts, you can avoid invoking the VMM upon each I/O access or interrupt. The trend here is clear: in virtualized systems, partition where you can and share only when you must to guarantee the best performance.

To summarize, incorporating virtualization into an engineering design by using VMM software has the potential of adding some overhead to your application that can reduce performance. The exact amount of this overhead will depend on how many times the VMM software must be called to abstract shared system resources from individual virtual machines. For designers who must share system components between operating systems, a number of technologies exist that attempt to minimize the performance hit. When possible, the best strategy is to avoid VMM intervention altogether by using partitioning.

It should also be noted that virtualizing two operating systems to run on a single computer (as opposed to using two or more computers) inherently means that only a fraction of the overall processing power may be present in the system. Virtualization technology is meant to use computer hardware, including processor resources, in an efficient way. It doesn’t mean that a given application will necessarily be able to run with a fraction of the computing cycles.

BACKGROUND AND BENEFITS OF VIRTUALIZATION

Here’s a brief review of what the term virtualization means and how the technology has evolved. Many engineers (and IT professionals) improve time to market by using operating systems in their applications. Operating systems provide a large set of capabilities and programs that a designer can leverage without having to “reinvent the wheel.” Because each operating system is specialized with a focus on certain capabilities, many applications use multiple operating systems to gain access to more features. In the engineering domain, different operating systems provide features like deterministic performance (in the case of a real-time operating system) or a rich user interface (in the case of a general-purpose operating system like Linux). The bottom line is that many complex engineering applications today use a multitude of operating systems to get the job done.

However, until recently, multiple operating systems meant multiple computers. Multiple sets of computing hardware were required to support the use of multiple operating systems at the same time. While in some applications the extra hardware required to run multiple operating systems is warranted, a large number of multi-OS systems contain redundant hardware and underutilized processors, often resulting in inefficiency, added cost, and increased physical footprint.

The inefficiency in multi-OS server farms in the IT world has motivated a shift toward new technology. Virtualization is a broad term that means abstraction of computer resources, but in practice it refers to a combination of software and hardware that enables multiple operating systems to run in parallel on the same computer.

In the IT world, this technology has been widely adopted and improved for years, resulting in an enormous reduction in the number of computers it takes to run a typical set of server programs. It’s not difficult to quantify the end benefits that virtualization has brought to IT organizations; companies can save upwards of 80% on their energy costs by implementing virtualization. The environmental benefits are also clear: it’s estimated that unused server capacity worldwide produces more pollution annually than the entire country of Thailand (4 tons of CO2 per server).

The following are just some of the potential use cases for virtualization in engineering applications:

• Combining real-time processing and a graphical user interface on one set of hardware.
• Incorporating a wide variety of communication protocols and programs that exist for a general-purpose operating system (such as Linux) in conjunction with real-time performance and reliability.
• Isolating critical system components from non-critical parts.
• Accessing existing applications or driver stacks from a non-supported operating system.

The benefits of virtualization include lower hardware costs, reduced overall system footprint, higher reliability due to increased isolation of different subsystems (compared with a single-OS solution), and reduced development time in porting applications or drivers across operating systems.

I/O ACCESS IN VIRTUALIZED SYSTEMS

One of the most important considerations when evaluating virtualization options for your application is I/O. Does the piece of VMM software that you are evaluating support sharing I/O devices if needed? What is the performance overhead introduced when the VMM deals with incoming interrupts? Can the VMM penalty be reduced to near zero when I/O devices are partitioned (assigned to a given operating system)?

Let us first consider the case of sharing a certain I/O device among different virtual machines. There are two major challenges to this approach: additional software complexity and performance degradation. VMMs that share I/O devices between operating systems must contain a driver for accessing those I/O devices (as shown in Figure 2), which can mean additional development time. In addition, under a sharing scheme VMMs must present virtual machines with an emulated view of the hardware that is shared. As mentioned in the previous section, because shared I/O devices lead to more VMM intervention, some performance overhead is typically added with each I/O request or incoming interrupt.

Figure 2: Sharing I/O devices between operating systems requires a VMM driver and can impact performance. [Diagram: a general-purpose OS and a real-time OS run on the VMM software, whose VMM driver accesses a shared Ethernet device.]

A good example of shared I/O in a virtualized system is an Ethernet device. Sharing a single Ethernet connection between different virtual machines can be advantageous for reducing hardware while providing connectivity to each operating system. To make this possible, VMM software must include a driver for communicating with the physical Ethernet device and also present individual VMs with an emulated view of the Ethernet device. When a VM wishes to communicate via Ethernet, the VMM intervenes and uses its driver to actually perform the communication, adding some latency in the process.

In contrast, let us consider partitioning of I/O devices between virtual machines. If access to a certain I/O resource, such as a data-acquisition device, is only needed from a particular VM, performance can be increased and software complexity decreased. In this scheme, VMM software doesn’t need to intervene at all if configured correctly. VMs can use their native I/O drivers to access partitioned devices without conflicts, as long as the VMM software hides these I/O devices from all other VMs. See Figure 3 for a diagram showing how this works. For example, in the case of a partitioned data-acquisition board, the virtual machine monitor doesn’t need a built-in data-acquisition driver. A performance hit can be minimized or avoided by allowing a single operating system to access the device directly using its native data-acquisition drivers.

Figure 3: Partitioning I/O devices between operating systems improves performance and allows the use of native device drivers. [Diagram: a general-purpose OS and a real-time OS, each with its own native driver, run on the VMM software; a bus interface and a data-acquisition device are shown below.]

The lesson here is simple: remember to think carefully about I/O when considering virtualization for your application. VMMs that share devices can provide convenience, but may require additional development effort and mean a higher performance hit. VMMs that partition devices typically provide higher performance and enable the use of native device drivers by individual VMs, which can greatly reduce development time.

WHEN TO USE VIRTUALIZATION

Though virtualization is a promising technology that can be beneficial in many systems, it is important to understand the performance and I/O implications when looking at incorporating virtualization into your application. Specifically, virtualization may be useful if:

• you’re using multiple operating systems and reducing cost or physical footprint of your design is important;
• you aren’t currently using multiple operating systems, but you wish to isolate pieces of your application from each other or reduce development time by leveraging operating system capabilities;
• you’re willing to take some performance hit in exchange for the consolidation that virtualization brings (this can be minimized by partitioning system resources or choosing a VMM that incorporates advanced technology);
• you can afford to spend some development time integrating a VMM into your application and developing drivers for shared I/O devices (less development time may be required with a turnkey virtualization solution or one that partitions devices).

As engineering designs continue to evolve, virtualization will play an important role by decreasing system cost and physical footprint while increasing capability. It is time for engineers to look at virtualization technology seriously, while at the same time carefully considering the engineering tradeoffs that virtualization brings and making sound decisions. Ultimately, it’s up to you to decide whether virtualization lives up to the buzz. ■

Casey Weltzin is a product manager for real-time solutions at National Instruments. Weltzin manages the LabVIEW Real-Time product line, with special emphasis on multicore processing and virtualization technology solutions. Weltzin joined NI in 2005 and has worked as an applications engineer and manager for the Engineering Leadership Program. He holds a degree in electrical engineering from the University of Wisconsin-Madison.


ESC paper

Is it the slow code movement? Here’s a seamless and continuous integration approach, being presented at ESC Boston this year (class 261), that allows you to gradually introduce performance improvements while preserving an established functional baseline in an embedded system with demanding requirements on system characteristics.

Seamless integration of multicore embedded systems

BY GIUSEPPE DE SIMONE, PAOLO PIERANI, AND MASSIMO QUAGLIANI

Embedded software is by definition difficult to test. When it runs on the target, the code is not always reachable; debugging tools have their limits; sometimes the software designer may have to code in assembly; and hardware devices may have bugs. These are just a few of the typical barriers to a fault-free embedded software application.
In recent years, multicore technology has contributed to increasing the complexity of embedded systems. At the same time, it has opened a very interesting opportunity to find efficient solutions to problems requiring high performance.

Often the development of such systems requires the use of simulation tools. A “basic test” phase with a simulator is the first mandatory step in a comprehensive test strategy aimed at discovering and fixing all kinds of faults.

In the next phase, the “basic tested” software is loaded on the target hardware (a microprocessor, DSP, multicore ASIC, or FPGA), where the interaction with the real hardware is tested. A second type of fault will be discovered here. These faults often take more time to analyze and fix; typically this phase is run in the lab, with all the instruments needed for tracing and troubleshooting available.

As pieces are put together and the system grows, more complex functionalities will be tested, and a third type of fault comes up that originates from the interaction of the many integrated software parts.

When the system is almost complete and has been under test for days, a fourth type of fault is likely to show: a crash or a stopping fault. Here, finding which part of the system is misbehaving is usually very hard. In fact, this kind of fault is not easy to reproduce, and when you succeed in reproducing it, you realize you need more logs and tracing. Time passes and the software… is still unstable.

What is the best approach to deal with such problems? Maybe nobody has “the” answer, but we successfully tried the approach we present in this paper. The application case illustrated at the end of the paper is a real project where the multicore nature of the platform has been exploited to address fault localization with minimal human troubleshooting effort.

Editor’s note: The application case just mentioned is not included in this printing but will be available in the full article online at www.embedded.com/219400429. This article is an excerpt from the paper prepared for ESC Class 261 (Seamless Integration of Multicore Embedded Systems), part of the debugging track at the Embedded Systems Conference, Boston. Massimo Quagliani will be teaching the class on September 22.

SMALL STEPS APPROACH

Generally speaking, the development of a complex software system usually runs into the same problem: the project finds too many severe faults and undesired characteristics too late, leading to out-of-control quality and expensive re-engineering.

Most developers have experienced this: after the code is developed and the system is integrated, suddenly nothing seems to work anymore, not even the parts that were working before. The time and effort needed to get the system quality and performance back to an acceptable level is a real “nightmare”: extra time, task forces, customer pressure, and so forth.

What a developer or project manager would normally prefer instead is to solve important issues day by day, without pressure.

The solution can be to adopt an integration-driven software engineering model. It blends functional decomposition (into very small packages) with rigorous management of the embedded software (ESW) functional and quantitative performance requirements, plus continuous system integration, to secure a high degree of accuracy and efficiency.

The suggested approach is then to develop software with tight control on system changes and to pursue early feedback on quality by using continuous planning and small integration steps, in order to achieve predictability and efficiency.

The main principles in this software-engineering model are:

(1) Quality first.
(2) System development in small steps.
(3) Continuous system integration.
(4) Regression test before verification.
(5) Parallel test phases.

Quality first

To achieve the desired software quality, the correction of faults should have higher priority than introducing additional system changes. That way the backlog of unresolved troubles stays very small, and the time needed to stabilize the system before delivering it to the customer is also very short.

A good starting point is certainly the quality of the design base, but this is definitely not enough.

Human nature prefers to minimize unnecessary work, and fault troubleshooting and fixing is indeed unnecessary work. So faults in the system must be found early and corrected very quickly, and the project has to ensure that the same fault is not found and corrected several times.

However, the golden rule is to make as few faults as possible from the beginning by emphasizing software quality practices (ideally, you should stop making faults altogether!).

System development in small steps

The solution is split into very small changes (let’s call them deltas). A regression test is performed to ensure that the changes implemented in all the different system components are working and not harming legacy functions. This avoids a big-bang delivery at the end of the project.

The benefits are many:

• Get frequent feedback on product quality.
• Reduce complexity: by implementing a small chunk of code, you reduce the number and the complexity of the faults potentially introduced in that step (fewer and simpler faults, much faster to solve and fix).
• Track progress: frequent deliveries provide objective evidence of progress in the project.
• Achieve efficiency: by doing things frequently, people can learn lessons and improve project performance.

The basic goal is then to set up a software factory that runs and verifies frequent deliveries of new system versions, to get quality feedback early with an efficient use of resources.


ESC paper

Continuous system integration
Delivered deltas are integrated into the latest version of the system, whether they contain testable content or incomplete functions. At different ESW/HW maturity stages, the number of components that can be integrated may vary, but the ambition is to have system-level integration as soon as possible.

The build procedure must be automated, and a smoke test is needed to ensure the basic functionalities of the system, in order to reject failed components and not harm the integrated system.

A project anatomy (as sketched in Figure 1) is used to show all planned development steps and how they depend on each other, from the test, integration, and design perspectives: it is a very important tool to find the most efficient way to implement a given set of system changes.

Figure 1: Project anatomy.

The dependencies define the order in which the different deltas must be integrated and hence the implementation order: changes without dependencies, for instance, can be developed in parallel with each other.

Regression test before verification
Once the code implementing a delta is delivered, a regression test is run first to make sure that the previous baseline has not been broken (whatever worked before must keep on working), before verification of new functions starts.

A fully automated regression test suite is dynamically updated as the system grows, driven by the project anatomy. The same approach is applied at every level (from basic test to system test), but the highest applicable test phase depends on the ESW/HW maturity stage.

After delivery to integration and regression test, the design team continues implementing new changes, but if any fault is found in the delivered delta, it is corrected with higher priority.

Parallel test phases
The main enabler for stable product quality is to have feedback from test as early as possible. Doing verification activities in parallel is a way to push for high quality, as well as a way to decrease the verification lead time.

One could argue that starting system test before function test is completed can be risky: the system stability might not be mature enough, for instance, to stand up to stress tests. That's not really the case: the assumption is that the two activities should find different faults. And even if this were not true, it's good anyway to let testers "play" with the system, if only to get rid of as many bugs as possible: it will be much cheaper and faster to verify a less faulty system.

However, the whole model is really effective (and doesn't risk becoming a "nightmare" itself) only if it's run on each of the very small system changes, and quality gates are established before entering the different test phases in parallel.

QUANTITATIVE MANAGEMENT OF STEPWISE DEVELOPMENT
Characterization of embedded software is one of the problems a designer has to face at least once in a lifetime. Typical critical factors are consumption of millions of instructions per second (MIPS) and memory usage (program memory and data memory). A "heavy" algorithm can load the processor and introduce processing delay, with the risk that real-time constraints are not met.

To save money and reuse the available hardware for introducing new features, or simply to increase the number of channels per processor, the developer has to deal with code optimization, and sometimes that means writing or rewriting the parts of the software that are the most time consuming.

The performance gained by optimizing the code is difficult to predict, especially if the volume of software to be optimized is large and the amount of assembly code to write is critical.

In this scenario, it's important to monitor the optimization activity and take the right corrective action at the right time in order to avoid any impact on the project deadline.

Profiling your code is a feature available in almost every real-time debugger. The small-steps approach described here is well suited for this kind of activity, where the metric can be measured at each step.

The problem of tracking progress
Estimating the final optimization grade reached at the end of the project can be done if you have access to historical data for assembly conversion and by studying algorithm-level optimization techniques. You need information from a previous project with a similar software application. For instance, optimization estimates for the AMR-WB speech codec algorithm could be based on the results obtained from the same activity on its predecessor, the AMR-NB algorithm.

The final estimate has an unavoidable uncertainty that depends on the complexity of the application. For this reason the estimate is given as a minimum and a maximum value.

But even if the estimate of the total load reduction can be considered reliable, verifying the performance of the optimization activity during the project is still an issue. After the first optimizations, the measurements can give us some figures. How can we use this information to judge whether we are on the right track? There is no way to say that the activity is on track (with respect to the nonfunctional requirement of target MIPS load) because there is no estimate of the ideal path between the start and the final target values.

Solution
A solution to the tracking problem exists. Of course, a linear law of load reduction is the wrong model and cannot be used, because the degree of optimization depends on the kind of program flow in each function.

The correct model can be obtained by solving a linear programming problem. Let's assume:

R = mean reduction factor (the estimated load reduction per channel)
Y = clock cycles of the unoptimized channel code after N frames
y_i = clock cycles of the i-th function accumulated after N frames (descendants excluded)
r_i = reduction factor of the i-th function (descendants excluded)

So the following constraint is valid:

$$\sum_i (1 - r_i)\, y_i \le (1 - R)\, Y$$

Since the per-function cycle counts y_i (descendants excluded) add up to Y, with functions left unoptimized simply having r_i = 0, this simplifies to:

$$\sum_i r_i\, y_i \ge R\, Y$$

The linear programming problem is obtained by minimizing the effort, expressed in terms of optimization grade, under the constraint above and assuming that each function cannot be optimized beyond a reasonable limit:

$$
\begin{aligned}
\min\; & f(r) = \sum_i r_i \\
\text{subject to}\; & \sum_i r_i\, y_i \ge R\, Y \\
& 0 \le r_i \le \text{limit} \quad \forall i \in \text{set of functions to be optimized}
\end{aligned}
$$

Having the predicted curve of load reduction, it's now possible to monitor whether the activity is on track (in terms of fulfilling the nonfunctional requirement), but this requires a certain amount of testing, at least in a simulation environment, in order to profile the optimized functions as soon as they are ready to be integrated.

Iterative approach
The predicted curve of load reduction is checked at each step of the iteration. Deviations are allowed and the right actions are taken accordingly. As soon as the profiler returns the load reduction of a released function, the estimated curve for the remaining steps is modified to take the measurements into account. The model is then updated with the following constraints, where r_i* is the measured reduction factor of an already optimized function:

$$
\begin{aligned}
r_i &= r_i^{*} && \forall i \in \text{set of already optimized functions} \\
0 \le r_i &\le \text{limit} && \forall i \in \text{set of functions to be optimized}
\end{aligned}
$$

With this approach, feedback can be received and evaluated up front. Possible corrective actions are:
• Put more effort into optimizing the remaining functions;
• Analyze the functions performing worse than predicted and rework them to be more aligned with the prediction.

Practical consideration
The linear programming problem can be solved with the simplex algorithm. You do not need any special tool for it; an LP solver is available in the common Excel spreadsheet through the Solver add-in.

Sometimes designers prefer to follow their experience or their rules of thumb, which no doubt can be successful, but a more structured approach exploiting ready-to-use techniques, such as linear programming, leads to a more controlled development.

In this section, we have seen the case of MIPS load optimization, but similar considerations are also valid for memory optimization and, being a general approach, it could be applied to other kinds of problems.

CONTINUOUS INTEGRATION
As explained earlier, one of the key success factors of the method proposed in this paper is continuous integration of the step-wise delivered software to ensure an always-working system, while verification activities at different levels are run in parallel to get the earliest and most efficient quality feedback from different perspectives: coding, interfaces, functionalities, robustness, and performance. ■

Giuseppe De Simone is a systems manager, Paolo Pierani is a program manager, and Massimo Quagliani is a senior software engineer, all with Ericsson (Italy). The three authors joined forces in the areas of embedded software design, systems, and project engineering, and defined and implemented the strategy. You may reach the authors at massimo.quagliani@ericsson.com.

Note: This article is an excerpt of an ESC class paper for ESC Boston, 2009. The full article is available at www.embedded.com/219400429.



break points
Thanks for the memories
From oral tradition and clay tablets all the way to modern memory devices, Jack Ganssle traces the history of memory.
By Jack G. Ganssle

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at jack@ganssle.com.

If each of our hands had eight fingers, we'd count in hexadecimal and non-geeks wouldn't puzzle over numbers that include the letters A through F. But would we still start at one instead of zero?

How often have you seen a child learning to enumerate hold three fingers out in an effort to remember how many of something he just counted? Perhaps these digits were the earliest memory devices.

Or maybe not. Long before writing was invented people had developed storytelling to a fine art. Oral traditions taught the young to avoid the bad berries and saber-toothed tigers. Long before being committed to paper or papyrus, the Bible was transmitted between generations by word of mouth. Matthew and Luke's begats most likely mirrored how elders tracked their own family history.

The very concept of "mine" that toddlers unceasingly chant is likely buried deep in our genes. Unless the earliest societies were truly communal, Grog the caveman would have needed some sort of device, perhaps piles of stones, to track exactly how much was "mine."

At some point humans made the leap from physical representations of quantities to the abstract. Perhaps the first form of writing involved scratching lines in the dirt as a memory aid. I have read that the earliest symbols existed as far back as some 30,000 years ago. These were mnemonics rather than novels; but the very word mnemonic means "memory aid." Clearly, humankind has long wanted mass storage. Unfortunately, the early history of writing has gone to /dev/null.

Some scholars date the Tărtăria tablets from Romania back to 5,500 BC, in which case they may preserve the oldest known written data. The regular shapes of the glyphs encoded in the clay suggest that standardized writing had existed for some time. No one knows what the symbols mean, but a medium that lasts seven thousand years puts all of our modern high-tech solutions to shame.

Photo: Tartaria Tablet (Amulet). Museum: National Transylvanian History Museum of Cluj-Napoca. Period: Developed Neolithic. Photo from www.europeanvirtualmuseum.net/virtual_museum/.

Just as the Kindle uses sequences of ones and zeroes to store the Kama Sutra, at some point most societies moved from pictographs to alphabets. Egyptian hieroglyphs contain elements of both. Our alphabet reduces the number of symbols needed to express complex ideas from thousands to 26. Fewer symbols means more storage is needed to encode an idea, but there's no practical limit to the things that can be described. This remains a very diverse world, so ironically we still need thousands of representations in Unicode to build computers useful to the planet's population.

Clay tablets gave way to papyrus and parchment. The latter, made from animal skin, is quite expensive and led to what was perhaps the first rewritable storage medium: the palimpsest. Scribes would scrape or wash the ink from a parchment document and write again on the now-blank sheet. In fact here in Baltimore, the Walters Art Museum holds the Archimedes Palimpsest (www.archimedespalimpsest.org/). In the tenth century an unknown scribe copied some of Archimedes' work onto parchment; two centuries later it was reused for a liturgical text. Science has been able to reveal the original



EMBEDDED SYSTEMS MARKETPLACE

Compact Embedded Server
The Compact SIB is a small, low-cost, yet powerful server that is ideal for hostile, rugged environments and has the following features:
• Fanless x86 1 GHz CPU
• 256 MB DDR2 RAM on board
• 2.6 kernel
• 128 MB internal flash disk
• 10/100Base-T Ethernet
• On-board audio
• 2 RS-232 & 3 USB 2.0 ports
• CompactFlash & MicroSD slots
• Reliable (no CPU fan or disk drive)
• Optional wireless LAN & hard drive
• Dimensions: 4.5" x 4.5" x 1.375"
Pricing starts at $230.00
http://www.emacinc.com/linux_compact_sib.htm

Since 1985 · OVER 24 YEARS OF SINGLE BOARD SOLUTIONS · EQUIPMENT MONITOR AND CONTROL
Phone: (618) 529-4525 · Fax: 457-0110 · Web: www.emacinc.com

ad index

Advertiser                URL                         Page
ARM/KEIL                  www.onARM.com               10
EMAC, INC.                www.emacinc.com             38
EXPRESS LOGIC             www.rtos.com                4
GREEN HILLS SOFTWARE INC  www.ghs.com                 1
INTEL CORP                intelembeddedevent.com      17
MEN MICRO                 www.ESM-express.com         20
MENTOR GRAPHICS           www.mentor.com/nucleus      CV3
MICROCHIP                 www.microchip.com/PICkit3   2
MICRO DIGITAL             www.smxrtos.com/usb         26
MOUSER ELECTRONICS        www.mouser.com              8
NATIONAL INSTRUMENTS      www.ni.com/multicore/       7
SARL CALAO SYSTEMS        www.calao-systems.com       21
SEGGER MICROCONTROLLER    www.segger.com              16
SMART BEAR SOFTWARE       www.CodeCollaborator.com    38
TECH TOOLS                www.tech-tools.com          38
TECHNOLOGIC SYSTEMS       www.embeddedARM.com         24
THE MATHWORKS             www.mathworks.com/connect   CV2

ADVERTISING
EMBEDDED SYSTEMS DESIGN
MEDIA KIT: www.embedded.com/mediakit

Sales
TechInsights, 600 Harrison St., 5th Flr., San Francisco, CA 94107
David Blaza, Publisher, (415) 947-6929, dblaza@techinsights.com
TechInsights, 600 Community Drive, Manhasset, NY 11030
Bob Dumas, Associate Publisher, (516) 562-5742, bdumas@techinsights.com

Advertising Coordination and Production
United Business Media, 600 Community Drive, Manhasset, NY 11030
Donna Ambrosino, Production Director, (516) 562-5115, dambrosi@ubm-us.com

Programming pointers . . . from page 13

The delete_widget_array function assumes that its parameter, pw, is a pointer returned from some prior call to new_widget_array. Therefore, delete_widget_array must obtain the array dimension by effectively reversing the pointer computation done in new_widget_array. A complete implementation of the function looks like:

    void delete_widget_array(widget *pw)
    {
        if (pw != NULL)
        {
            size_t *ps = (size_t *)pw;
            size_t n = *--ps;         /* the element count sits just before the array */
            widget *p = pw + n;
            while (p != pw)
                widget_destroy(--p);  /* destroy the elements in reverse order */
            free(ps);                 /* release the block new_widget_array allocated */
        }
    }

PADDING AND ALIGNMENT, AGAIN
These implementations of new_widget_array and delete_widget_array will work just fine on any processor in which no type has an alignment stricter than that of size_t. However, they could produce undefined behavior on any machine with more strictly aligned types. I'll explain why, and what you can do about it, in a future column. ■

ENDNOTES:
1. Saks, Dan. "Allocating objects vs. allocating storage," www.embedded.com/210200586
2. Saks, Dan. "Allocating arrays," www.embedded.com/212700451
3. Saks, Dan. "Deallocating objects vs. deallocating storage," www.embedded.com/214501964
4. Saks, Dan. "Abstract Types Using C," www.embedded.com/15300198
5. Saks, Dan. "Incomplete Types as Abstractions," www.embedded.com/16100434
6. Saks, Dan. "Tag vs. Type Names," www.embedded.com/9900748



text, which is fortunate as it has the only known copy of the sage of Syracuse's The Method of Mechanical Theorems.

I suppose one could make the argument that the abacus was a storage device, since, like the registers in a CPU, it held numbers during a calculation. The Sumerians had this technology, in a primitive form, nearly 5,000 years ago. The Romans called the limestone pebbles used in their table abacuses "calculi," from which we derive the word "calculate."

After the invention of paper in China around the second century, not much happened to storage technology for thousands of years. Paper remained expensive and was hand-made until the 19th century, and indeed history records "rag pickers" who recycled old cloth for papermaking.

Most techies know that the punched card long predated mainframes. Cards and paper tape were originally adopted in France in the early 18th century to control textile looms. (I would have thought that the idea of punching holes in paper to store instructions and data stemmed from player pianos, but those didn't come around till over a century later.)

Charles Babbage designed a calculating machine that used punched cards, but he wasn't the one to couple cards to computing. Russian Semen Korsakov, a bureaucrat in a police statistics department, anticipated Google when he invented several machines that used punched cards to search through data. But they remained a novelty in information processing until Herman Hollerith built machines to record data for the 1890 census. His company later morphed into IBM. That company supported a number of different kinds of punched cards, but their 80-column version remains the iconic image of the technology.

My grandmother died years ago at age 99, still furious at FDR (she couldn't even say his name, always blurting out "that man!") for creating Social Security and other Depression-era programs. She drew on Social Security for 34 years. But FDR's creation did drive a massive expansion in the use of punched card machines, and so in some way contributed to the birth of the computer industry. (Aside: Ida May Fuller got the first SS check. She paid $24.75 into the system, or one percent of three years' income, and netted almost $23,000 in payouts before dying at age 100, surely one of the great jackpots of all time!)

ENIAC: Detail of the back of a panel of ENIAC, showing vacuum tubes. Copyright 2005 Paul W Shaffer, curator of the University of Pennsylvania ENIAC Museum. (http://en.wikipedia.org/wiki/File:ENIAC_Penn2.jpg)

Computer centers used fantastic quantities of punched cards. In the early 1970s, the University of Maryland supplied unlimited quantities to students for free. A 10,000-line program needed 10,000 cards, which filled five boxes. Even as early as 1937, IBM manufactured five to 10 million cards a day. Yet when working on the ESC's twentieth anniversary event, I could find none, except from a vendor who charged a buck a card, since they faded away in the '70s as other media became more cost effective.

Paper tape, too, was initially used for looms. Both Morse and Edison worked on paper tape systems for telegraphy, though their systems initially used marks made on the tape instead of holes. In the 1920s, various communications links using teletypewriters were established. The infamous newsrooms of yore with dozens of these clanking machines are an example. Prior to and during World War II, the Teletype Corporation built some 200,000 of their model 15 teletypewriters (often called "teletypes," just as copiers are referred to as "xerox machines"). These usually had a paper tape reader and punch attached with which to log and send streams of messages. All of the model 15 machines I have seen use a five-level code derived from Baudot rather than the eight-level ASCII or Unicode's zillions of characters common today. Mechanical fingers probed the holes across the tape and converted the pattern to a 5-bit parallel stream, which was sent to what looked exactly like a car's distributor (for a five-cylinder car, that is). Five contacts were swept by a rotor to convert the parallel data to serial.

Konrad Zuse in Germany used paper tape to feed instructions to his Z1
machine, completed in 1938. He also used moving metal sheets as main memory, for a total of sixty-four 22-bit words (www.epemag.com/zuse/part3b.htm).

Teletype's ASR-33 appeared in 1961. At $2,000 (about $14,000 today), it was inexpensive, as these machines went, and was a perfect match for the newly emerging minicomputers. The machine had a built-in 8-bit paper tape reader and punch that read tape at a blistering 10 characters per second. It's interesting that the incredibly complex mechanism of the ASR-33 was much cheaper than the electronics needed to build a video terminal, so these machines were common in computer centers. The racket 50 of them made in a terminal room cannot be imagined. ASR-33s were the first terminals used by microprocessor development systems.

Tape: A tray of tapes for Data General's Nova minicomputer. (http://en.wikipedia.org/wiki/File:Dg-papertapes.jpg)

The story of computers is one of speed, and 10 CPS didn't tax even the low-powered machines of the '60s. High-speed paper-tape readers appeared, which could suck tape through at hundreds of characters per second. When something went wrong, these devices would spew a snarl of tape that could fill an office in seconds. Vendors of both mini- and microcomputers delivered their tools on tape. Happily, programs then were not 500-MB monsters! Tape remained the mass storage of choice in the '70s till supplanted by magnetic media.

Meanwhile, other forms of memory were tried, used, and abandoned. A complete list would take volumes, but here are some of my favorites.

Today, we store programs and data in, among other things, active elements like transistors. That's not a new concept. The vacuum tube is an active element as well, and was used in early computers both as ALU and memory. Of course, a tube is about the size of a skyscraper compared to 45-nm FETs, and dissipates enormous amounts of heat. ENIAC stored twenty 10-digit numbers in ring counters. Every digit stored needed 36 tubes.

Flash memory stick.

Core memory came about either from a paper by An Wang, whom old-timers recognize as the man behind once-great Wang Laboratories, or from Jay Forrester and Ken Olsen in their work on the Whirlwind computer, depending on which sources one believes. Wang's paper came out in 1949, but core didn't become viable in computers till the following decade. Core memory is composed of large planes of tiny ferrite donuts. Three or four wires are threaded through each torus, and they are all interconnected in an X-Y matrix. By sending small currents through a row and a column, each too weak to switch a core on its own, it's possible to flip the magnetic field of the core located at the X-Y intersection. A read drives the selected core toward zero, and a sense wire detects whether it flipped, which signals whether the core held a one or a zero. This is a destructive read, so a second cycle writes the value back. Compared with other early memory devices, core was very fast, switching in under a microsecond.

Core became the standard memory store for all computers from the '50s until superseded by semiconductor memory in the '70s. Though early cores weren't much smaller than a Cheerio, eventually sizes shrank tremendously. Prices did too, falling to about a penny a bit, about seven orders of magnitude more than what memory costs today. Think about that: what other industry has seen costs tumble so precipitously?

The Whirlwind machine mentioned earlier eventually had core memory, but initially relied on Williams-Kilburn tubes. These were essentially CRTs that painted bits on the phosphor screen (some versions didn't bother with the phosphor coating). A metal plate on the front of the screen sensed the charges. Typical Williams-Kilburn tubes could hold a few hundred to a thousand bits; those used in the Whirlwind stored 256 bits each. Unfortunately, the tubes aged poorly, couldn't be scaled up to higher memory densities, and were very susceptible to electronic noise.

But they were a pretty cool idea.

And I've run out of room. More next month! ■
