
COMMUNICATIONS OF THE ACM
CACM.ACM.ORG 06/2019 VOL.62 NO.06

Geoffrey Hinton, Yoshua Bengio, and Yann LeCun
Recipients of ACM’s A.M. Turing Award

Association for Computing Machinery


DREAM ZONE!
The 12th ACM SIGGRAPH Conference and Exhibition on Computer Graphics
and Interactive Techniques in Asia
Conference 17 - 20 November 2019
Exhibition 18 - 20 November 2019
Brisbane Convention & Exhibition Centre (BCEC),
Brisbane, Australia


Publish Your Work Open Access With ACM!

ACM offers a variety of Open Access publishing options
to ensure that your work is disseminated to the widest possible
readership of computer scientists around the world.

Please visit ACM’s website to learn more about
ACM’s innovative approach to Open Access at:
https://www.acm.org/openaccess
COMMUNICATIONS OF THE ACM

Departments

5 From the President
ACM Awards Honor CS Contributions
By Cherri M. Pancake

7 Cerf’s Up
Back to the Future
By Vinton G. Cerf

8 BLOG@CACM
Is CS Really for All, and Defending Democracy in Cyberspace
Mark Guzdial mulls the difficulty of getting into a computer science class, while John Arquilla ponders political warfare in cyberspace.

27 Calendar

92 Careers

Last Byte

96 Q&A
Reaching New Heights with Artificial Neural Networks
ACM A.M. Turing Award recipients Yoshua Bengio, Geoffrey Hinton, and Yann LeCun on the promise of neural networks, the need for new paradigms, and the concept of making technology accessible to all.
By Leah Hoffmann

Watch the recipients discuss this work in the exclusive Communications video.
https://cacm.acm.org/videos/2018-acm-turing-award

News

10 Neural Net Worth
Yoshua Bengio, Geoffrey Hinton, and Yann LeCun this month will receive the 2018 ACM A.M. Turing Award for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.
By Neil Savage

13 Lifelong Learning in Artificial Neural Networks
New methods enable systems to rapidly, continuously adapt.
By Gary Anthes

16 And Then, There Were Three
How long can the silicon foundry sector continue to adapt, as physical limits make further shrinkage virtually impossible?
By Don Monroe

19 Ethics in Technology Jobs
Employees are increasingly challenging technology companies on their ethical choices.
By Keith Kirkpatrick

Viewpoints

22 Global Computing
Global Data Justice
A new research challenge for computer science.
By Linnet Taylor

25 Inside Risks
Through Computer Architecture, Darkly
Total-system hardware and microarchitectural issues are becoming increasingly critical.
By A.T. Markettos, R.N.M. Watson, S.W. Moore, P. Sewell, and P.G. Neumann

28 The Profession of IT
An Interview with David Brin on Resiliency
Many risks of catastrophic failures of critical infrastructures can be significantly reduced by relatively simple measures to increase resiliency.
By Peter J. Denning

32 Viewpoint
Personal Data and the Internet of Things
It is time to care about digital provenance.
By Thomas Pasquier, David Eyers, and Jean Bacon

2 COMMUNICATIONS OF THE ACM | JUNE 2019 | VOL. 62 | NO. 6


06/2019 VOL. 62 NO. 06

Practice

36 Garbage Collection as a Joint Venture
A collaborative approach to reclaiming memory in heterogeneous software systems.
By Ulan Degenbaev, Michael Lippautz, and Hannes Payer

42 How to Create a Great Team Culture (and Why It Matters)
Build safety, share vulnerability, and establish purpose.
By Kate Matsudaira

45 Research for Practice: Troubling Trends in Machine-Learning Scholarship
Some ML papers suffer from flaws that could mislead the public and stymie future research.
By Zachary C. Lipton and Jacob Steinhardt

Articles’ development led by queue.acm.org

Contributed Articles

54 Programmable Solid-State Storage in Future Cloud Datacenters
Programmable software-defined solid-state drives can move computing functions closer to storage.
By Jaeyoung Do, Sudipta Sengupta, and Steven Swanson

63 Engineering Trustworthy Systems: A Principled Approach to Cybersecurity
Cybersecurity design reduces the risk of system failure from cyberattack, aiming to maximize mission effectiveness.
By O. Sami Saydjari

Review Articles

70 The Challenge of Crafting Intelligible Intelligence
To trust the behavior of complex AI algorithms, especially in mission-critical settings, they must be made intelligible.
By Daniel S. Weld and Gagan Bansal

Watch the authors discuss this work in the exclusive Communications video.
https://cacm.acm.org/videos/the-challenge-of-crafting-intelligible-intelligence

Research Highlights

82 Technical Perspective
Back to the Edge
By Rishiyur S. Nikhil

83 Heterogeneous Von Neumann/Dataflow Microprocessors
By Tony Nowatzki, Vinay Gangadhar, and Karthikeyan Sankaralingam

About the Cover:
The recipients of the 2018 ACM A.M. Turing Award (from left) Yann LeCun, Geoffrey Hinton, and Yoshua Bengio photographed at the Vector Institute in Toronto, Canada, on April 11, 2019.
Photographer: Alexander Berg, https://www.alexberg.com/

Association for Computing Machinery
Advancing Computing as a Science & Profession



COMMUNICATIONS OF THE ACM
Trusted insights for computing’s leading professionals.

Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields.
Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology,
and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications,
public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM
enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts,
sciences, and applications of information technology.

ACM, the world’s largest educational and scientific computing society, delivers resources that advance computing as a science and profession. ACM provides the computing field’s premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

Executive Director and CEO
Vicki L. Hanson
Deputy Executive Director and COO
Patricia Ryan
Director, Office of Information Systems
Wayne Graves
Director, Office of Financial Services
Darren Ramdin
Director, Office of SIG Services
Donna Cappo
Director, Office of Publications
Scott E. Delman

ACM COUNCIL
President
Cherri M. Pancake
Vice-President
Elizabeth Churchill
Secretary/Treasurer
Yannis Ioannidis
Past President
Alexander L. Wolf
Chair, SGB Board
Jeff Jortner
Co-Chairs, Publications Board
Jack Davidson and Joseph Konstan
Members-at-Large
Gabriele Anderst-Kotis; Susan Dumais; Renée McCauley; Claudia Bauzer Mederios; Elizabeth D. Mynatt; Pamela Samuelson; Theo Schlossnagle; Eugene H. Spafford
SGB Council Representatives
Sarita Adve and Jeanna Neefe Matthews

BOARD CHAIRS
Education Board
Mehran Sahami and Jane Chu Prey
Practitioners Board
Terry Coatta

REGIONAL COUNCIL CHAIRS
ACM Europe Council
Chris Hankin
ACM India Council
Abhiram Ranade
ACM China Council
Wenguang Chen

PUBLICATIONS BOARD
Co-Chairs
Jack Davidson and Joseph Konstan
Board Members
Phoebe Ayers; Edward A. Fox; Chris Hankin; Xiang-Yang Li; Nenad Medvidovic; Tulika Mitra; Sue Moon; Michael L. Nelson; Sharon Oviatt; Eugene H. Spafford; Stephen N. Spencer; Divesh Srivastava; Robert Walker; Julie R. Williamson

ACM U.S. Technology Policy Office
Adam Eisgrau, Director of Global Policy and Public Affairs
1701 Pennsylvania Ave NW, Suite 200, Washington, DC 20006 USA
T (202) 580-6555; acmpo@acm.org

Computer Science Teachers Association
Jake Baskin, Executive Director

STAFF
DIRECTOR OF PUBLICATIONS
Scott E. Delman
cacm-publisher@cacm.acm.org

Executive Editor
Diane Crawford
Managing Editor
Thomas E. Lambert
Senior Editor
Andrew Rosenbloom
Senior Editor/News
Lawrence M. Fisher
Web Editor
David Roman
Editorial Assistant
Danbi Yu

Art Director
Andrij Borys
Associate Art Director
Margaret Gray
Assistant Art Director
Mia Angelica Balaquiot
Production Manager
Bernadette Shade
Intellectual Property Rights Coordinator
Barbara Ryan
Advertising Sales Account Manager
Ilia Rodriguez

Columnists
David Anderson; Michael Cusumano; Peter J. Denning; Mark Guzdial; Thomas Haigh; Leah Hoffmann; Mari Sako; Pamela Samuelson; Marshall Van Alstyne

CONTACT POINTS
Copyright permission
permissions@hq.acm.org
Calendar items
calendar@cacm.acm.org
Change of address
acmhelp@acm.org
Letters to the Editor
letters@cacm.acm.org

WEBSITE
http://cacm.acm.org

WEB BOARD
Chair
James Landay
Board Members
Marti Hearst; Jason I. Hong; Jeff Johnson; Wendy E. MacKay

AUTHOR GUIDELINES
http://cacm.acm.org/about-communications/author-center

ACM ADVERTISING DEPARTMENT
2 Penn Plaza, Suite 701, New York, NY 10121-0701
T (212) 626-0686
F (212) 869-0481

Advertising Sales Account Manager
Ilia Rodriguez
ilia.rodriguez@hq.acm.org

Media Kit acmmediasales@acm.org

Association for Computing Machinery (ACM)
2 Penn Plaza, Suite 701
New York, NY 10121-0701 USA
T (212) 869-7440; F (212) 869-0481

EDITORIAL BOARD
EDITOR-IN-CHIEF
Andrew A. Chien
eic@cacm.acm.org
Deputy to the Editor-in-Chief
Lihan Chen
cacm.deputy.to.eic@gmail.com
SENIOR EDITOR
Moshe Y. Vardi

NEWS
Co-Chairs
Marc Snir and Alain Chesnais
Board Members
Monica Divitini; Mei Kobayashi; Rajeev Rastogi; François Sillion

VIEWPOINTS
Co-Chairs
Tim Finin; Susanne E. Hambrusch; John Leslie King; Paul Rosenbloom
Board Members
Michael L. Best; Judith Bishop; James Grimmelmann; Mark Guzdial; Haym B. Hirsch; Richard Ladner; Carl Landwehr; Beng Chin Ooi; Francesca Rossi; Len Shustek; Loren Terveen; Marshall Van Alstyne; Jeannette Wing; Susan J. Winter

PRACTICE
Co-Chairs
Stephen Bourne and Theo Schlossnagle
Board Members
Eric Allman; Samy Bahra; Peter Bailis; Betsy Beyer; Terry Coatta; Stuart Feldman; Nicole Forsgren; Camille Fournier; Jessie Frazelle; Benjamin Fried; Tom Killalea; Tom Limoncelli; Kate Matsudaira; Marshall Kirk McKusick; Erik Meijer; George Neville-Neil; Jim Waldo; Meredith Whittaker

CONTRIBUTED ARTICLES
Co-Chairs
James Larus and Gail Murphy
Board Members
William Aiello; Robert Austin; Kim Bruce; Alan Bundy; Peter Buneman; Jeff Chase; Andrew W. Cross; Yannis Ioannidis; Gal A. Kaminka; Ben C. Lee; Igor Markov; Lionel M. Ni; Adrian Perrig; Doina Precup; Marie-Christine Rousset; Krishan Sabnani; m.c. schraefel; Ron Shamir; Alex Smola; Sebastian Uchitel; Hannes Werthner; Reinhard Wilhelm

RESEARCH HIGHLIGHTS
Co-Chairs
Azer Bestavros and Shriram Krishnamurthi
Board Members
Martin Abadi; Amr El Abbadi; Animashree Anandkumar; Sanjeev Arora; Michael Backes; Maria-Florina Balcan; David Brooks; Stuart K. Card; Jon Crowcroft; Alexei Efros; Bryan Ford; Alon Halevy; Gernot Heiser; Takeo Igarashi; Sven Koenig; Greg Morrisett; Tim Roughgarden; Guy Steele, Jr.; Robert Williamson; Margaret H. Wright; Nicholai Zeldovich; Andreas Zeller

SPECIAL SECTIONS
Co-Chairs
Sriram Rajamani, Jakob Rehof, and Haibo Chen
Board Members
Tao Xie; Kenjiro Taura; David Padua

ACM Copyright Notice
Copyright © 2019 by Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@hq.acm.org or fax (212) 869-0481.

For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center; www.copyright.com.

Subscriptions
An annual subscription cost is included in ACM member dues of $99 ($40 of which is allocated to a subscription to Communications); for students, cost is included in $42 dues ($20 of which is allocated to a Communications subscription). A nonmember annual subscription is $269.

ACM Media Advertising Policy
Communications of the ACM and other ACM Media publications accept advertising in both print and electronic formats. All advertising in ACM Media publications is at the discretion of ACM and is intended to provide financial support for the various activities and services for ACM members. Current advertising rates can be found by visiting http://www.acm-media.org or by contacting ACM Media Sales at (212) 626-0686.

Single Copies
Single copies of Communications of the ACM are available for purchase. Please contact acmhelp@acm.org.

COMMUNICATIONS OF THE ACM (ISSN 0001-0782) is published monthly by ACM Media, 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Periodicals postage paid at New York, NY 10001, and other mailing offices.

POSTMASTER
Please send address changes to
Communications of the ACM
2 Penn Plaza, Suite 701
New York, NY 10121-0701 USA

Printed in the USA.



from the president

DOI:10.1145/3326069 Cherri M. Pancake

ACM Awards Honor CS Contributions

In this issue of Communications, as evidenced by the cover and lead article, we celebrate the latest recipients of the ACM A.M. Turing Award. Yoshua Bengio, Yann LeCun, and Geoffrey Hinton carried out pioneering work in deep learning that has touched all our lives. As Turing Laureates, they now join the eminent group of technology visionaries recognized with the world’s highest distinction in computing.

The Turing Award is one of a suite of professional honors ACM bestows annually to recognize technical achievements that have made significant contributions to our field. This month, I will have the pleasure of joining the awardees, ACM Fellows, and other luminaries in San Francisco for the ACM Awards Banquet. The annual event pays tribute to computing excellence and to those whose contributions and innovations have had a lasting impact on our field.

Among the new honorees is Shwetak Patel, winner of the ACM Prize in Computing. This award recognizes individuals who have made significant contributions during the early years of their careers. Patel is being honored for his innovative work in applying sensor systems to problems of sustainability and health care. Also on hand will be Mendel Rosenblum, being honored as the first winner of the ACM Charles P. “Chuck” Thacker Breakthrough in Computing Award. This new biennial award recognizes individuals whose work exemplifies “out-of-the-box” thinking. Rosenblum’s work echoes Thacker’s trademark can-do approach: he reinvented the virtual machine concept, thereby revolutionizing datacenters and making today’s cloud computing possible.

Pavel Pevzner receives the ACM Paris Kanellakis Theory and Practice Award, recognizing theoretical advances that have had a demonstrable effect on computing practice. Pevzner pioneered algorithms for rapidly sequencing DNA; his algorithms underlie almost all sequence assemblers used today and were used to reconstruct the vast majority of genomic sequences available in databases. The ACM Grace Murray Hopper Award honors a computing professional who has made a major technical or service contribution by the age of 35. This year, two individuals are being recognized: Michael J. Freedman for the design and deployment of self-organizing peer-to-peer systems; and Constantinos Daskalakis for his contributions to complexity and game theory.

Gerald C. Combs is being recognized with the ACM Software System Award, given to an institution or individual(s) for developing a software system of lasting influence. He created the Wireshark network protocol analyzer, used by practitioners and researchers worldwide to analyze and troubleshoot a wide range of network protocols. The 2019–2020 ACM Athena Lecturer Award, a biennial honor celebrating fundamental CS contributions by women researchers, goes to Elisa Bertino in recognition of her groundbreaking work in data security and privacy. Chelsea Finn from UC Berkeley receives the ACM Doctoral Dissertation Award for her work on “Learning to Learn with Gradients.”

The ACM Distinguished Service Award, which celebrates service contributions to the computing community, goes to Paramvir (Victor) Bahl, for his work founding conferences, publications, and a SIG for researchers and practitioners in the mobile and wireless networking community, as well as contributions to technology policy. Robert Sedgewick is being honored with the ACM Karl V. Karlstrom Outstanding Educator Award for the outstanding textbooks and online materials he created, which are used worldwide for courses in introductory computer science. Chris Stephenson is receiving the Outstanding Contribution to ACM Award for her landmark work in bringing K–12 teachers worldwide the tools and resources needed to introduce computer science to future generations.

The recipient of the ACM Eugene L. Lawler Award for Humanitarian Contributions within Computer Science and Informatics is Meenakshi Balakrishnan for developing cost-effective solutions to address the special mobility and education challenges of the visually impaired in developing countries. The ACM-AAAI Allen Newell Award, presented to an individual for career contributions that have breadth within CS or that bridge CS and other disciplines, has been awarded to Henry Kautz for his work at the intersection of AI, computational social science, and public health.

Last but not least, the Awards Banquet will celebrate 56 incoming ACM Fellows. A complete list of names and their key achievements can be found at https://awards.acm.org/fellows.

The prestige of ACM’s awards brings global attention to outstanding technical and professional achievements throughout the computing community. We all benefit when fine work and lasting accomplishments in computer science are celebrated. I hope you will participate this coming year, by making sure the key achievers in your own area are nominated. Our award committees, led by Awards Co-Chairs John White and Vinton Cerf, do an outstanding job, but they rely on people like you to identify and put forward strong candidates. Learn more at https://awards.acm.org/award-nominations.

Cherri M. Pancake is President of ACM, professor emeritus of electrical engineering and computer science, and director of a research center at Oregon State University, Corvallis, OR, USA.

Copyright held by author/owner.



ACM-IMS Data Science Summit
June 15, 2019 | Palace Hotel, San Francisco

An interdisciplinary event bringing together researchers and practitioners to address deep learning, reinforcement learning, robustness, fairness, ethics, and the future of data science.

Computing and statistics underpin the rapid emergence of data science as a pivotal academic discipline. ACM and IMS—the Institute of Mathematical Statistics—the two key academic organizations in these areas, have launched a new joint venture to propel data science and to engage and energize our communities to work together.

ACM and IMS will hold an all-day launch event to address topics such as deep learning, reinforcement learning, fairness, and ethics, in addition to discussions about the future of data science and the role of ACM and IMS.

Keynote Speakers
• Jeffrey Dean, Google
• David Donoho, Stanford University
• Daphne Koller, insitro, Stanford University

Panels and Panelists

Deep Learning, Reinforcement Learning, and Role of Methods in Data Science
• Shirley Ho, Flatiron Institute
• Sham Kakade, University of Washington
• Suchi Saria, Johns Hopkins University
• Manuela Veloso, J.P. Morgan, Carnegie Mellon University

Robustness and Stability in Data Science
• Aleksander Madry, Massachusetts Institute of Technology
• Xiao-Li Meng, Harvard University
• Richard J. Samworth, University of Cambridge, Alan Turing Institute
• Bin Yu, University of California, Berkeley

Fairness and Ethics in Data Science
• Alexandra Chouldechova, Carnegie Mellon University
• Andrew Gelman, Columbia University
• Kristian Lum, HRDAG (Human Rights Data Analysis Group)

Future of Data Science
• Michael I. Jordan, University of California, Berkeley
• Adrian Smith, Alan Turing Institute

Seating is limited, so register early!


https://www.acm.org/data-science-summit
cerf’s up

DOI:10.1145/3328904 Vinton G. Cerf

Back to the Future

First, allow me to congratulate all the ACM honorees that receive their well-deserved awards this month at the ACM Awards Gala in San Francisco. For an account of the awards this year, please read ACM President Cherri Pancake’s summary on p. 5 of this issue.

I want to take you back to the mid-1800s, as the telegraph emerged as a nearly fast-as-light communication technology. You can imagine the excitement when in 1844 Samuel Morse sent his first message between Washington, D.C., and Baltimore, MD. Now you could send messages faster than even a speeding train. If the bad guy robbed a bank and jumped on a train to escape, you could signal the next station to have the police ready to nab the miscreant before the train arrived. The successful laying of a trans-Atlantic cable in 1866 (earlier trials failed in short order) was another major milestone. Then, in 1901, Guglielmo Marconi came along and did it without wires!

These systems worked by “store and forward” since telegrams were sent from station to station, being manually copied and retransmitted “hop by hop.” Eventually there were paper tape teletypes that would punch out a tape with the message characters encoded with 5-bit Baudot codes for each letter. The transmitting teletype read the tape and sent the characters to a receiving teletype that would punch out a duplicate tape. The operator would hang the tape on a peg next to the machine that would be used to forward this message to the next hop.[a] There is a wonderful book entitled The Victorian Internet by Tom Standage[b] that outlines the history of the telegraph.

Eventually, circuit-switching systems derived from the telephone network could be used to connect the source and destination teletypes directly to each other without the need for intermediate hops, just as voice calls are made. A circuit was set up and the sending teletype would transmit its paper tape and the receiving teletype would punch it out at the other end.

Ironically, the packet switching of the Arpanet reintroduced the store-and-forward method for intercomputer communication. Dedicated circuits connected the packet switches just as the old telegraph sets were connected. When a packet was received, the receiving packet switch examined it and if the packet was not destined for a locally connected computer, it was stored briefly until it reached the head of the line in a queue whereupon it was then forwarded to the next hop (packet switch) along a path to the destination.

This was a much faster process than the old manual telegraph method and the forwarding of the packets allowed the concurrent sharing/multiplexing of the dedicated telephone circuit between the packet switches. There was no waiting to set up a dialed circuit. The same circuit could carry many packets going to many destinations without setting up and tearing down circuits. Because all the traffic was split into packets, long files would be easily mixed in with other traffic, reducing the latency for access to the common communication network. With increasingly fast dedicated circuits, the latencies end-to-end dropped and capacity went up leading to the streaming audio, video, and interactive videoconferencing and gaming so prevalent today.

In the March 2019 issue of Communications there is an important article by Pamela Zave and Jennifer Rexford[c] that reenvisioned the current Internet as a recursively layered network of networks of networks (so to speak) that captures the evolved architecture now manifest. We have come a long way since the 1844 introduction of the telegraph and the 1983 activation of the Internet and there is strong evidence that further evolution is to be expected as new technologies arrive to spark imagination and challenge engineers to improve on the past.

[a] This was sometimes called “torn tape” telecommunication because you would tear the tape off the receiving teletype.
[b] Standage, T. The Victorian Internet. Walker and Company, 1998.
[c] Zave, P.A. and Rexford, J. The compositional architecture of the Internet. Commun. ACM 62, 3 (Mar. 2019), 78–87.

Vinton G. Cerf is vice president and Chief Internet Evangelist at Google. He served as ACM president from 2012–2014.

Copyright held by author.



The Communications Web site, http://cacm.acm.org,
features more than a dozen bloggers in the BLOG@CACM
community. In each issue of Communications, we’ll publish
selected posts or excerpts.

Follow us on Twitter at http://twitter.com/blogCACM

DOI:10.1145/3323684 http://cacm.acm.org/blogs/blog-cacm

Is CS Really for All, and Defending Democracy in Cyberspace

Mark Guzdial mulls the difficulty of getting into a computer science class, while John Arquilla ponders political warfare in cyberspace.

Mark Guzdial
The Growing Tension Between Undergraduate and K–12: Is CS for All, or Just Those Who Get Past the Caps?
February 3, 2019
http://bit.ly/2HQZhQe

The New York Times recently ran an article titled “The Hard Part of Computer Science? Getting Into Class” (https://nyti.ms/2VaWcNR) about the dramatic increase in undergraduate enrollment, and the inability of U.S. computer science (CS) departments to keep pace with the demand. These facts aren’t a surprise. The Computing Research Association report “Generation CS” (https://cra.org/data/generation-cs/) described the doubling and tripling of CS undergraduate enrollment at U.S. institutions from 2006 to 2015. American academia took notice with the 2017 National Academies report on the rapid growth of CS enrollments (http://bit.ly/2CWttnt).

Everyone is trying to figure out how to increase capacity in undergraduate computer science education. CRA-E maintains a list of successful practices for scaling capacity in CS enrollment, many of which were funded by Google (see http://bit.ly/2FUpIBd). The New York Times article describes how CS departments are responding to the greater demand than supply in CS classes. We are seeing caps on enrollment, GPA requirements, rations, and even lotteries to allocate the scarce resource of a seat in a CS class.

We may be approaching an inflection point in computing education—and maybe it’s one we’ve seen before. Eric Roberts of Stanford has written a history of undergraduate CS enrollments dating back over 30 years (https://stanford.io/2CNWa7f). He suggests the downturn in enrollment in the late 1980s may have been the result of CS departments’ inability to manage rising CS enrollments in the early 1980s. Then, as now, caps and limits were put into place, which sent the message that computer science wasn’t for everyone, that only elite students could succeed in computer science. Eric writes, at https://stanford.io/2ODJ4OK:

“The imposition of GPA thresholds and other strategies to reduce enrollment led naturally to a change in how students perceived computer science. In the 1970s, students were welcomed eagerly into this new and exciting field. Around 1984, everything changed. Instead of welcoming students, departments began trying to push them away. Students got that message and concluded that they weren’t wanted. Over the next few years, the idea that computer science was competitive and unwelcoming became widespread and started to have an impact even at institutions that had not imposed limitations on the major.”

Unlike the 1980s, we now have a national movement in the U.S. that wants “CS for All” (https://www.csforall.org/). Primary and secondary schools are increasing access to CS classes. States and school districts are mandating computer science for all students.

We are facing a capacity crunch in undergraduate CS classes, and we are not even close to CS for all. While an increasing number of U.S. schools are offering CS classes, only a small percentage of students are taking them up on the offer. Data coming out of U.S. states suggests that less than 5% of U.S. high school students take any computer science, for example, less than 1% in Georgia or Indiana (see state reports at http://bit.ly/2Uk3QZ9). What happens to undergraduate CS enrollment if we get up to 10% of high school students taking computer science, and even a small percentage of those students decide they want to take post-secondary computer science classes? What if we get past 50%?

I don’t have a prediction for what happens next. I don’t know if we’ve ever had this kind of tension in American education. On the one hand, we have a well-funded, industry-supported effort to get CS into every primary and secondary school in the U.S. (https://code.org/about/donors). Some of those kids are going to want more CS in college or university. On the other hand, we see post-secondary schools putting the brakes on rising enrollment. Community colleges and non-traditional post-secondary education may take up some of the demand, but they probably can’t grow exponentially either. Like the 1980s, CS departments have no more resources to manage growing enrollment—but there is even more pressure than in the 1980s to increase capacity.

The greatest loss in the growing demand for CS classes is not that there will be a narrower path for K–12 students to become professional software developers. As the Generation CS report (http://bit.ly/2Udzecn) showed, a big chunk of the demand for seats in CS courses is coming from CS minors and from non-CS majors. More and more people are discovering that computer science is useful, in whatever career they pursue. Those are the people who are losing out on seats. Maybe they first saw programming in K–12 and now want some more. That’s the biggest cost of the capacity crisis. In the long

control of the spigot by traditional post-secondary arrangements is part of the problem now, and also later if the “demand” decreases for whatever reasons. Having excess capacity on hand, and some way to redirect it, is not the kind of resiliency we afford educational institutions.
—Dennis Hamilton

John Arquilla
In (Virtual) Defense of Democracy
March 19, 2019
http://bit.ly/2U9mtj6

In February, The New York Times reported that disruptive cyber operations were launched against the Russia-based Internet Research Agency during the 2018 elections in the U.S. These operations took two forms: direct action causing brief shutdowns, and messages to suspected malefactors that sought to deter. The intended goal of these actions was to “protect American democracy.”

Neither form of action will prove effective over time. Election propaganda-by-troll can come from myriad sources and surrogates, easily outflanking clumsy efforts to establish some sort of “information blockade.” As to deterrence, this is an old chestnut of the age of nation-states. Hacker networks will almost surely not be intimidated, whether they are working on their own or at the behest of a malign third party. Indeed, in the future, election hackers are far more likely to ramp up efforts to shape electoral discourses and outcomes—in de-

ventions. It is well past time to return to this important idea.

The other way for democracies to take the sting out of political warfare waged from cyberspace is to clean up their own practices, which in too many countries have descended into outrageous spirals of distortion and lying. What foreign actors are doing pales next to what is being done by the very political parties and citizens of democratic nations now crying “foul” because some other is in the game. The world should look to America’s Ronald Reagan, who back in the 1980s waged some of the cleanest political campaigns in memory. It will not be easy to stop individuals from becoming bad political actors in cyberspace, but the major political parties should set an example—and an implied moral norm—by rising to the challenge of focusing on fact- and issue-based election campaigns.

One last thought: the U.S. has to be careful about condemning others for engaging in interventions into its political processes. As Dov Levin pointed out in a study conducted while he was a postdoctoral fellow at Carnegie Mellon, from 1946–2000 the U.S. intervened in 81 foreign elections. The number for Russia over the same period was 36. Some have defended American actions by saying that it is okay to intervene when your goal is to shore up liberal forces against authoritarians. But this kind of reasoning can be used by those who attempted to influence the 2016 presidential election in the U.S.; they can say that by “outing” the Democratic
run, increasing computational literacy mocracies everywhere. Party’s backroom efforts to undermine
and sophistication across society could How, then, can this threat be ap- Senator Bernie Sanders’ campaign,
have even bigger impact than produc- propriately countered? There are they were serving the true foundation
ing more professional programmers. two ways—to date, neither of which of democracy: free and fair processes.
Inability to meet the demand for has been chosen. The first has to do Political discourse in cyberspace is a
seats in CS classes may limit the growth with seeking, via the United Nations, fact of life now, and it will remain so for
in our computing labor force. It may an “international code of conduct” the foreseeable future in democratic na-
also limit the growth of computational (ICC) in cyberspace that would impose tions. There are two ways to proceed, if
scientists, engineers, journalists, and behavior-based constraints on both the trolls are to be tamed. One involves
teachers—in short, a computationally infrastructure attacks and “political multilateral action via the United Na-
literate society. warfare.” Ironically, it is the Russians tions; the other demands an inward-
who have been proposing an ICC for looking devotion—among the political
Comments more than 20 years now—while the class and at the individual level—to cul-
It strikes me that nontraditional learning American position has been in firm op- tivating the better angels of our cyber
may be able to take up some of the position—beginning shortly after the natures. Both are worth pursuing.
slack. That won’t address the desire for first meeting between U.S. and Russian
conventional credentialing. I am not certain cyber teams. I co-chaired that meet- Mark Guzdial is a professor in the Computer Science &
Engineering Division of the University of Michigan. John
how that serves folks preparing themselves ing, and thought the Russians had Arquilla is Distinguished Professor of Defense Analysis at
for non-CS disciplines in which some proposed a reasonable idea: creating a the United States Naval Postgraduate School; the views
expressed are his alone.
computation grounding/experience is sought. voluntary arms control regime, like the
Just the same, I wonder if the current chemical and biological weapons con- © 2019 ACM 0001-0782/19/6 $15.00

JUNE 2019 | VOL. 62 | NO. 6 | COMMUNICATIONS OF THE ACM 9


news

Turing Profile | DOI:10.1145/3323872 Neil Savage

Neural Net Worth


Yoshua Bengio, Geoffrey Hinton, and Yann LeCun this month
will receive the 2018 ACM A.M. Turing Award for conceptual
and engineering breakthroughs that have made deep neural
networks a critical component of computing.

When Geoffrey Hinton started doing graduate student work on artificial intelligence at the University of Edinburgh in 1972, the idea that it could be achieved using neural networks that mimicked the human brain was in disrepute. Computer scientists Marvin Minsky and Seymour Papert had published a book in 1969 on Perceptrons, an early attempt at building a neural net, and it left people in the field with the impression that such devices were nonsense.

[Photo: From left, Yoshua Bengio, Geoffrey Hinton, and Yann LeCun at the Vector Institute for Artificial Intelligence in Toronto, Canada. Photo by Alexander Berg.]

“It didn’t actually say that, but that’s how the community interpreted the book,” says Hinton who, along with Yoshua Bengio and Yann LeCun, will receive the 2018 ACM A.M. Turing award for their work that led deep neural networks to become an important component of today’s computing. “People thought I was just completely crazy to be working on neural nets.”

Even in the 1980s, when Bengio and LeCun entered graduate school, neural nets were not seen as promising. Many people thought that building a network with random connections across multiple layers, giving it some data, and letting it figure out how to reach the right answer was just asking too much. “People were very suspicious of the idea you could just learn from the data,” says Hinton, a professor emeritus at the University of Toronto and now an engineering fellow at Google.

LeCun read Hinton’s work including, he says, a paper written in coded language to get around the taboo about neural nets. “I learned about Geoff’s existence, and realized this was the man I needed to meet,” he says. LeCun did a postdoctoral fellowship in Hinton’s lab, then moved to Bell Labs. He’s now a professor at New York University (NYU) and director of AI research at Facebook.

Bengio also wound up at Bell Labs in the early 1990s, where he and LeCun worked together. “What really appealed to me was the notion that by studying neural nets, I was studying something that would be fairly general about intelligence, that would explain our intelligence and allow us to build intelligent machines,” Bengio recalls. Today, he is a professor at the University of Montreal, scientific director of Mila (the Montreal Institute for Learning Algorithms), and an advisor to Microsoft.

Their work gained wide mainstream


acceptance in 2012, after Hinton and two students used deep neural nets to win the ImageNet challenge, identifying objects in a set of photos at a rate far better than that of any of their competitors. Since then, the field has embraced the technology, which has also seen breakthroughs in speech recognition and natural language processing, and could help make self-driving vehicles more reliable.

LeCun says theories about why neural nets would not work—that the training algorithms would get stuck in the extreme values of mathematical functions known as local minima—fell to real-world experience. “In the end, what people were convinced by were not theorems; they were experimental results,” he says. Even though there were local minima, those bad enough for an optimization algorithm to get stuck were relatively rare. It turned out that if the neural nets were just big enough for the problem they were trying to solve, they could get stuck, but if they were larger, they became more efficient at optimization. “You make those networks bigger and bigger and they work better and better,” LeCun says.

Working both together and independently, the three made important contributions to neural networks. Among their several discoveries, Hinton helped to develop backpropagation, an algorithm that calculates error at the output of the network and propagates the results backward toward the input, allowing the machine to improve its accuracy. LeCun developed convolutional neural networks, which replicate feature detectors across space and are more efficient for image and speech recognition.

Another development that helps the system learn more effectively involves randomly turning off some of the neurons about half of the time, introducing some noise into the network. Bengio says there is noise and randomness in the way living neurons spike, and something about that makes the system better at dealing with variations in input patterns, which is key to making the system useful. “You want to be good at doing the things you haven’t yet seen, things that might be somewhat different from the training data,” Hinton says.

Bengio came up with word embeddings, patterns of neuron activation that represent word symbols, thereby expanding exponentially the system’s ability to express meanings and making it possible to process text and translate it from one language to another. Hinton explains that the embeddings make it easier for the system to reason by analogy, rather than by following a logical set of rules; he believes that is more like how the human brain works. The brain evolved to use patterns of neural activity to perform perception and movement, and that makes it more suited to reasoning by analogy rather than logic, he argues.

In fact, artificial intelligence remains limited compared to human intelligence. “Machines are still very, very stupid,” LeCun says. “The smartest AI systems today have less common sense than a house cat.” Though they excel at recognizing patterns, neural networks have no knowledge of how the world works, and computer scientists have not yet figured out how to give it to them. Humans learn to generalize from a very small number of samples, while neural networks require vast sets of training data. In fact, Hinton says, it was the growth in available datasets, along with faster processors, that led to the “phase shift” from neural networks being a curiosity to a practical approach.

There are hundreds of useful tasks neural networks can accomplish just by using their current pattern recognition capabilities, Hinton says, from predicting earthquake aftershocks to offering better medical diagnoses on the basis of hundreds of thousands of examples. But to give machines a more general intelligence that could solve different types of problems or accomplish multiple tasks will require scientists to come up with new concepts about how learning works, Bengio says. “It might take a very long time before we reach human-level AI,” he says.

Meanwhile, society has to have more discussion about how to use artificial intelligence appropriately. Hinton worries about how autonomous intelligent weapons systems might be misused, for instance. LeCun says that without adequate political and legal protections, governments could use the systems to track people and try to control their behavior, or corporations might rely on AI to make decisions but ignore bias in their algorithms.

To address some of these worries, Bengio took part in a group that last December issued the Montreal Declaration for a Responsible Development of Artificial Intelligence, which outlines principles that they say should be used in pushing the technology forward. “We’re building stronger and stronger technology based on the premises of science, but the organization of society and their collective wisdom isn’t keeping up fast enough. The solution may not be in some new theorem or some new algorithm,” he says.

With such concerns in mind, Hinton says he will donate a portion of his share of the $1-million Turing Award prize money to the humanities at the University of Toronto. “If we have science without the humanities to help guide the political process, then we’re all in trouble,” he says. LeCun says he will likely make a donation to NYU, and Bengio says he’s considering some environmental causes.

Based on their experiences as academic heretics who turned out to be right, they advise young computer scientists to stick to their convictions. “If someone tells you your intuitions are wrong, there are two possibilities,” Hinton says. “One is you have bad intuitions, in which case it doesn’t matter what you do, and the other is you have good intuitions, in which case you should follow them.”

Neil Savage is a science and technology writer based in Lowell, MA, USA.

© 2019 ACM 0001-0782/19/6 $15.00
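The technique of randomly turning off neurons about half of the time, described in the article above, is commonly known as dropout. The following is a minimal sketch of the idea, not the researchers' own code. The function name `dropout`, the `drop_prob` parameter, and the rescaling of surviving activations by 1 / (1 - drop_prob) (often called inverted dropout) are assumptions made for illustration; the article itself mentions only the random switching-off.

```python
import random


def dropout(activations, drop_prob=0.5, rng=None):
    """Randomly zero each activation with probability drop_prob.

    Surviving activations are divided by (1 - drop_prob) so that the
    expected value of each output matches its input. This rescaling is
    an assumption of this sketch, not something stated in the article.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


if __name__ == "__main__":
    layer_output = [0.2, 1.5, 0.7, 0.9]
    # Each call drops a different random subset, which is the source
    # of the noise the article refers to.
    print(dropout(layer_output, drop_prob=0.5))
    print(dropout(layer_output, drop_prob=0.5))
```

In practice the random masking is applied only during training; at prediction time all neurons are left on.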


Science | DOI:10.1145/3323685 Gary Anthes

Lifelong Learning
in Artificial Neural Networks
New methods enable systems to rapidly, continuously adapt.

Over the past decade, artificial intelligence (AI) based on machine learning has reached breakthrough levels of performance, often approaching and sometimes exceeding the abilities of human experts. Examples include image recognition, language translation, and performance in the game of Go.

[Figure: Summary of General L2M Framework. The DARPA Lifelong Learning Machines (L2M) Program seeks to develop learning systems that continuously improve with additional experience, and rapidly adapt to new conditions and dynamic environments. Infographic courtesy of DARPA’s Electronics Resurgence Initiative.]

These applications employ large artificial neural networks, in which nodes are linked by millions of weighted interconnections. They mimic the structure and workings of living brains, except in one key respect—they don’t learn over time, as animals do. Once designed, programmed, and trained by developers, they do not adapt to new data or new tasks without being retrained, often a very time-consuming task.

Real-time adaptability by AI systems has become a hot topic in research. For example, computer scientists at Uber Technologies last year published a paper that describes a method for introducing “plasticity” in neural networks. In several test applications, including image recognition and maze exploration, the researchers showed that previously trained neural networks could adapt to new situations quickly and efficiently without undergoing additional training.

“The usual method with neural networks is to train them slowly, with many examples; in the millions or hundreds of millions,” says Thomas Miconi, the lead author of the Uber paper and a computational neuroscientist at Uber. “But that’s not the way we work. We learn fast, often from a single exposure, to a new situation or stimulus. With synaptic plasticity, the connections in our brains change automatically, allowing us to form memories very quickly.”

For more than 60 years, neural networks have been built from interconnected nodes whose pair-wise strength of connection is determined by weights, generally fixed by training with labeled examples. This training is most often done via a method called backpropagation, in which the system calculates an error at the synaptic output and distributes it backward throughout the network’s layers. Most deep learning systems today, including Miconi’s test systems, use backpropagation via gradient descent, an optimization technique.

Using that as a starting point, Miconi employs an idea called Hebbian learning, introduced in 1949 by neuro-psychologist Donald Hebb, who observed that two neurons that fire repeatedly across a synapse strengthen their connection over time. It is often summarized as, “Neurons that fire together, wire together.”

With this “Hebbian plasticity,”


networks employ a kind of “meta-learning”—in essence, they learn how to learn—based on three conceptually simple parameters. Pairs of neurons have the traditional fixed weights established during the training of the system. They also have a plastic weight called a Hebbian trace, which varies during a lifetime according to the actual data it encounters. These Hebbian traces can be computed in different ways, but in a simple example it is the running average of the product of pre- and post-synaptic activity.

The Hebbian traces are themselves weighted by a third fixed parameter, called the plasticity coefficient. Thus, at any moment, the total effective weight of the connection between two neurons is the sum of the fixed weight plus the Hebbian trace multiplied by the plasticity coefficient. Depending on the values of these three parameters, the strength of each connection can be completely fixed, completely variable, or anything in between.

“This is important work,” says Ziv Bar-Joseph, a computational biologist at Carnegie Mellon University who was not involved in the work at Uber. “They have taken a principle from biology that was well known and shown it can have a positive impact on an artificial neural network.” However, it is too early to say whether the method will represent an important advancement in large, mainstream applications of AI, he says.

With most large AI systems today, Bar-Joseph says, “You optimize, and optimize, and optimize, and that’s it. If you get new data, you can retrain it, but you are not trying to adapt to new things.” For example, he says, a neural net might have been trained to give

DARPA Projects in Lifelong Learning Machines

Columbia University is learning how to build and train self-aware neural networks, systems that can adapt and improve by using internal simulations and knowledge of their own structures.

The University of California, Irvine, is studying the dual memory architecture of the hippocampus and cortex to replay relevant memories in the background, allowing the systems to become more adaptable and predictive while retaining previous learning.

Tufts University is examining an intercellular regeneration mechanism observed in lower animals such as salamanders to create flexible robots capable of adapting to changes in their environment by altering their structures and functions on the fly.

SRI International is developing methods to use environmental signals and their relevant context to represent goals in a fluid way rather than as discrete tasks, enabling AI agents to adapt their behavior on the go. —Gary Anthes
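The effective-weight rule described in the article (fixed weight plus the Hebbian trace multiplied by the plasticity coefficient) can be illustrated with a small sketch for a single connection. This is not the Uber team's implementation. The function names, the `rate` parameter, and the choice of an exponential moving average as the "running average" are assumptions made for illustration only.

```python
def update_hebbian_trace(trace, pre, post, rate=0.1):
    """Running average of the product of pre- and post-synaptic activity.

    `rate` is a hypothetical smoothing factor: an exponential moving
    average is one simple way to realize a running average.
    """
    return (1.0 - rate) * trace + rate * pre * post


def effective_weight(fixed_weight, plasticity_coeff, trace):
    """Fixed weight plus the Hebbian trace scaled by the plasticity coefficient."""
    return fixed_weight + plasticity_coeff * trace


def simulate(fixed_weight, plasticity_coeff, activity_pairs, rate=0.1):
    """Return the effective weight after each (pre, post) activity pair."""
    trace = 0.0
    weights = []
    for pre, post in activity_pairs:
        trace = update_hebbian_trace(trace, pre, post, rate)
        weights.append(effective_weight(fixed_weight, plasticity_coeff, trace))
    return weights


if __name__ == "__main__":
    firing_together = [(1.0, 1.0)] * 5
    # Plasticity coefficient 0: the connection stays completely fixed.
    print(simulate(0.5, 0.0, firing_together))
    # Plasticity coefficient 1: the connection strengthens as the two
    # neurons keep firing together.
    print(simulate(0.5, 1.0, firing_together))
```

Setting the fixed weight to zero and the plasticity coefficient to a nonzero value gives the other extreme mentioned in the text, a connection that is driven entirely by recent activity.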

ACM News

The Trouble with SMS Two-Factor Authentication

Many use SMS two-factor authentication (2FA) on their smartphones to secure their online accounts, but not everyone understands its potential vulnerabilities.

You’ve probably seen SMS 2FA in action. An online account, upon login, prompts you to receive a second code on your phone via text message. You receive the second code, then enter it to confirm that you are the legitimate user of the account, and not a hacker.

Yet SMS 2FA can be hacked, too.

In late 2018, Amnesty International reported hackers had hijacked 2FA codes and compromised online accounts; malicious actors had recreated the websites of legitimate services to convince users to reveal their 2FA authentication codes.

SIM swapping is used by hackers to gain access to sensitive accounts “protected” by SMS 2FA, which has resulted in hundreds of millions of dollars in cryptocurrency theft. SIM swapping is when a hacker goes into a phone store pretending to be you, and convinces a staff member to port your SIM card information to a phone they own. The hackers then either convince the original owner to fork over login details, using the swapped SIM to intercept the SMS 2FA code sent after logging in, or they attempt to reset account passwords, using the swapped SIM to intercept the code sent to confirm they are the legitimate account owners.

In July 2018, a suspect was arrested for SIM swapping for the first time, according to crypto/blockchain media outlet CoinTelegraph. The perpetrator allegedly stole $5 million in cryptocurrency using the technique.

SMS 2FA has vulnerabilities, but these are not necessarily flaws in how it is designed, says Kaspersky Lab security researcher Vladimir Dashchenko. “In general, 2FA itself is a secure concept. Yet, the ways it is implemented may differ and could have vulnerabilities,” he says.

“Codes sent over the Internet almost always have at least some risk of being stolen,” says Mark Risher, a Google director of product management for counter-abuse and identity services. “Any form of 2FA improves user security over a password alone; however, not all 2FA provides equal protection. Sophisticated attacks can work around some methods of 2FA.”

Risher cites SMS-based phishing attacks as one such method. “Despite this, adding a phone number for two-step verification is still recommended if you can’t use any other options,” he notes.

The good news is there are other options.

One is Google’s own Titan Security Key, a physical key developed using the open source security standard FIDO. When you log into Google services, the SMS 2FA code is sent to the security key instead of your phone; the physical security key then is inserted into your phone to complete the verification process. Risher says the firmware in the security keys has been “sealed permanently into a secure element hardware chip at production time and is designed to resist physical attacks aimed at extracting firmware and secret key material.”

Another potential solution is Kaspersky’s fraud prevention platform, which leverages machine learning and “continuous analysis of hundreds of parameters in real time” to assess if a user is legitimate. Says Dashchenko, “During the whole session, [the system] is analyzing the behavioral and biometric data, device reputation, and other nonpersonalized information to detect any signs of abnormal or suspicious behavior.”

That is certainly an improvement over relying on SMS 2FA alone.

—Logan Kugler is a freelance technology writer based in Tampa, FL, USA. He has written for over 60 major publications.


highly accurate results when classifying different kinds of automobiles, but when a new kind of car (a Tesla, say) is seen, the system stumbles. “You want it to recognize this new car very quickly, without retraining, which can take days or weeks. Also, how do you know that something new has happened?”

Artificial intelligence systems that learn on the fly are not new. In “neuroevolution,” networks update themselves by algorithms that employ a trial-and-error method to achieve a precisely defined objective, such as winning a game of chess. They require no labeled training examples, only definitions of success. “They go only by trial and error,” says Uber’s Miconi. “It’s a powerful, but a very slow, essentially random, process. It would be much better if, when you see a new thing, you get an error signal that tells you in which direction to alter your weights. That’s what backpropagation gets you.”

Military Apps

Miconi’s ideas represent just one of a number of new approaches to self-learning in AI. The U.S. Department of Defense is pursuing the idea of synaptic plasticity as part of a broad family of experimental approaches aimed at making defense systems more accurate, responsive, and safe. The U.S. Defense Advanced Research Projects Agency (DARPA) has established a Lifelong Learning Machines (L2M) program with two major thrusts, one focused on the development of complete systems and their components, and the second on exploring learning mechanisms in biological organisms and translating them into computational processes. The goals are to enable AI systems to “learn and improve during tasks, apply previous skills and knowledge to new situations, incorporate innate system limits, and enhance safety in automated assignments,” DARPA says at its website. “We are not looking for incremental improvements, but rather paradigm-changing approaches to machine learning.”

Uber’s work with Hebbian plasticity is a promising step toward lifelong learning in neural networks, says Hava Siegelmann, founder and manager of DARPA’s L2M program and a computer science professor at the University of Massachusetts, Amherst. “We will never be safe in a self-driving car without it,” she says. But it is just one of many necessary steps toward that goal. “It’s definitely not the end of the story,” she says.

There are five “pillars” of lifelong learning as DARPA broadly defines it, and synaptic plasticity falls into the first of these. The pillars are: continuous updating of memory, without catastrophic forgetting; recombinant memory, rearranging and recombining previously learned information toward future behavior; context awareness and context-based modulation of system behavior; adoption of new behaviors through internal play, self-awareness, and self-simulations; and safety and security, recognizing whether something is dangerous and changing behavior accordingly, and ensuring security through a combination of strong constraints.

Siegelmann cites smart prostheses as an example of an application of these techniques. She says the control software in an artificial leg could be trained via conventional backpropagation by its maker, then trained to the unique habits and characteristics of its user, and finally enabled to very quickly adapt to a situation it has not seen before, such as an icy sidewalk.

A computational neuroscientist, Siegelmann says lifelong learning has been a goal of AI researchers for many years, but major advancements have only recently become feasible, enabled by advancements in computer power, new theoretical foundations and algorithms, and a better understanding of biology. “In a few years, much of what we call AI today won’t be considered AI without lifelong learning,” she predicts.

Miconi’s team is now working on making learning more dynamic and sophisticated than it is in his test systems so far. One way to do that is to make the plasticity coefficients, now fixed as a design choice, themselves variable over the life of a system. “The plasticity of each connection can be determined at every point by the network itself,” he says. Such “neuromodulation” likely occurs in animal brains, he says, and that may be a key step toward the most flexible decision-making by AI systems.

Further Reading

Chang, O. and Lipson, H.
Neural Network Quine, Data Science Institute, Columbia University, New York, NY 10027, May 2018
https://arxiv.org/abs/1803.05859v3

Chen, Z. and Liu, B.
Lifelong Machine Learning, Second Edition, Synthesis Lectures on Artificial Intelligence and Machine Learning, August 2018
https://www.morganclaypool.com/doi/10.2200/S00832ED1V01Y201802AIM037

Hebb, D.
The Organization of Behavior: A Neuropsychological Theory, New York: Wiley & Sons, 1949
http://s-f-walker.org.uk/pubsebooks/pdfs/The_Organization_of_Behavior-Donald_O._Hebb.pdf

Miconi, T., Clune, J., and Stanley, K.
Differentiable Plasticity: Training Plastic Neural Networks with Backpropagation, Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden, PMLR 80, 2018
https://arxiv.org/abs/1804.02464

Miconi, T.
Backpropagation of Hebbian Plasticity for Continual Learning, NIPS Workshop on Continual Learning, 2016
https://github.com/ThomasMiconi/LearningToLearnBOHP/blob/master/paper/abstract.pdf

Gary Anthes is a technology writer and editor based in Arlington, VA, USA.

© 2019 ACM 0001-0782/19/6 $15.00


Technology | DOI:10.1145/3323703 Don Monroe

And Then,
There Were Three
How long can the silicon foundry sector continue to adapt,
as physical limits make further shrinkage virtually impossible?

Relentless year-over-year improvements in integrated circuits don’t come cheap. For years, these advances have been boosted in part by silicon foundries that invest in new technology by aggregating demand from design companies that don’t have factories of their own. As of last summer, however, only one such “pure-play” foundry continues to pursue the latest silicon generation, along with two companies that also make their own chips. The dwindling of suppliers revives the longstanding question of how the industry can adapt as physical limits eventually make further shrinkage impossible (or impossibly expensive).

[Image by Macro Photo]

Still, the story sounds familiar. “Every time people say Moore’s Law has finally hit the wall, people come up with new, innovative approaches to get around it,” said Willy Shih, Robert and Jane Cizik Professor of Management Practice at Harvard Business School.

The silicon industry has tracked the 1965 observation by Gordon Moore, co-founder and later head of Intel, that transistor counts were doubling every year (later changed to every two years). This exponential growth became enshrined as a “law,” which became a collective self-fulfilling prophecy as companies feared losing business if they fell behind its aggressive schedule. Successive generations were labelled by an ever-shrinking distance, currently 7nm, although this designation long ago lost any clear relationship to the transistor’s gate length or other features. In the 1990s, Moore’s Law became formalized in the National (after 1998, International) Technology Roadmap for Semiconductors, which spelled out what manufacturers, equipment suppliers, and academic researchers would need to do to keep the industry on track.

Unfortunately, exponentially increasing transistor counts were accompanied by corresponding increases in the costs to build fabrication plants and develop more aggressive processes and novel device structures. These costs, and the need to keep the expensive equipment in constant use, have long made it almost impossible for a smaller company to manufacture a novel chip design itself. “The capital investment to supply a growing market and to push leading-edge research can only be supported by a company that has a large revenue,” probably $30 billion a year or more, said Paolo Gargini. “It’s just a game for the big boys,” said Gargini, formerly at Intel, who has headed the formal roadmap through its recent rebirth as the International Roadmap for Devices and Systems (IRDS).

Nonetheless, when GlobalFoundries announced in August 2018 that it was halting development of its 7nm process, “it was quite a shocker for a lot of people,” Shih said. The foundry company had originally projected risk production—early manufacture with relaxed quality guarantees—of 7nm products in spring 2018, and until recently seemed committed. Now, the only remaining pure-play foundry developing leading-edge technology is Taiwan Semiconductor Manufacturing Company (TSMC), whose 7nm process has been in production since early 2018. Besides TSMC, Samsung, which has an important foundry business in addition to manufacturing its own chips, announced in fall 2018 that it was ready for risk production of 7nm. Intel, whose current 10nm process is often regarded as similar to TSMC’s 7nm process, devotes most of its attention to its own chips.

16 COMMUNICATIONS OF THE ACM | JUNE 2019 | VOL. 62 | NO. 6


news

ACM Member News

SEEKING NEW WAYS TO BUILD AND MAINTAIN OPEN DISTRIBUTED SYSTEMS

Gul Agha is a professor in the Department of Computer Science, and director of the Open Systems Laboratory, at the University of Illinois at Urbana-Champaign.

Agha received his undergraduate degree from the California Institute of Technology. He earned a master’s degree in psychology and a Ph.D. in computer and communication science from the University of Michigan at Ann Arbor, but did his dissertation research at the Massachusetts Institute of Technology.

In 1989, he joined the faculty at the University of Illinois, where he has remained ever since. “One of the joys of being an academic,” Agha says, “has been my ability for life-long learning and to acquire new knowledge and perspectives.”

Agha’s research interest is in understanding the nature of concurrent computation, leading to new ways to build and maintain open distributed systems.

“My research has spanned diverse areas such as programming languages, software engineering, cyber-physical systems, and formal methods,” Agha says. “I want to develop unifying programming abstractions for new generation of applications, such as IoT (the Internet of Things) for Smart Cities.” These applications require concurrency and coordination, notions of approximation and stochastic behavior, and integration with continuous spatiotemporal variables.

A real-world project on which Agha collaborated was the implementation of the world’s largest sensor network to continuously monitor the structural health of South Korea’s 484-meter Jindo Bridge, connecting the Korean mainland to Jindo Island. The sensor network promises a robust, significantly lower-cost alternative to traditional structural inspection techniques, Agha says.

By John Delaney

Foundries at the Forefront

In 1987, TSMC pioneered the concept of a pure-play foundry. Before that, “If you had a new idea, you really didn’t have a place where you could test it” without paying for a dedicated factory, Gargini said. The advent of foundry capacity was “the best thing that could have happened for the industry,” he said. “The iPhone would never have existed if we didn’t have this model.”

At first, TSMC replicated older, less-profitable technologies and grew by “taking the rejects from the leading semiconductor companies,” Gargini said. However, “by 2000 or so they were within shooting range of the leading companies.”

Later foundries have mostly confined themselves to following the leaders, but GlobalFoundries seemed to have higher aspirations. The company was created in 2009 from the manufacturing operations of Intel’s arch-competitor Advanced Micro Devices (AMD). The company also acquired Singapore-based foundry Chartered Semiconductor, and in 2015 added the manufacturing operations of IBM.

Leading-edge semiconductor manufacturing is expensive and challenging, which is one reason AMD and IBM divested that part of their businesses. Into the 1990s, keeping up with Moore’s Law could mostly be achieved by “scaling,” following rules laid out by IBM’s Robert Dennard in 1974 to make better transistors by shrinking lateral dimensions, shrinking layer thicknesses, and increasing doping densities. Packing more transistors on the surface area of a wafer also offered benefits such as reduced cost per transistor, higher speed, and lower power dissipation.

Continued exponential shrinkage brought transistors into collision with fundamental physical limits, though, such as gate oxides just a few atom-layers thick, as well as large leakage currents and other non-idealities in the tiny devices. To sidestep these limits, in the early 2000s manufacturers introduced multiple revolutionary innovations, such as high-dielectric-constant (high-k) gate dielectrics, metal gates, strained silicon, and the nonplanar transistors known as FinFETs.

More innovation will be needed, including in process technology. Especially challenging has been the lithography that prints the circuits, using progressively shorter ultraviolet wavelengths to create tinier features. This shrinkage stalled for years at a wavelength of 193nm because the next huge jump, to extreme ultraviolet (EUV) at 13.5nm, requires different sources, optics, and exposure techniques. Instead, designers have exploited liquid immersion, multiple exposures, and other tricks to extend 193nm lithography. With the 7nm generation, EUV is finally being used for some processing levels, but economically viable throughput and yield won’t come easily.

Shakeout

These challenges are not new, but the withdrawal of companies from the leading edge raises a “very valid question,” Shih said. “If there’s less competition, are we going to push the frontier less?” So far, there are still multiple suppliers.

“As long as you have two, it’s sufficient; if you have three it’s great,” Gargini said. “Samsung can do a lot of the stuff that TSMC can do,” and TSMC’s lead already meant that “there’s nothing that is so special that GlobalFoundries was doing,” Gargini said. AMD, for example, already made many of its most advanced central processing units (CPUs) and graphics processing units (GPUs) at TSMC.

Still, Shih notes that the consolidation is “troubling” for U.S. semiconductor manufacturing, because “a vast amount of the world’s advanced foundry capacity is in TSMC’s hands in three fabs in Taiwan.” He added that “People who worry about the defense-industrial base are very concerned about this issue.”


To be sure, GlobalFoundries and others (including TSMC) can still build very powerful products using older technologies. Moreover, Shih notes, “Some people say that, once we went below 14nm, or perhaps even higher like 22nm, the unit cost per transistor stopped decreasing and started increasing again.” As a result, “more and more users say ‘That [leading-edge] process is so expensive, I actually don’t need it,’” he said, unless they are “making things for cellphones or FPGAs or the bleeding-edge stuff like Intel microprocessors, where you really need the ultimate in performance and power.”

Indeed, a manufacturer that specializes in digital logic may not need a broad range of processes. In contrast, foundries support a whole range of devices, such as image sensors, and devices for analog, radio-frequency, and ultra-low-power circuits. Reliably implementing such mix-and-match processes in a design environment that lets multiple customers use them is often more important to designers than having the latest-generation technology. For example, although TSMC boasts dozens of high-end customers for its 7nm process, it continues to support older-generation processes, even the 180nm technology it introduced 20 years ago, which is good enough for many customers.

If leading-edge development slows down, though, it might give other companies, including those in mainland China, more chance to compete. “The Chinese are having trouble at the leading edge, but they’re catching up on some of the trailing-edge technologies,” Shih said. “The thing that is driving TSMC is less competition from GlobalFoundries; it’s competition from Made in China 2025 [a Chinese program to improve domestic manufacturing competitiveness].”

A Bright Future?

In the end, though, no amount of innovation can extend exponential scaling forever. Logic designers “are waiting for EUV to save the game,” Gargini said, but even if advanced lithography buys a few years, “that solution comes to an end.” In perhaps 2020 or 2021, he conjectured, “Samsung, TSMC, or Intel, one of them will make a big announcement that their next product is 3D [three-dimensional],” which would offer more transistors through vertical stacking. Memory manufacturers (including Samsung) have already begun to introduce 3D structures, both by stacking processed layers and growing multiple layers of devices (see “Electronics are Leaving the Plane,” Communications, August 2018). Memory has special advantages for 3D structures, such as uniform and redundant layouts, and low power (because most transistors are idle).

In contrast, in logic applications, many more transistors are active, and removing the heat they produce is enormously challenging even in the easier-to-cool planar layout. So far, logic companies are testing the 3D waters with advanced packaging techniques for GPUs and other high-performance products. “We still can squeeze another two or three generations out of 2D,” Gargini said, but he sees full 3D as inevitable and adding another 15 years of performance growth. “3D is not really as much of a revolution” or as risky as the process innovations the industry has already implemented, he said. “The big guys can do it anytime they decide to do it.”

The semiconductor industry faces challenges that we may look back on as the end of Moore’s Law. Nonetheless, there are continued opportunities for better products, and so far there are still foundry companies ready and able to enable new designs. “There is a bright future,” Gargini insists. “I think it’s a very good balance.”

Further Reading

International Roadmap for Devices and Systems 2017 Edition, IEEE, https://irds.ieee.org/roadmap-2017

Shih, W. C., Chien, C.F., Shih, C., and Chang, J.
The TSMC Way: Meeting Customer Needs at Taiwan Semiconductor Manufacturing Co., Harvard Business School Case Collection 610-003, August 2009, https://www.hbs.edu/faculty/Pages/item.aspx?num=37868

Monroe, D.
Electronics are Leaving the Plane, Communications, August 2018, https://cacm.acm.org/magazines/2018/8/229776-electronics-are-leaving-the-plane/fulltext

Don Monroe is a science and technology writer based in Boston, MA, USA.

© 2019 ACM 0001-0782/19/6 $15.00

Milestones

ACM, CSTA Announce Cutler-Bell Prize Winners


ACM and the Computer Science Teachers Association (CSTA) will bestow the 2018–2019 Cutler-Bell Prize promoting computer science and empowering students to pursue computing challenges beyond the classroom upon four graduating high school students. Each will receive a $10,000 cash prize toward tuition at the institution they will attend next year.

The winning projects illustrate the diverse applications being developed by the next generation of computer scientists:

NAVEEN DURVASULA, SILVER SPRING, MD
Durvasula developed a method to predict, for a given patient-donor pair, the expected quality and waiting time of the transplant they would receive through kidney exchange.

ISHA PURI, CHAPPAQUA, NY
Puri focused on development of a system to detect the direction and frequency of gaze fixation to test for and diagnose dyslexia.

ESHIKA SAXENA, BELLEVUE, WA
Saxena developed the “HemaCam,” a clip-on attachment that turns a smartphone camera into a microscope capable of capturing blood cell images for disease screening.

VARUN SHENOY, CUPERTINO, CA
Shenoy created an effective method to diagnose the onset of wound complications during surgical operations.

Said ACM president Cherri M. Pancake, “These are the kinds of skills students will increasingly need in our digital age. In short, the Cutler-Bell Prize encourages students to see the possibilities, as well as the excitement, that computing offers.”

Added CSTA executive director Jake Baskin, “Our winners have created projects that have applicable real-world solutions, all resulting from the high-quality computer science education they have received.”


Society | DOI:10.1145/3323702 Keith Kirkpatrick

Ethics in Technology Jobs
Employees are increasingly challenging technology companies on their ethical choices.

Organized protests against companies are hardly a new phenomenon, as people have boycotted or protested both corporate policies and actions for years. For example, a global protest of international agrochemical and agricultural biotechnology corporation Monsanto in 2013 saw coordinated marches across 52 countries and 436 cities. In 2010, thousands of people in the U.S. protested against oil giant BP for its role in the Deepwater Horizon oil spill. And in the late 1990s, U.S. gun owners protested against gun manufacturers Colt Manufacturing Company and Smith & Wesson for their perceived cooperation with then-President Bill Clinton’s gun control efforts.

Yet many of the corporate protests that have occurred against technology companies over the past year were marked by a distinct difference: they were often organized by, led by, or coordinated with workers at the very companies being protested. The impetus for these walkouts appears to be largely two issues: the presence of a culture of inequality at technology companies, and the use of technology for what workers consider to be unethical or harmful activities.

Although there is precedent for tech workers protesting against their employers, such as when defense workers in the 1980s pushed back against their employers’ participation in the development of the Strategic Defense Initiative, colloquially known as Star Wars, the difference is that tech workers feel more empowered to speak out today.

“[Workers] actually see that their words and action can have a real impact on a broader scale,” says Mehran Sahami, a professor of computer science at Stanford University. Sahami points to the success former Uber employee Susan Fowler had with blog posts she wrote that detailed a culture of sexual harassment at the ride-sharing giant, which ultimately led to changes at the company and the dismissal of its former CEO, Travis Kalanick. “Fowler’s actions showed that even individual tech workers, by speaking up, can actually have a large effect on the organization that they’re in or were formerly in,” Sahami says.

It is not just a culture of misogyny that is irritating workers and spurring them into action; a lack of transparency is also a key catalyst for workers to band together to make their feelings known. One example was Google’s handling of a $90-million exit payment to Andy Rubin, a key executive of the company and the creator of the Android mobile operating system. Upon Rubin’s departure from the company in 2014, Google failed to disclose it had received a complaint that Rubin had committed an act of sexual misconduct against another employee, and that an investigation had confirmed its veracity. In October 2018, a report in The New York Times made these details public.

Upon that disclosure, Google CEO Sundar Pichai sent a memo to staff noting that the company had taken “an increasingly hard line” on inappropriate conduct at work and had fired 48 people, including 13 senior managers, in the previous two years, without giving any of them exit packages. Just prior to a November 1 protest by employees known as “The Walkout for Real Change,” Pichai sent out a follow-up note apologizing “for the past actions and the pain they have caused employees” and indicating that employees would be supported if they protested.

Despite the apology, thousands of Google employees around the world walked out on November 1, and organizers issued a statement demanding more transparency from Google around its handling of sexual harassment, an end to pay and opportunity inequality, and more employee empowerment overall. In addition, the group requested that an employee representative be appointed to the company’s board and that Google end “forced arbitration” in cases of harassment and discrimination, a practice that prevents employees from taking cases to court.

“Silicon Valley companies lead the way in the fields of science and technology, but when it comes to issues of privacy, creating inclusive workplaces, and ethics, they seem to be devolving,” says Congresswoman Jackie Speier, who represents San Francisco and parts of Silicon Valley, and publicly supported the walkouts.

Lack of diversity is a problem in the tech industry. For example, nearly 70% of Google employees are men and 53% are non-Hispanic whites, according to the Google Diversity Annual Report 2018. Among leadership roles, the numbers within Google are even less diverse, as 67% are white non-Hispanic and 75% are men.


“On the issue of diversity, I continue to hear from women and other workers in the tech industry who are harassed, bullied, assaulted, and ignored because they weren’t frat buddies with the CEO or turned down sexual overtures,” Speier says. “It’s a cultural crisis, and as I’ve made clear to the tech companies in and around my district, the industry will never reach its full potential until this crisis is addressed.”

Google is hardly the only company being subjected to protests from its own employees; others also have protested how technology being developed by the companies they work for is being used by government entities. Representatives from Amazon, Salesforce, and Microsoft signed petitions and held demonstrations objecting to how their work is being used for surveillance, or to separate families at the U.S. border. According to Leigh Hafrey, a Senior Lecturer at the Massachusetts Institute of Technology Sloan School of Management and author of the book The Story of Success: Five Steps to Mastering Ethics in Business, these protest actions are occurring because workers are more aware of questions of social justice and what constitutes appropriate and inappropriate behavior.

“We’ve had a lot of social movement over the past several decades that raised awareness and made people conscious of what can potentially happen within organizations,” Hafrey says.

Indeed, thousands of workers at Amazon, Google, Microsoft, and Salesforce have signed petitions asking their respective management teams to cancel or withdraw from contracts with U.S. government agencies, including Immigration and Customs Enforcement, Customs and Border Protection, and the Department of Defense. The public nature of these protests and petitions may be having an effect; in June 2018, Google employees succeeded in getting the company to agree not to renew its deal to help the Pentagon build artificial intelligence tools for drone warfare.

Other protests have been less than successful. Salesforce.com employees gathered twice in 2018 in front of the company’s headquarters in San Francisco to protest the firm’s multimillion-dollar contract with the U.S. Customs and Border Protection agency. While CEO Marc Benioff condemned the agency’s separation of families at the border, he refused to cancel the contract, and the company still supplies software to the agency, despite continuing pressure from workers.

Ultimately, workers may be able to make their voices heard, but management at many large companies is likely to be more focused on how its decisions impact the company’s bottom line, and so may not always bow to the wishes of employees.

Ceren Cubukcu, an employment consultant and author of Make Your American Dream A Reality: How to Find a Job as an International Student in the U.S., says employees may simply decide to work for another company if they have a problem with a technology company’s actions, rather than protesting to get their employer to change course.

“In some projects, especially for IT/high tech projects, you don’t even know what the whole project will be at the end because you work in teams, and only the top management knows about the whole project,” Cubukcu says. “If you don’t feel comfortable in your job or don’t like your work, you can always try to switch to another job, and the company can always replace you with some other employee.”

That said, the bargaining position for many tech workers is perhaps stronger than it ever has been in history, given that programmers, software engineers, and data scientists who are talented, hardworking, and reliable are relatively hard to find and keep.

“Finding good technical people is difficult,” Sahami says, “so companies pay more attention to their workers because they realize that these are highly skilled people who are difficult to find. If those tech workers leave, it’s going to have a serious impact on the productivity of the company.”

Even young people who have yet to establish themselves in their careers are trying to flex their muscles, shunning companies they don’t agree with during the interview and hiring process. A Buzzfeed article published in August 2018 included several accounts of tech workers who declined lucrative positions at major technology companies because they disagreed with the company’s practices or ethical positions, relating to either the products or services the company builds, the customers to which the companies sell, or how the companies treat their own employees.

“Questions have always been raised about what companies do and why they do it,” Hafrey says. “We’re just seeing it in a way that I think maybe we were not previously considering because we were enamored of the bright future that our recent technologies promised us, and we are now realizing the downside or potential downsides of some of those technologies.”

Sahami adds that there may be a generational reason for the increasing level of activism in the technology field. “There’s lots of data that shows, for example, that many in the younger generation look for work that they believe that has value and that’s more important to them than just the paycheck; it’s believing that they’re having some sort of social impact,” Sahami says. “There’s been a lot of bad behavior, and not just in the tech industry, but more broadly around issues of sexual harassment that has been in some sense tolerated for a long time. And it shouldn’t have been tolerated, but over time, culture changes and people are willing to speak up more about that being unacceptable and so, generationally, we begin to call out more and more of these bad behaviors that’s been happening and try to rectify it.”

Further Reading

Fowler, S.
Reflecting on one very, very strange year at Uber, Feb. 19, 2017, https://www.susanjfowler.com/blog/2017/2/19/reflecting-on-one-very-strange-year-at-uber

Keller, M., and Larsen, K.
‘Enough is enough’: Google workers in San Francisco, Mountain View, Sunnyvale walk out in protest of treatment of women, November 1, 2018, ABC 7 News San Francisco, https://abc7news.com/business/enough-is-enough-bay-area-google-workers-walk-out-in-protest/4596806/

Brown, D.
“Google Diversity Annual Report 2018.” Diversity.Google. https://static.googleusercontent.com/media/diversity.google/en//static/pdf/Google_Diversity_annual_report_2018.pdf

Keith Kirkpatrick is principal of 4K Research & Consulting, LLC, based in Lynbrook, NY, USA.

© 2019 ACM 0001-0782/19/6 $15.00

viewpoints

DOI:10.1145/3325279 Linnet Taylor


• Michael L. Best, Column Editor

Global Computing
Global Data Justice
A new research challenge for computer science.

When the world’s largest biometric population database, India’s Aadhaar system, was challenged by activists, the country’s supreme court issued a historic judgment. It is not acceptable, the court said, to allow commercial firms to request details from population records gathered by government from citizens for purposes of providing representation and care. The court’s logic was important because this database had, for a long time, been becoming a point of contact between firms that wanted to conduct ID and credit checks, and government records of who was poor, who was vulnerable, and who was on which type of welfare program. The court also, however, said that this problem of public-private function creep was not sufficiently bad to outweigh the potential good a national population database could do for the poor. Many people, it said, were being cheated out of welfare entitlements because they had no official registration, and this was more unfair than the monetization of their official records.

This judgment epitomizes the problem of global data justice. The databases and analytics that allow previously invisible populations to be seen and represented by authorities, and which make poverty and disadvantage harder to ignore, are a powerful tool for the marginalized and vulnerable to claim their rights and entitlements, and to demand fair representation.[2] This is the claim the United Nations is making[5] in relation to new sources of data such as cellphone location records and social media content: if the right authorities can use them in the right way, they can shine a light on need and deprivation, and can help evaluate progress toward achieving the Sustainable Development Goals. If data technologies are used in a good cause, they confer unprecedented power to make the world a fairer place.

That ‘if’, though, deserves some attention. The new data sources’ value to the United Nations, to humanitarian actors, and to development and rights organizations is matched only by their market value. If it is possible to monitor who is poor and vulnerable, it is also possible to manipulate and surveil. Surveillance scholar David Lyon[3] has said that all surveillance operates along a spectrum between care and control: a database like Aadhaar can be used to channel welfare to the needy, but it could also be used to target consumers for marketing, voters for political campaigns, transgender people or HIV sufferers for exclusion; the list is endless. The possibilities for monetizing the data of millions of poor and vulnerable people are endless, and may be irresistible if hard boundaries are not set. But how to set boundaries for powerful international actors is a question yet to be solved in any field.

Photo caption: A woman has her eyes scanned while others wait during the Aadhaar registration process in India circa October 2018. Aadhaar identification numbers are issued to individuals by the Unique Identification Authority of India on behalf of the Government of India for the purpose of establishing the identity of every single person. Photo by David Talukdar/NurPhoto via Getty Images.

Data technologies have very different effects in different social, economic, and political environments. WhatsApp, for example, allows parents’ groups to message each other about carpooling. It also facilitates ethnic violence in India and Myanmar[a] and facilitates extremist politics[b] in Brazil. Technology almost always has unintended consequences, and given the global reach of apps and services, the consequences of our global data economy are becoming less and less predictable.[1]

[a] See https://bit.ly/2zWDIKO
[b] See https://nyti.ms/2EzEP5h

Global data justice researchers are aiming to frame new governance solutions that can help with this global level of unpredictability. In this emerging research field, we are exploring how the tools we have are globalizing: regulation, research ethics, professional standards and guidelines are all having to be translated into new environments, and get understood differently in different places. Nigeria, the U.S., and India, for example, will each have a different idea of what is ‘good’ or ‘necessary’ to do with data technologies, and how to regulate their development and use. Our research asks how to reconcile those different viewpoints, given that each of those international actors (plus myriad others) will have the power to develop and sell data technologies that will affect people all around the world.

Currently much of the international discussion revolves around harmonizing data protection amongst countries, and getting technology developers to agree on ethical principles and guidelines. Neither of these is a bad idea, but each can go in a radically different direction depending on local views on what is good and desirable. Strongly neoliberal, pro-market countries will develop different principles from more socialist ones, and even if they work from similar templates, will apply them differently. Democracies will set boundaries for data collection and use that are different from those of authoritarian states, yet we all have to work together on this problem. Like climate change, any unregulated data market affects us all.

So neither harmonized data protection nor ethical principles are the answer, or at least not on their own. Ethics, at the moment at least, is too frequently just a cover for self-regulation.[6] We need to ask global questions about global problems, but we are often stuck looking at our own environment and our own set of tools, without understanding what kind of toolkit can address the international-level consequences of our growing data economy.

If we instead ask this global question: How to draw on approaches that are working in different places,


and how to set boundaries and goals collectively for our global data economy?, we arrive at questions about both justice and intercultural understandings of it. We need not only to be able to articulate principles of justice and fairness, but to have a productive discussion about them with nations that see things very differently.

Research on global data justice4 is starting from this larger question of how to pick and articulate principles that people seem to agree on around the world; we will then work on how those should be turned into tools for governing data—and creating the institutions we need to do so, if they do not exist. Researchers working on this problem (who now include philosophers, social scientists, lawyers, computer scientists and informatics scholars, doing research in Europe, the U.S., Africa, and Asia) have to try to capture at least three conflicting ideas about what data technologies do and what their value is.

These conflicting ideas offer three main principles: first, that our visibility through data should work for us, not against us. We should be visible through our data when we need to be, in ways that are necessary for our well-being, but this should be part of a reasonable social contract where we are aware of our visibility and can withdraw it to avoid exploitation. Second, that we should have full autonomy with regard to our use of technology. We should be able to adopt technology that is beneficial for us, but using a smartphone or being connected should not be linked to our ability to exercise our citizenship. Someone who has to use social media to get a national identity document, or who has to provide biometrics through a private company in order to register for asylum, is not using data technologies so much as being used by them. Lastly, the duty of preventing data-related discrimination should be held by both individuals and governments. It is not enough to demand transparency so that people can protect themselves from the negative effects of profiling: people should be proactively protected from discrimination by authorities who have the power to control and regulate the use of data.

These principles form a starting point for understanding how similar challenges play out in different places. The task of research is to identify where common responses to those challenges are emerging, to draw out lessons for governance, and to suggest ways to operationalize them.

Translating this vision to the global level is a huge challenge. To do this, we have to place different visions of data’s value and risks in relation to each other, and seek common principles that can inform governance. Framing what global data justice might mean involves law, human rights, the anthropology of data use and sharing, the political economy of the data market and of data governance more broadly, and international relations.

This global problem is also becoming part of the agenda of computer science and engineering. The agenda of justice in relation to digitization is under formation, and needs input from all the fields doing conceptual and applied work in relation to the digital. It is not a task any individual field can address on its own, because work on data technology has evolved beyond the point where those who conceptualize and develop systems can understand what effects they will have on the global level. What is fair or innocuous in one place may be unfair or harmful in another.

Data justice should provide a lens through which we can address questions about how to integrate values into technology, but it is a higher-level question that cannot be answered with guidelines or with toolkits for privacy or explainability (despite the importance of these approaches). It is a conceptual question, though it leads to practical questions of governance: we wish to conceptualize how data should be governed to promote freedom and equality. This is not something academia can do on its own, but is a long-term challenge to be addressed in collaboration with policymakers, and in consultation with everyone affected by the data economy.

Computer scientists are already part of this process. When they conceptualize and build systems, they make choices that determine how data gets constructed and used. Understanding how computer scientific research connects to the human and to the social world, and how CS research contributes to particular outcomes, is the first step. Making connections between that understanding and social scientific research is a necessary first step. This process is taking place at some computer scientific conferences (notably ACM FAT*, which is now integrating social science and law tracks), but is also visible in smaller workshops and interdisciplinary programs where social scientists and computer scientists come together to work on the social implications of data science and AI, to publish together and to build a research agenda. This work will grow in scale and importance in the coming years, with the notion of global data justice as a benchmark for the inclusiveness and breadth of the debate.

References
1. Dencik, L., Hintz, A., and Cable, J. Towards data justice? The ambiguity of anti-surveillance resistance in political activism. Big Data & Society 3, 2 (Feb. 2016), 1–12; https://bit.ly/2VxoF0A
2. Heeks, R. and Renken, J. Data Justice For Development: What Would It Mean? (Development Informatics Working Paper Series No. 63). Manchester, U.K., 2016; https://bit.ly/2UKVIRr
3. Lyon, D. Surveillance Studies: An Overview. Polity Press, Cambridge, 2007.
4. Taylor, L. What Is Data Justice? The Case for Connecting Digital Rights and Freedoms on the Global Level. Big Data and Society, 2017; https://bit.ly/2uZjxXb
5. United Nations. A World that Counts: Mobilising the Data Revolution for Sustainable Development. New York, 2014; https://bit.ly/1it3l8P
6. Wagner, B. Ethics as an escape from regulation: From ethics-washing to ethics-shopping? In M. Hildebrandt, Ed. Being Profiled: Cogitas Ergo Sum. Amsterdam University Press, Amsterdam, 2018, 84–90.

Linnet Taylor (l.e.m.taylor@tilburguniversity.edu) is an associate professor at Tilburg Law School, Tilburg University, The Netherlands.

This work is funded by Horizon 2020 ERC Starting Grant #757247 DATAJUSTICE.

Copyright held by author.


DOI:10.1145/3325284 A.T. Markettos, R.N.M. Watson, S.W. Moore, P. Sewell, and P.G. Neumann

Inside Risks
Through Computer
Architecture, Darkly
Total-system hardware and microarchitectural
issues are becoming increasingly critical.

Spectre,11 Meltdown,13 Foreshadow,18,20 Rowhammer,9 Spoiler,9—suddenly it seems as if there is a new and unending stream of vulnerabilities in processors. Previously niche concepts such as speculative execution and cache timing side-channels have taken center stage. Across the whole hardware/software system, new vulnerabilities such as insufficiently protected memory access from untrustworthy PCIe or Thunderbolt USB-C peripherals,15 malicious Wi-Fi firmware,4 or alleged hardware implants14 are also starting to emerge.

We may be facing a crisis in systems design. What might we do about it? Here, we consider whether existing approaches are adequate, and where substantial new work is needed.

Prove, Don’t Patch

Many existing commercial operating systems have extensive vulnerabilities. The MITRE repository of common software security vulnerabilities (CVEs: http://cve.mitre.org) currently has over 110,000 open enumerated vulnerabilities that have been reported (excluding ones that have been resolved, and totally ignoring countless other vulnerabilities that have never been reported); the list is growing at a rate of approximately 50 new vulnerabilities each day. Patches cannot possibly keep up with the weaknesses. In addition, patching silicon takes years and potentially costs billions of dollars, which clearly tilts the balance firmly in favor of the attacker.

Recent advances such as the seL4 microkernel,10 the CertiKOS virtual-machine hierarchy,8 and the CompCert verified compiler12 have significantly contributed to the state of the art in formally proven correctness of operating-system kernels. This technology is not yet widespread, but it offers the potential to prove the absence of large classes of attacks. It relies on trustworthy models of the architectural abstraction—the hardware/software interface—and those too have advanced recently, in work by the authors and others.1,6

Looking Behind the Hardware Curtain

It has recently become clear that this is not enough, in several ways. First,


processor hardware (typically subject to extensive verification) has long been assumed to provide a solid foundation for software, but increasingly suffers from its own vulnerabilities. Second, increasing complexity and the way systems are composed of many hardware/software pieces, from many vendors, means one cannot think just in terms of a single-processor architecture. We need to take a holistic view that acknowledges the complexities of this landscape. Third, and most seriously, these new attacks involve phenomena that cut across the traditional architectural abstractions, which have intentionally only described the envelopes of allowed functional behavior of hardware implementations, to allow implementation variation in performance. That flexibility has been essential to hardware performance increases—but the attacks involve subtle information flows via performance properties. They expose the hidden consequences of some of the microarchitectural innovations that have given us ever-faster sequential computation in the last decades, as caching and prediction lead to side-channels.

Hardware Vulnerabilities

Ideally, security must be built from the ground up. How can we solve the problem by building the foundations of secure hardware?

For years, hardware security to many people has meant focusing on the physical layers. Power/electromagnetic side-channels and fault injection are common techniques for extracting cryptographic secrets by manipulating the physical implementation of a chip. These are not without effectiveness, but it is notable that the new spate of attacks represents entirely different, and more potent, attack vectors.

One lesson from the physical-layer security community is that implementation is critical. Hardware description languages (HDLs) are compiled down to connections between library logic cells. The logic cells are then placed and routed and the chip layer designs produced. One tiny slip—at any level from architecture to HDL source and compiler, to cell transistor definitions, routing, power, thermals, electromagnetics, dopant concentrations and crystal lattices—can cause a potentially exploitable malfunction. Unlike the binary code of malware, there is no way to observe many of these physical properties. As a result, systems are more vulnerable to both design mistakes and supply-chain attacks.

As the recent attacks demonstrate, side-channels are becoming more powerful than expected. Traditional physical-layer side-channels are a signals-from-noise problem. If you record enough traces of the power usage, with powerful enough signal processing, you can extract secrets. Architectural side-channels have more bandwidth and better signal-to-noise ratios, leaking much more data more reliably.

If we take a systems-oriented view, what can we say about the problem? First of all, the whole is often worse than the sum of its parts. Systems are composed of disparate components, often sourced from different vendors, and often granting much greater access to resources than needed to fulfill their purpose; this can be a boon for attackers. For example, in Google Project Zero’s attack on the Broadcom Wi-Fi chip inside iPhones,4 the attackers jumped from bad Wi-Fi packets to installing malicious code on the Wi-Fi chip, and then to compromising iOS on the application processor. Their ability to use the Wi-Fi chip as a springboard multiplied their efficacy. It is surprisingly difficult to reason about the behavior of such compositions of components.5 Attackers may create new side-channels through unexpected connections—for example, a memory DIMM that can send network packets via a shared I2C bus with an Ethernet controller.17

Hardware engineers often talk about ‘parasitic’ resistance or capacitance—components that were not put there by the designer but were created by the physical implementation, often unhelpfully sucking away signals or power. Today we have parasitic computers. Many components have unintended computational power, which can be perverted—from the x86 page-fault handler2 to DMA controllers.16 This presents a challenge to understanding where all the computation is happening, such as what is software rather than hardware.

Toward Robustly Engineered Trustworthy Systems

Total-system approaches to security defenses are important (see, for example, Bellovin3). A further lesson from physical-layer attacks is why such attacks are not more of a threat today—due to further layers of protection. It is not enough to extract the cryptographic key from a banking card using laser fault injection; the attacker must also use it to steal money. At this point the bank’s system-level defenses apply, such as transaction limits and fraud detection. If the key relates only to one account, the payoff involves only money held by that customer, not all other customers. Application-level compartmentalization limits the reward, and thus makes the attack economically nonviable.

Another approach is to ensure that richer contextual information is available that allows the hardware to understand and enforce security properties. The authors are on a team designing, developing, and formally analyzing the CHERI hardware instruction-set architecture,20 as well as CHERI operating system and application security. The CHERI ISA can enable hardware to enforce pointer provenance, arbitrarily fine-grained access controls to virtual memory and to abstract system objects, as well as both coarse- and fine-grained compartmentalization. Together, these can provide enforceable separation and controlled sharing, allowing trustworthy and untrustworthy software (including unmodified legacy code) to coexist securely. Since the hardware has awareness of software constructs such as pointers and compartments, it can protect them, and we can reason about the protection guarantees—for example, formally proving the architectural abstraction enforces


specific security properties. We believe this CHERI system architecture has significant potential to provide unprecedented total-system trustworthiness, including addressing some of the side-channel attacks that were unknown at the time of its conception.19

Such architectural guarantees enable more secure implementation of currently insecure languages (such as C/C++) and can put demonstrably secure operating-system kernels on a more secure foundation. Similar approaches may apply in other domains, for example between vulnerable components across a system-on-chip.

Engineering such systems requires a more holistic view, with a tighter interplay between hardware, operating systems and applications. In particular, designers need to understand more of what takes place in layers above or below their field of expertise. Better architectural models enable more robust verification of security properties, and amortizing verification costs across projects helps defenders but not attackers. Such verification must be inclusive, testing all the aspects of a system including the boundaries of implementation-defined behavior.

Better verification can defend us against new vulnerabilities present in the abstractions it is based upon, but not against those that involve phenomena that are not modeled. An open question is whether there is an abstraction between an architectural specification and a full hardware implementation that allows us to fully reason about potential leakage, without being so complex as to be intractable.

Conclusion

Traditional models—in which designers have free rein within tightly constrained layers—are no longer fit for purpose. Hardware/software system security architects need better awareness of what comes above and below them, to be able to reason about what happens at other levels of abstraction, and to understand the effects of composition. Managing overall complexity must fully capture information that might be relevant for security analysis, especially for entirely new classes of vulnerabilities. The defensive battle has only just begun.

References
1. Armstrong, A. et al. ISA Semantics for ARMv8-A, RISC-V, and CHERI-MIPS. In Proceedings of the Principles of Programming Languages Conference (POPL) 2019.
2. Bangert, J. et al. The page-fault weird machine: Lessons in instruction-less computation. In Proceedings of the USENIX Workshop on Offensive Technologies (WOOT), 2013.
3. Bellovin, S.M. and Neumann, P.G. The big picture: A systems-oriented view of trustworthiness. Commun. ACM 61, 11 (Nov. 2018), 24–26.
4. Beniamini, G. Over The Air: Exploiting Broadcom’s Wi-Fi Stack; https://bit.ly/2oA6GJL
5. Gerber, S. et al. Not your parents’ physical address space. In Proceedings of the Hot Topics in Operating Systems Conference (HotOS-XV) 2015.
6. Goel, S., Hunt, W.A. Jr., and Kaufmann, M. Engineering a formal, executable x86 ISA simulator for software verification. Provably Correct Systems (ProCoS), 2017.
7. Google Project Zero, 2018; https://bit.ly/2CAQzTM
8. Gu, R. et al. CertiKOS: An Extensible Architecture for Building Certified Concurrent OS Kernels. OSDI 2016, 653–669; See also https://bit.ly/2Uzj9sI for ongoing work.
9. Islam, S. et al. SPOILER: Speculative Load Hazards Boost Rowhammer and Cache Attacks, arXiv e-prints (Mar. 1, 2019); https://bit.ly/2TxWdhk
10. Klein, G. et al. Comprehensive formal verification of an OS microkernel. ACM Trans. Computer Systems 2014; See also https://bit.ly/2UPKgEY for ongoing work.
11. Kocher, P. et al. Spectre attacks: Exploiting speculative execution. ArXiv e-prints (Jan. 2018); https://bit.ly/2lUpJLk
12. Leroy, X. A formally verified compiler back-end. Journal of Automated Reasoning 43, 4 (2009), 363–446.
13. Lipp, M. et al. Meltdown, 2018; https://bit.ly/2E6myYl
14. Markettos, A.T. Making sense of the Supermicro motherboard attack; https://bit.ly/2PqOnld
15. Markettos, A.T. et al. Thunderclap: Exploring vulnerabilities in operating system IOMMU protection via DMA from untrustworthy peripherals. In Proceedings of the Network and Distributed Systems Security Symposium (NDSS), (Feb. 2019).
16. Rushanan, M. and Checkoway, S. Run-DMA. In Proceedings of the WOOT 2015 Conference (2015).
17. Sutherland, G. Secrets of the motherboard ([sh*t] my chipset says). In Proceedings of the 44CON 2017, (Sept. 2017).
18. Van Bulck, J. et al. Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution. USENIX Security (Aug. 15–17, 2018); https://bit.ly/2DusEDT
19. Watson, R.N.M. et al. Capability Hardware Enhanced RISC Instructions (CHERI): Notes on the Meltdown and Spectre Attacks. Technical Report UCAM-CL-TR-916, University of Cambridge, Computer Laboratory (Feb. 2018); https://bit.ly/2DuVDrr
20. Watson, R.N.M. et al. Capability Hardware Enhanced RISC Instructions (CHERI): CHERI Instruction-set Architecture, Version 7, Technical Report UCAM-CL-TR-927, University of Cambridge, Computer Laboratory (Apr. 2019); https://bit.ly/2XzPgKU
21. Weisse, O. et al. Foreshadow-NG: Breaking the virtual memory abstraction with transient out-of-order execution (Aug. 2018); https://bit.ly/2VZLD0h

A. Theodore Markettos (theo.markettos@cl.cam.ac.uk) is a Senior Research Associate in the Department of Computer Science and Technology at the University of Cambridge, U.K.

Robert N.M. Watson (robert.watson@cl.cam.ac.uk) is a Senior Lecturer in the Department of Computer Science and Technology at the University of Cambridge, U.K.

Simon W. Moore (simon.moore@cl.cam.ac.uk) is Professor of Computer Engineering in the Department of Computer Science and Technology at the University of Cambridge, U.K.

Peter Sewell (Peter.Sewell@cl.cam.ac.uk) is Professor of Computer Science in the Department of Computer Science and Technology at the University of Cambridge, U.K.

Peter G. Neumann (neumann@csl.sri.com) is Chief Scientist of the SRI International Computer Science Lab, and moderator of the ACM Risks Forum.

Copyright held by authors.

Calendar of Events

June 2–6
JCDL ’19: The 18th ACM/IEEE Joint Conference on Digital Libraries, Champaign, IL,
Sponsored: ACM/SIG,
Contact: J. Stephen Downie,
Email: jdownie@illinois.edu

June 3–5
SIGSIM-PADS ’19: SIGSIM Principles of Advanced Discrete Simulation, Chicago, IL,
Sponsored: ACM/SIG,
Contact: Dong Jin,
Email: dong.jin@iit.edu

June 3–5
SYSTOR ’19: International Systems and Storage Conference, Haifa, Israel,
Sponsored: ACM/SIG,
Contact: Moshik Hershcovitch,
Email: moshikh@il.ibm.com

June 3–6
SACMAT ’19: The 24th ACM Symposium on Access Control Models and Technologies, Toronto, ON,
Sponsored: ACM/SIG,
Contact: Atefeh (Atty) Mashatan,
Email: amashatan@ryerson.ca

June 5–7
TVX ’19: ACM International Conference on Interactive Experiences for TV and Online Video, Salford (Manchester), U.K.,
Sponsored: ACM/SIG,
Contact: Jonathan Hook,
Email: jonathan.hook@york.ac.uk

June 9–12
UMAP ’19: 27th Conference on User Modeling, Adaptation and Personalization, Larnaca, Cyprus,
Co-Sponsored: ACM/SIG,
Contact: George Angelos Papadopoulos,
Email: george@cs.ucy.ac.cy

June 10–13
ICMR ’19: International Conference on Multimedia Retrieval, Ottawa, ON,
Sponsored: ACM/SIG,
Contact: Zhongfei (Mark) Zhang,
Email: zhongfei@cs.binghamton.edu


DOI:10.1145/3325287 Peter J. Denning

The Profession of IT
An Interview with
David Brin on Resiliency
Many risks of catastrophic failures of critical infrastructures can be
significantly reduced by relatively simple measures to increase resiliency.

Many people today are concerned about critical infrastructures such as the electrical network, water supplies, telephones, transportation, and the Internet. These nerve and bloodlines for society depend on reliable computing, communications, and electrical supply. What would happen if a massive cyber attack or an electromagnetic pulse, or other failure mode took down the electric grid in a way that requires many months or even years for repair? What about a natural disaster such as hurricane, wildfire, or earthquake that disabled all cellphone communications for an extended period?

David Brin, physicist and author, has been worrying about these issues for a long time and consults regularly with companies and federal agencies. He says there are many relatively straightforward measures that might greatly increase our resiliency—our ability to bounce back from disaster. I spoke with him about this.

Q: What is the difference between resilience and anticipation?

BRIN: Our prefrontal lobes help us envision possible futures, anticipating threats and opportunities. Planners and responders augment these organs with predictive models, intel-gathering, and big data, all in search of dangers to anticipate and counter in advance. Citizens know little about how many bad things these protectors have averted. But this specialization in anticipation makes it hard for protectors to appreciate how we cope when our best-laid plans fail, which they do, sooner or later.

Resilience is how we cope with unexpected contingencies. It enables us to roll with any blow and come up fighting, keeping a surprise from being lethal. It’s what worked on 9/11, when all anticipatory protective measures failed.

Q: Let’s see what anticipation and resilience look like for a common threat, disruptive electrical outages. They can be caused by storms, birds, squirrels, power grid overload, or even preventive reduction of wildfire risk. Without power, we cannot use our computers or access our files stored in the Internet. Even our best disaster planning cannot fix the disruption if infrastructure damage is severe. Yet, communication is essential


for recovery. What can we do to preserve our ability to communicate?

On 9/11, passengers aboard flight UA93 demonstrated remarkable resilience when they self-organized to stop the terrorist plot to use that plane as a weapon against their country. If we want that kind of resilience to work on a large scale, we need resilient communications. Alas, our comm systems are fragile to failure in any natural or unnatural calamity. One step toward resilience would be a backup peer-to-peer (P2P) text-passing capability for when phones can’t link to a cellular tower. Texts would get passed from phone to phone via well-understood methods of packet switching until they encounter a working node and get dropped into the network. Qualcomm already has this capability built into their chips! But cellular providers refuse to turn it on. That’s shortsighted, since it would be good business too, expanding text coverage zones and opening new revenue streams. Even in the worst national disaster, we’d have a 1940s-level telegraphy system all across the nation, and pretty much around the world.

All it would take to fix this is a small change of regulation. Five sentences requiring the cell-cos to turn this on whenever a phone doesn’t sense a tower. (And charge a small fee for P2P texts.) Doing so might let us restore communications within an hour rather than months.

Many efforts have been made to empower folks with ad hoc mesh networks, via Bluetooth, Wi-Fi webs, and so on. None of these enticed more than a tiny user base—nothing like what’s needed for national resilience.

Q: It appears that solar power for homes and offices is at a tipping point as more people find it cheaper than the power grid. Localized solar power should also bring new benefits such as ability to maintain minimum electrical function at home during a blackout. Is independence from the electrical grid good for resilience?

It would be. One can envision a million solar-roofed homes and businesses serving as islands of light for their neighborhoods, in any emergency. But there’s a catch. Under current regulations, almost all U.S. solar roofs have a switch that shuts down the home or business solar system when the electrical utility has blacked out. The purpose is to prevent spurious home-generated voltages from endangering repair linemen. This is a lame excuse for an insane situation. Simply replace that cutoff switch with one that would still block backflow into the grid, but that feeds from the solar inverter to just two or three outlets inside the home, running the fridge, some rechargers, and possibly satellite coms. Just changing over to that switch would generate archipelagos of autonomous, resilient civilization spread across every neighborhood in America, even in the very worst case. A new rule requiring such switches, and fostering retrofitting, would fit on less than a page.

Across the next decade, more solar systems will come with battery storage. But this reform would help us bridge the next 10 years.

Q: What about protection against electromagnetic pulse disruption?

Much has been written about danger from EMP—either attacks by hostile powers or else the sort of natural disaster we might experience if the Sun ever struck us head-on with a coronal mass ejection, commonly called a solar flare. These CMEs happen often, peaking every 11 years. We’ve been lucky as the worst ones have missed Earth. But some space probes have been taken out by direct hits and a bulls-eye is inevitable.

The EMP threat was recognized over 30 years ago. We could have incentivized gradual development of shielded and breakered chipsets, including those in civilian electronics. Adoption could have been stimulated with a tax of a penny per non-compliant device, with foreseen ramp-up. By now we’d be EMP resilient, instead of fragile hostages either to enemies or to fate.

Q: What about solar on the southward walls of buildings to power the buildings? Some cities are already doing this.

Sure, south-facing walls are another place for photovoltaics. But there’s competition for that valuable real estate—urban agriculture. Technologies are cresting toward where future cities may require new buildings to recycle their organic waste through vertical farms that purify water while generating either industrial algae or else much of the food needed by a metropolis. With so much of the world’s population going urban, no technology could make a bigger difference.

The pieces are coming together. What’s lacking is a sense of urgency. Pilot programs and tax incentives should encourage new tall buildings to utilize their southward faces, nurturing this stabilizing trend during the coming decade.

Q: You’ve also spoken about apps systems that turn your smartphone into an intelligent sensor. Can you say how this supports resiliency?

Cellphones already have powerful cameras, many with infrared capability. Soon will come spectrum-analysis apps, letting citizens do local spot checks on chemical spills or environmental problems, and feeding the results to governments or NGOs for modeling in real time. The Tricorder X Prize showed how just a few add-on devices can turn a phone into a medical appraisal device, like Dr. McCoy had in “Star Trek.” Almost anyone could use such apparatus in the field with little training. Take a few measurements, and a distant system advises you on corrective actions.

Infrared sensors, accelerometers, and chemical sensors could provide a full array of environmental awareness systems by turning citizen cellphones into nodes of an instant awareness network. (I describe this in my novel Existence.) Such a mesh is already of interest to national authorities. But the emphasis has been hierarchical—authorities send public reports down to citizens after gathering and interpreting data flowing upward. The hierarchical mind-set comes naturally when you are an authority with protective duties. But this can blind even sincere public servants to one of our great strengths—


the ability of average citizens to self-organize laterally.

Use your imagination. The greatest long-term advantage of our kind of society is that lateral citizen networks, while occasionally inconvenient to public servants, aren’t any kind of macro-threat, but will make civilization perform better. This is in contrast to despotic regimes, for whom such citizen empowerment would be lethal.

Q: Some of your proposals are less familiar. You have spoken of “all sky awareness.” What is that and how does it improve resiliency?

Defense and intelligence folks know we need better 24/7 omni-awareness of land, sea, and air. Major efforts involve protective services and space assets. When the Large Synoptic Survey Telescope comes online in Chile, we’ll find 100 times as many asteroids that could threaten our planet, or that resemble the one that broke 10,000 windows in Chelyabinsk. Closer to home, dangerous space debris should be tracked round the globe.

Similar technology could improve air safety and impede smugglers by tracking both legal and illicit air traffic. For example, the cell networks I mentioned earlier could detect and triangulate aircraft engine sounds for comparison to an ongoing database, especially at low altitudes where drug smugglers and human traffickers operate, or where terrorists might attempt an attack, or to trace the path of airliners that stray, like Malaysia Airlines Flight 370. Imagine those in peripheries like Canada, Alaska, or nearby waters automatically reporting sonic booms. Among myriad more mundane uses, these might perhaps localize incoming hypersonic weapons, of the kind announced recently by Russian President Vladimir Putin.

Sound implausible? In December 2018, a loose network of amateur ‘plane-spotters’ managed to track Air Force One visually, during President Trump’s top-secret Christmas dash to a U.S. air base in Iraq. A U.K. photographer used these clues to snap the unmistakable, blue-and-white 747 jetting far overhead.

Another method: revive the SETI League’s Project Argus, aiming to establish radio and optical detectors in 5,000 amateurs’ backyards, spread around the world. As Earth rotates, these backyard stations would sweep the sky in overlapping swathes, sifting for anomalous signals, but also detecting almost anything interesting that happens up there. Argus failed earlier because of the complexity and expense of racks of equipment. Today, with a small up-front investment by some mere-millionaire, we could offer a small box for a couple of hundred bucks that could be latched to an old TV dish-antenna, then Wi-Fi linked via the owner’s home. The dish, plus a small optical detector, could report detections in real time, and any pair or trio that correlate would then trigger a look by higher-level, aimable devices.

Sure, most of the participants would think of their backyard SETI stations as helping sift the sky for aliens. So? As a side benefit, we’d become hundreds of times better at detecting almost any transient phenomenon overhead, improving both anticipation and resilience.

I can go on with a much longer list of unconventional and generally very inexpensive ways that very simple regulatory or incentive actions might transform national resilience, making society more robust to withstand shocks across the decades ahead.

Q: What about civil unrest or lawlessness if the disaster takes out or overwhelms local law enforcement? Easy to see gangs roaming affluent neighborhoods in SUVs stealing stuff and especially food, with no police to stop them.

I well understand this worry! I’ve written collapse-of-civilization tales. (One of them, The Postman, was filmed by Kevin Costner.) Hollywood presents so many apocalyptic scenarios, we tend to assume we live on a fragile edge of collapse. But Rebecca Solnit’s book, A Paradise Built in Hell, shows decisively that average citizens, whether liberal or conservative, are actually pretty tough and dynamic. They quickly self-organize to help their neighbors. A quarter or more of citizens will almost always run toward whatever the problem is. Take citizen response on 9/11, or when disasters hit their neighborhoods.

If “affluent neighborhoods” want to be safe, there’s one method that works over the long run … don’t alienate the poor and middle class and ensure that the vast majority identify as members of the same overall tribe. As neighbors, we’ll come to your defense.

Q: Anything to mitigate cyber attacks, including phishing and massive identity theft?

Sincere people across the spectrum are right to worry about companies and governments collecting massive amounts of personal data on citizens: from the ways they use their smartphones, to always-on mics at home and office (for example, Alexa). Phishing is another example where crooks use already open knowledge about you to lure you into fatal online mistakes. We all fret about disparities of power that may lead to the “telescreen” in George Orwell’s Nineteen Eighty-Four. From facial recognition to video fakery to brainwave interpretation and lie detectors, if these techs are monopolized by one elite or another, we may get Big Brother forever. There are forces in the world who are eager for this. China’s “social credit” system aims to get the masses to enforce conformity on one another.

In the West, most people are right to find this prospect terrifying. The reflex in response is to say: “let’s ban or restrict this new kind of light.” And that is the worst possible prescription. The elites we fear will only gain great power if they can operate in secret, enhancing that disparity, because we won’t be able to look back.


Consider. It matters much less what elites of all kinds know about you than what they can do to you. And the only thing that deters the latter is what we know about them. Denying elites the power to see has never happened (for long) anywhere in the history of the world. But denying them the ability to harm citizens is something we’ve (imperfectly) accomplished for 200 years. We’ve done it by insisting that we get to see, too. If not as individuals, then via the NGOs we hire to look for us.

As I appraise in The Transparent Society, the answer is more light, not less, for common citizens to be empowered by technology to take up much of the burden of supervising and arguing and applying accountability. The more we can see, the less the bad groups can hide. If we do this, we’ll not only be resilient, we’ll never have Big Brother.

The answer to phishing, ID theft, etc., is the same as always: to catch and deter villains, by ending most shadows for roaches to hide in.

Q: We don’t know how to do this because the Internet itself is baked in a cloak of anonymity. We are not going to redesign the Internet protocols anytime soon. We need more than light. Isn’t the solution good locks on our databases?

Sorry, show me one time when “good locks” worked for very long. Every week, some previously “for sure” database is raided or leaks. All that needs to happen is for any lock to fail once, at all, via code-breaking or hacking or phishing or human error, and the information is loose, infinitely copyable. If you base your sense of safety on secrecy, it will be impossible to verify what others don’t know.

Look, I’m not saying that there should be no secrets or privacy! Our skilled protectors need tactical secrecy to do their jobs. But smaller volumes and perimeters are easier to defend and seal. It has always been U.S. policy that secrecy should bear some burden of justification and, eventually, a time limit.

This isn’t the time or place to argue the point. Alas, the reflex to seek safety in shadows is so strong that folks forget how we got the very freedoms, wealth, and justice we worry about losing. Not by hiding but by assertively demanding to see. What I do ask is that you squint and look ahead 50 or 100 years, and ask: what is our baseline victory condition?

Every enemy of this enlightenment, individualist, open-society experiment (every lethal foe) is mortally allergic to light. They suffer when their plans, methods, agents, and resources are revealed. In contrast, we are at worst inconvenienced and, as shown by the Snowden and WikiLeaks affairs, even prodded to improve a bit. If, say in 50 years, there is worldwide transparency of ownership and power and action, then we win. We, a humanity that is inquisitive, confident, individualistic, and free, simply win.

Q: These resiliency proposals all sound so reasonable. Why have they not been implemented?

A cynic would answer that there’s not much economic-constituency behind resilience. No big-ticket orders. How much money is to be made from a slightly costlier home-solar cutoff switch that would feed rooftop energy to three outlets in a million U.S. homes? I spoke about backup peer-to-peer texting at a defense industry conference where a Verizon vice-president in attendance went absolutely livid. Qualcomm tried subsequently to get them, and AT&T, to try some regional experiments; might P2P texting actually turn a profit? Alas, no one wants to risk disruption, even though this one function could knit our entire continent together, in a crisis.

EMP resistance should have been slowly woven into civilian electronics for decades. And here’s a thought: maybe it has been! After all, if we had truly savvy leaders, they would want to slide this protection into place as quietly as possible. Why? Because there is a critical vulnerability window, during which those who are thinking about hitting us might strike if they see the chance slipping away. History shows that such transitions can be dangerous, as revealed by John F. Kennedy in Why England Slept.

Some bright folks are paying attention. Elon Musk told me he would fix the solar cutoff problem with his Powerwall storage system, and that is the answer … in a decade. A $200 switch would still be worthwhile, till then. Another zillionaire expressed interest in the all-sky awareness project, but more for its contribution to SETI than national or world security. Membership in CERT (Community Emergency Response Teams) rises every year. And so it goes. Just way too slowly.

What truly matters is the very concept of resilience, which worked so well on 9/11 and at every turn of American history. The U.S. Army, till just one generation ago, always based its planning on vast pools of talented, healthy volunteers rushing in to fill the thin blue line. Sure, in an era of high tech and lightning reaction times, we must rely on a highly professional cadre of protectors. But the worst thing they could do is to declare “Count on us … and only on us.”

No. We love you and thank you for your service. But a time will come when you will fail. And when that happens, it will be our turn, as citizens, to step up. Help us to prepare, and we won’t let you down.

David Brin (http://www.davidbrin.com) is an astrophysicist whose international best-selling novels include The Postman, Earth, and Existence. He serves on advisory boards (for example, NASA’s Innovative and Advanced Concepts program or NIAC) and speaks or consults on a wide range of topics including AI, SETI, privacy, and national security. His nonfiction book about the information age, The Transparent Society, won the Freedom of Speech Award of the American Library Association.

Peter J. Denning (pjd@nps.edu) is Distinguished Professor of Computer Science and Director of the Cebrowski Institute for information innovation at the Naval Postgraduate School in Monterey, CA, is Editor of ACM Ubiquity, and is a past president of ACM. The author’s views expressed here are not necessarily those of his employer or the U.S. federal government.

Copyright held by authors.


DOI:10.1145/3322933 Thomas Pasquier, David Eyers, and Jean Bacon

Viewpoint
Personal Data and
the Internet of Things
It is time to care about digital provenance.

We have all read market predictions describing billions of devices and the hundreds of billions of dollars in profit that the Internet of Things (IoT) promises.[a] Security and the challenges it represents [27] are often highlighted as major issues for IoT, alongside scalability and standardization. In 2017, FBI Director James Comey warned, during a Senate hearing, of the threat represented by a botnet taking control of devices owned by unsuspecting users. Such a botnet can seize control of devices ranging from connected dishwashers [b] to smart home cameras and connected toys, not only using them as a platform to launch cyber-attacks, but also potentially harvesting the data such devices collect.

In addition to concerns about cybersecurity, corporate usage of personal data has seen increased public scrutiny. A recent focus of concern has been connected home hubs (such as Amazon Alexa and Google Home).[c] Articles on the topic discussed whether conversations were being constantly recorded and if so, where those records went. Similarly, the University of Rennes faced a public backlash after revealing its plan to deploy smart-beds in its accommodation to detect “abnormal” usage patterns.[d] A clear question emerges from IoT-related fears: “How and why is my data being used?”

As concerns grow, legislators across the world are taking action in order to protect the public. For example, the recent EU General Data Protection Regulation (GDPR) that took effect in May 2018,[e] and the forthcoming ePrivacy Regulation,[f] place strong responsibility on data controllers to protect personal data, and to notify users of security breaches. The European Commission defines a Data Controller as the party that determines the purposes for which, and the means by which, personal data is processed (why and how the data is processed). EU regulations further impose constraints on EU citizens’ data processing based on location and data type (that is, “special category” data falls under more stringent constraints). The data controller must provide means for end users to determine whether their data is properly handled and means to effect their rights. Overall, there must be mechanisms to determine what data is processed, how, why, and where.

Such concerns have drawn researchers to look at means to develop more accountable and transparent systems.[10,24] The problem has also been clearly highlighted by the EU Data Protection Working Party: “As a result of the need to provide pervasive services in an unobtrusive manner, users might in practice find themselves under third-party monitoring. This may result in situations where the user can lose all control on the dissemination of his/her data, depending on whether or not the collection and processing of this data will be made in a transparent manner or not.”

[a] See https://bit.ly/2JNx0LZ
[b] See https://bit.ly/2JIOidc
[c] See https://bit.ly/2gY9qKG
[d] See https://lemde.fr/2HLvEQb
[e] See https://bit.ly/2lSJQfO
[f] See https://bit.ly/2j4AwzT


Modern computing systems contain many components that operate as black boxes; they accept inputs and generate outputs but do not disclose their internal working. Beyond privacy concerns, this also limits the ability to detect cyber-attacks, or more generally to understand cyber-behavior. Because of these concerns DARPA, in the U.S., launched the Transparent Computing project [g] to explore means to build more transparent systems through the use of digital provenance, with the particular aim of identifying advanced persistent threats. While DARPA’s work is a good start, we believe there is an urgent need to reach much further. In the remainder of this Viewpoint, we explore how provenance can be an answer to some IoT concerns and the challenges faced to deploy provenance techniques.

Digital Provenance

There is a growing clamor for more transparency, but straightforward, widespread technical solutions have yet to emerge. Typical software log records often prove insufficient to audit complex distributed systems as they fail to capture the complex causality relationships between events. Digital provenance [8] is an alternative means to record system events. Digital provenance is the record of information flow within a computer system in order to assess the origin of data (for example, its quality or its validity).

The concept first emerged in the database research community as a means to explain the response to a given query.[16] Provenance research later expanded to address issues of scientific reproducibility, notably by providing mechanisms to reconstitute computational environments from formal records of scientific computations.[23] More recently, provenance has been explored within the cybersecurity community [25] as a means to explain intrusions [18] and, lately, to detect them.[14]

Provenance records are represented as a directed acyclic graph that shows causality relationships between the states of the objects that compose a complex system. As a consequence, it is compatible with automated mathematical reasoning. In such a graph, the vertices represent the state of transient and persistent data items, transformations applied to those states, and persons (legal or natural) responsible for data and transformations (generally referred to as entities, activities, and agents respectively). The edges represent dependencies between these vertices. The analysis of such a graph allows us to understand where, when, how, by whom, and why data has been used.[7,9]

An outcome of research on provenance in the cybersecurity space is the understanding that the capture mechanism must provide guarantees of completeness (all events in the system can be seen), accuracy (the record is faithful to events) and a well-defined, trusted computing base (the threat model is clearly expressed).[22] Otherwise, attacks on the system may be undetected, concealed by the attacker, or misattributed. We argue that in a highly ad hoc and interoperable environment with mutually untrusted parties, the provenance used to empower end users with control and understanding over data usage requires similar properties.

Who to Trust?

In the IoT environment the number of involved stakeholders has the potential to explode exponentially. Traditionally, a company managed its own server infrastructure, maybe with the help of a subcontractor. The cloud computing paradigm further increased complexity with the involvement of cloud service providers (sometimes stacked, for example, Heroku PaaS on top of the Amazon IaaS cloud service), third-party service providers (for example, CloudMQTT) and other tenants sharing the infrastructure. The IoT further increases this complexity, with potentially ad hoc and unforeseen interactions between devices and services on top of the complex cloud and edge computing infrastructure most IoT services rely on.

One answer to this problem is to build applications in “silos” where the involved parties are known in advance, but with the side effect of locking devices and services in to a single company (for example, the competing smart-home offerings by leading technology companies). This is far from the IoT vision of a connected environment, but most existing products fall into this category. There are obviously major business considerations behind this model, and it should be noted that the EU GDPR mandates some form of interoperability (although it is as yet unclear how it should be interpreted [12]).

An alternative to such “lock-in” would be to make devices’ consumption of data transparent and accountable. If data is exchanged across devices, the concerned user should be able to audit its usage. However, in an environment where arbitrary devices could interact (although it must be remembered that EU GDPR requires explicit and informed user consent), how can trust be established in the audit record? This requires an in-depth rethinking of how IoT platforms are designed, potentially exploring the security-by-design approach based on hardware roots of trust [13] to provide trusted digital enclaves in which behavior can be audited. Some form of “accountability-by-design” principle should also be encouraged, where transparency and the implementation of a trustworthy audit mechanism are core concerns in product design.

Such solutions have been explored in the provenance space, for example, by leveraging SGX properties to provide a strong guarantee of the integrity of the provenance record.[4] Similarly, remote attestation techniques leveraging TPM hardware have been proposed [6] to guarantee the integrity of the capture mechanism. However, how to provide such guarantees in an IoT environment, where such hardware features may not be available, is a relatively unexplored topic.

Where Does the Audit Live?

The fully realized IoT vision is of vast distributed and decentralized systems.

[g] See https://bit.ly/2Uf5bQY
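
To make the audit discussion concrete, the provenance graph described under Digital Provenance (entities, activities, and agents as vertices; dependencies as edges) can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions: the class `ProvenanceGraph`, its method names, and the smart-meter scenario are all hypothetical and do not correspond to any system cited in this Viewpoint.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy provenance DAG: vertices are entities, activities, or agents."""

    KINDS = {"entity", "activity", "agent"}

    def __init__(self):
        self.kind = {}                  # vertex id -> kind
        self.deps = defaultdict(set)    # vertex id -> vertices it depends on

    def add_vertex(self, vid, kind):
        if kind not in self.KINDS:
            raise ValueError(f"unknown vertex kind: {kind}")
        self.kind[vid] = kind

    def add_dependency(self, src, dst):
        """Record that src depends on dst; refuse edges that would form a cycle."""
        if src == dst or src in self.ancestors(dst):
            raise ValueError(f"{src} -> {dst} would create a cycle")
        self.deps[src].add(dst)

    def ancestors(self, vid):
        """Every vertex that vid transitively depends on."""
        seen, stack = set(), [vid]
        while stack:
            for parent in self.deps.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def derived_from(self, vid):
        """Every vertex that transitively depends on vid (where the data went)."""
        return {v for v in self.kind if vid in self.ancestors(v)}

    def responsible_agents(self, vid):
        """Agents in the history of vid (by whom the data was handled)."""
        return {v for v in self.ancestors(vid) if self.kind.get(v) == "agent"}


# Hypothetical scenario: one smart-meter reading used by two different parties.
g = ProvenanceGraph()
for vid, kind in [
    ("meter_reading", "entity"), ("energy_co", "agent"), ("billing", "activity"),
    ("invoice", "entity"), ("ad_broker", "agent"), ("profiling", "activity"),
    ("ad_profile", "entity"),
]:
    g.add_vertex(vid, kind)
for src, dst in [
    ("billing", "meter_reading"), ("billing", "energy_co"),
    ("invoice", "billing"),
    ("profiling", "meter_reading"), ("profiling", "ad_broker"),
    ("ad_profile", "profiling"),
]:
    g.add_dependency(src, dst)

print(sorted(g.derived_from("meter_reading")))    # everything computed from the reading
print(sorted(g.responsible_agents("ad_profile")))  # who was involved in producing the profile
```

An audit then reduces to graph queries: `derived_from` answers "where did my data go?" and `responsible_agents` answers "by whom was it used?". The sketch says nothing about the harder questions this Viewpoint raises, namely whether the capture is complete and whether the record itself can be trusted.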


If we assume trustworthy provenance capture is achievable, the issue of guaranteeing that the provenance record can be audited remains. If you are to audit the processing of personal data, guarantees about the integrity and availability of the provenance record must exist. If you agreed to share your daily activity for research, the activities of insurance companies scraping your data for possible health risks must not be able to masquerade as benign research use, nor should data collection for political purposes be able to pass as harmless entertainment, as in the Cambridge Analytica scandal.[h] Similarly, the availability (durability) of the audit record must be guaranteed. There is no point to an audit record if it can simply be deleted.

Further, Moyer et al. evaluated the storage requirements of provenance when used for security purposes in relatively modest distributed systems.[21] In such a context, several thousands of graph elements can be generated per second and per machine, resulting in a graph containing billions of nodes to represent system execution over several months. It is unclear how some past research outcomes, for example, detection of suspicious behavior,[2] privacy-aware provenance,[11] or provenance integrity,[15] scale to very large graphs, as such concerns were not evaluated. Similarly, while blockchain is heralded [19] as an integrity-preserving means to store provenance, it is unclear how well it could expand to such scale. Several options have been explored to reduce graph size, such as identifying and tracking only sensitive data objects [5] or performing property-preserving graph compression;[17] however, none has yet adequately addressed the scalability challenge.

How to Communicate Information?

Means must be developed to communicate about data usage, but also about the risks of inference from the data. Not only must the nature of the data be considered, but also other properties such as the frequency of capture.[3] For example, a 100Hz smart-meter reading can in some cases indicate what television channel is currently being watched; even a daily average reading could inform about occupancy. Here, it is important to be able to explore and represent the outcome of complex computational workflow.[1]

Provenance visualization has been an active research topic for over a decade, yet no fully satisfactory solution has been proposed. The simplest possible visualization is to render the graph; however, beyond trivially simple graphs such a representation is too complex and dense to be easily understood, even by experts. We go further and suggest that how interpretable such information is for end users also depends on educational background, socioeconomic environment, and culture.

In order to make the accountability and transparency of IoT platforms effective, a better communication medium must be provided. An approach often taken is to analyze motifs in the graph to extract high-level abstractions (for example, Missier et al. [20]) that are meaningful to the average end user. In recent work, it was proposed to represent such a high-level abstraction as a comic strip.[26]

We Need to Care About Digital Provenance

Building transparent and auditable systems may be one of the greatest software engineering challenges of the coming decade. As a consequence, digital provenance and its application to cybersecurity and the management of personal data have become a hot research topic. We have highlighted key active areas of research and their associated challenges. It is fundamental for industry practitioners to understand the threat posed by the black-box nature of the IoT, the potential solutions, and the challenges to a practical deployment of those solutions. Accountability-by-design must become a core objective of IoT platforms.

[h] See https://nyti.ms/2HH74vA

References

1. Acar, U. et al. A graph model of data and workflow provenance. In Proceedings of the TAPP’10 Second Conference on Theory and Practice of Provenance, USENIX, 2010.
2. Allen, M.D. et al. Provenance for collaboration: Detecting suspicious behaviors and assessing trust in information. In Proceedings of the 7th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom). IEEE, 2011, 342–351.
3. Amar, Y. et al. An information theoretic approach to time-series data privacy. In Proceedings of the 1st Workshop on Privacy by Design in Distributed Systems. ACM, 2018, 3.
4. Balakrishnan, N. et al. Non-repudiable disk I/O in untrusted kernels. In Proceedings of the 8th Asia-Pacific Workshop on Systems. ACM, 2017, 24.
5. Bates, A. et al. Take only what you need: Leveraging mandatory access control policy to reduce provenance storage costs. In Proceedings of the Conference on Theory and Practice of Provenance. USENIX, 2015, 7–7.
6. Bates, A.M. et al. Trustworthy whole-system provenance for the Linux kernel. In Proceedings of the USENIX Security Symposium (2015), 319–334.
7. Buneman, P. et al. Why and where: A characterization of data provenance. In Proceedings of the International Conference on Database Theory. Springer, 2001, 316–330.
8. Carata, L. et al. A primer on provenance. Commun. ACM 57, 5 (May 2014), 52–60.
9. Cheney, J. et al. Provenance in databases: Why, how, and where. Foundations and Trends in Databases 1, 4 (2009), 379–474.
10. Crabtree, A. et al. Building accountability into the Internet of Things: The IoT databox model. Journal of Reliable Intelligent Environments (2018).
11. Davidson, S. et al. Provenance views for module privacy. In Proceedings of the Thirtieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, 2011, 175–186.
12. De Hert, P. et al. The right to data portability in the GDPR: Towards user-centric interoperability of digital services. Computer Law & Security Review. Elsevier, 2017.
13. Eldefrawy, K. et al. SMART: Secure and minimal architecture for (establishing dynamic) root of trust. In Network and Distributed System Security Symposium 12 (2012), 1–15.
14. Han, X. et al. FRAPpuccino: Fault-detection through Runtime Analysis of Provenance. In Proceedings of the Workshop on Hot Topics in Cloud Computing (HotCloud’17). USENIX, 2017.
15. Hasan, R. et al. The case of the fake Picasso: Preventing history forgery with secure provenance. In Proceedings of the Conference on File and Storage Technologies (FAST’09), (2009), 1–14.
16. Herschel, M. et al. A survey on provenance: What for? What form? What from? The VLDB Journal 26, 6 (2017), 881–906.
17. Hossain, M.N. et al. Dependence-preserving data compaction for scalable forensic analysis. In Proceedings of the USENIX Security Symposium.
18. King, S.T. and Chen, P.M. Backtracking intrusions. ACM SIGOPS Operating Systems Review 37, 5 (May 2003).
19. Liang, X. et al. Provchain: A blockchain-based data provenance architecture in cloud environment with enhanced privacy and availability. In International Symposium on Cluster, Cloud and Grid Computing. IEEE/ACM, 2017, 468–477.
20. Missier, P. et al. ProvAbs: Model, policy, and tooling for abstracting PROV graphs. In Proceedings of the International Provenance and Annotation Workshop. Springer, 2017, 3–15.
21. Moyer, T. and Gadepally, V. High-throughput ingest of data provenance records into Accumulo. In Proceedings of the High Performance Extreme Computing Conference (HPEC). IEEE, 2016, 1–6.
22. Pasquier, T. et al. Runtime analysis of whole system provenance. In Proceedings of the Conference on Computer and Communications Security (CCS’18). ACM, 2018.
23. Pasquier, T. et al. If these data could talk. Scientific Data 4 (2017), http://www.nature.com/sdata2017114.
24. Pasquier, T. et al. Data provenance to audit compliance with privacy policy in the Internet of Things. Personal and Ubiquitous Computing (2018), 333–344.
25. Pohly, D.J. et al. Hi-Fi: Collecting high-fidelity whole-system provenance. In Proceedings of the 28th Annual Computer Security Applications Conference. ACM, 2012, 259–268.
26. Schreiber, A. and Struminski, R. Tracing personal data using comics. In Proceedings of the International Conference on Universal Access in Human-Computer Interaction. Springer, 2017, 444–455.
27. Singh, J. et al. Twenty security considerations for cloud-supported Internet of Things. IEEE Internet of Things Journal 3, 3 (Mar. 2016), 269–284.

Thomas Pasquier (http://tfjmp.org) is a Lecturer (Assistant Professor) at the University of Bristol’s Cyber Security Group, and a visiting scholar at the University of Cambridge, U.K.

David Eyers (https://www.cs.otago.ac.nz/staff/David_Eyers) is an Associate Professor in the Department of Computer Science at the University of Otago, New Zealand.

Jean Bacon (http://www.cl.cam.ac.uk/~jmb25/) is Professor Emerita of Distributed Systems at the University of Cambridge, U.K.

Copyright held by authors.



practice
DOI:10.1145/ 3316772


Article development led by
queue.acm.org

A collaborative approach to reclaiming


memory in heterogeneous software systems.
BY ULAN DEGENBAEV, MICHAEL LIPPAUTZ, AND HANNES PAYER

Garbage
Collection
as a Joint
Venture
Many popular programming languages are executed on top of virtual machines (VMs) that provide critical infrastructure such as automated memory management using garbage collection. Examples include dynamically typed programming languages such as JavaScript and Python, as well as static ones like Java and C#. For such languages the garbage collector periodically traces through objects on the application heap to determine which objects are live and should be kept or dead and can be reclaimed. The garbage collector is said to manage the application memory, which means the programming language is managed. The main advantage of managed languages is that developers do not have to reason about object lifetimes and free objects manually. Forgetting to free objects leaks memory, and premature freeing results in dangling pointers.

Virtual machines for managed languages may be embedded into larger software systems that are implemented in a different, sometimes unmanaged, programming language, where programmers are responsible for releasing memory that is no longer needed. An example of such a heterogeneous software system is Google's Chrome Web browser, where the high-performance V8 JavaScript VM (https://v8.dev/) is embedded in the Blink rendering engine that is in charge of rendering a website. Blink renders these pages by interpreting the document object model (DOM; https://www.w3.org/TR/WD-DOM/introduction.html) of a website, which is a cross-platform language-independent representation of the tree structure defined through HTML.
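As a rough illustration of the tracing mentioned in the opening paragraph, the sketch below marks every object reachable from a set of roots and treats the rest as dead. The `findLiveObjects` function, the `refs` field, and the toy heap are hypothetical names chosen for this example only; a real collector such as V8's operates on raw memory rather than on JavaScript arrays.

```javascript
// Minimal sketch of tracing reachability; not how V8 is implemented.
// Each hypothetical heap object lists the objects it references in `refs`.
function findLiveObjects(roots) {
  const live = new Set();
  const worklist = [...roots];
  while (worklist.length > 0) {
    const obj = worklist.pop();
    if (live.has(obj)) continue; // already visited
    live.add(obj);
    worklist.push(...obj.refs);
  }
  return live;
}

// Example heap: a -> b is reachable from the root set,
// while c <-> d form a cycle that nothing else points to.
const a = { name: "a", refs: [] };
const b = { name: "b", refs: [] };
const c = { name: "c", refs: [] };
const d = { name: "d", refs: [] };
a.refs.push(b);
c.refs.push(d);
d.refs.push(c);

const heap = [a, b, c, d];
const live = findLiveObjects([a]);
const dead = heap.filter((obj) => !live.has(obj)).map((obj) => obj.name);
console.log(dead); // the unreachable cycle: c and d
```

Because liveness is decided by reachability from the roots, the cycle between `c` and `d` is reclaimed without any manual bookkeeping, which is the property a tracing collector gives within a single heap.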



Since Blink is written in C++, it implements an abstract DOM representing HTML documents as C++ objects. The DOM C++ objects are wrapped and exposed as objects to JavaScript, which allows scripts to manipulate Web page content directly by modifying the DOM objects. The C++ objects are called wrappables, their JavaScript counterparts wrappers, and the references connecting these objects cross-component references. Even though C++ is an unmanaged language, Blink has its own garbage collector for DOM C++ objects. Cross-component memory management then deals with reclaiming memory in such heterogeneous environments.

V8 and Blink use mark-sweep-compact garbage collectors where a single garbage-collection cycle consists of three phases: marking, where live objects are identified; sweeping, where dead objects are released; and compaction, where live objects are relocated to reduce memory fragmentation. During marking, the garbage collector finds all objects reachable from a defined set of root references, conceptually traversing an object graph, where the nodes of the graph are objects and the edges are fields of objects.

Cross-component references express liveness over component boundaries and have to be modeled explicitly in the graph. The simplest way to manage those references is by treating them as roots into the corresponding component. In other words, references from Blink to V8 would be treated as roots in V8 and vice versa. This creates the problem of reference cycles across components, which is analogous to regular reference cycles [1] within a single garbage-collection system, where objects form groups of strongly connected components that are otherwise unreachable from the live object graph. Cycles require either manual breaking through the use of weak references or the use of some managed system able to infer liveness by inspecting the system as a whole. Manually breaking a cycle is not always an option because the semantics of the involved objects may require all their referents to stay alive through strong references. Another option would be to restrict the involved components in such a way that cycles cannot be constructed. Note that in the case of Chrome and the Web this is not always possible, as shown later.

While the cycle problem can be avoided by unifying the memory-management systems of two components, it may still be desirable to manage the


memory of the two components independently to preserve separation of concerns, since it is simpler to reuse a component in another system if there are fewer dependencies. For example, V8 is used not only in Chrome, but also in the Node.js server-side runtime, making it undesirable to add Blink-specific knowledge to V8.

Assuming the components cannot be unified, the cross-component reference cycles can lead to either memory leaks when graphs involving cycles cannot be reclaimed by the components' garbage collectors, heavily impacting browser performance, or premature collection of objects resulting in use-after-free security vulnerabilities and program crashes that put users at risk.

This article describes an approach called cross-component tracing (CCT) [3], which is implemented in V8 and Blink to solve the problem of memory management across component boundaries. Cross-component tracing also integrates nicely with existing tooling infrastructure and improves the debugging capabilities of Chrome DevTools (https://developers.google.com/web/tools/chrome-devtools/).

Cross-component tracing enables efficient, effective, and safe garbage collection across component boundaries.

Separate Worlds for DOM and JavaScript

As mentioned, Chrome encodes the DOM in C++ wrappable objects, and most functionality specified in the HTML standard is provided as C++ code. In contrast, JavaScript is implemented within V8 using a custom object model that is incompatible with C++. When JavaScript application code accesses properties of JavaScript DOM wrapper objects, V8 invokes C++ callbacks in Blink, which make changes to the underlying C++ DOM objects. Conversely, Blink objects can also directly reference JavaScript objects and modify those as needed. For example, Blink can bind fields of JavaScript objects to C++ callbacks that can be used by other JavaScript code.

Both worlds, DOM and JavaScript, are managed by their own trace-based garbage collectors able to reclaim memory that is only transitively rooted within their own heaps. What remains is defining how cross-component references should be treated by these garbage collectors to enable them to effectively collect garbage across components.

To highlight the problems of leaks and dangling pointers, it is useful to look at a concrete example of JavaScript code and how it can be used to create dynamic content that changes over time.

Figure 1 shows an example that creates a temporary object, a loading bar (loadingBar), that is then replaced by actual content (content) asynchronously built and swapped in as soon as it is ready. Note that accessing the document element or the body element, or creating the div elements results in pairs of objects in their respective worlds that hold references to each other. While the program itself is written in JavaScript, property look-ups to, for example, the body element and calls to DOM methods appendChild and replaceChild are forwarded to their corresponding C++ implementations in Blink. Regular JavaScript access, such as setting a parent property, is carried out by V8 on its own objects. It is this seamless integration of JavaScript and the DOM that allows developers to create rich Web applications. At the same time, this concept allows the creation of arbitrary object graphs across component boundaries.

Figure 2 shows a simplified version of the object graph created by the example, where JavaScript objects on the left are connected to their C++ counterparts in the DOM on the right. JavaScript objects, such as the body and div elements, have hardly any references in JavaScript but are mostly used to refer to their corresponding C++ objects. It is thus crucial to define the semantics of cross-component references for the component-local garbage collectors to allow collection of these objects. For example, treating incoming references from Blink into V8 as roots for the V8 garbage collector would always keep the loadingBar object alive. Treating such references as uniformly weak would result in reclamation of the body and the div elements by the V8 garbage collector, which would leave behind dangling pointers for Blink.

Besides correctness, another challenge in such an entangled environment is debuggability for developers. While the Web platform allows loose coupling of C++ and JavaScript under


the hood, it is crucial that the APIs for these abstractions are properly encapsulated for Web developers who use HTML and JavaScript, including preventing memory leaks when properly used. To investigate memory leaks in Web pages, developers need tools that allow them to reason seamlessly about the connectivity of objects spanning both V8 and Blink heaps.

Figure 1. JavaScript example interacting with the DOM.

    <!DOCTYPE html>
    <html>
    <body><script>
    function fetchContent(callback) {
      // Emulate network request and content creation.
      setTimeout(callback, 1000);
    }
    function run() {
      const loadingBar = document.createElement("div");
      document.body.appendChild(loadingBar);
      fetchContent(() => {
        const content = document.createElement("div");
        document.body.replaceChild(content, loadingBar);
        content.parent = document.body;
      });
    }
    document.addEventListener("DOMContentLoaded", run);
    </script></body>
    </html>

Figure 2. Object graph spanning JavaScript and the DOM. [Diagram: V8 objects (root, document, body, div content, div loadingBar) on the left, paired with their Blink counterparts (HTMLDocument, HTMLBodyElement, HTMLDivElement, HTMLDivElement) on the right.]

Cross-Component Tracing

We propose CCT as a way to tackle the general problem of reference cycles across component boundaries. For CCT, the garbage collectors of all involved components are extended to allow tracing into a different component, managing objects of potentially different programming languages.

CCT uses the garbage collector of one component as the master tracer to compute the full transitive closure of live objects to break cycles.

Other components assist by providing a remote tracer that can traverse the objects of the component when requested by the master tracer. The system can then be treated as one managed heap. As a consequence, the simple algorithm of CCT can be extended to allow moving collectors and incremental or concurrent marking as needed by just following existing garbage collection principles. [8] The pseudocode of the master and remote tracer algorithms is available in our full research article. [3]

For Chrome we developed a version of cross-component tracing where the master tracer for JavaScript objects and the remote tracer for C++ objects are provided by V8 and Blink, respectively. This way V8 can trace through the C++ DOM upon doing a garbage collection, effectively breaking cycles on the V8 and Blink boundary. In this system, Blink garbage collections deal with only the C++ objects and treat the incoming cross-component references from V8 as roots. This way, subsequent invocations of V8's and Blink's garbage collectors can reclaim cycles across the component boundary.

The tracer in V8 makes use of the concept of hidden classes [2] that describe the body of JavaScript objects to find references to other objects, as well as to Blink. The tracer in Blink requires each garbage-collected C++ class to be manually annotated with a method that describes the body of the class, including any references to other managed objects. Since Blink was already garbage-collected before introducing CCT, only minor adjustments to this method were required across the rendering codebase.

Chrome strives to provide smooth user experiences, updating the screen at 60fps (frames per second), leaving V8 and Blink around 16.6 milliseconds to render a frame. Since marking large heaps may take hundreds of milliseconds, both V8 and Blink employ a technique called incremental marking, which means that marking is divided into steps during which objects are marked for only a small amount of time (for example, 1ms).

The application is free to change object references between the steps. This means that the application may hide a reference to an unmarked object in an already-marked object, which would result in premature collection of a live object. Incremental marking requires a garbage collector to keep the marking state consistent by preserving the strong tri-color-marking invariant. [8] This invariant states that fully marked objects are allowed to point only to


objects that are also fully marked or stashed somewhere for processing. V8 and Blink preserve the marking invariant using a conservative Dijkstra-style write barrier [6] that ensures that writing a value into an object also marks the value. In fact, V8 even provides concurrent marking on a background thread this way while relying on incremental tracing in Blink. [5]

Figure 3. Cross-component garbage collection. [Diagram: V8 objects (root, document, body, div content, div loadingBar) on the left and their Blink counterparts (HTMLDocument, HTMLBodyElement, HTMLDivElement, HTMLDivElement) on the right; an edge is labeled "parent", and loadingBar is annotated "reclaimed on V8 garbage collection".]

To make this concrete, Figure 3 illustrates CCT where V8 traces and marks objects in JavaScript, as well as C++. Objects transitively reachable by V8's root object are marked black. Subsequently, any unreachable objects (loadingBar, in this example) are reclaimed by the garbage collector. Note that from V8's point of view, there is no difference between the div elements content and loadingBar, and only CCT makes it clear which object can be reclaimed by V8's garbage collector. Once the unreachable V8 object is gone, any subsequent garbage collections in Blink will not see a root for the corresponding HTMLDivElement and reclaim the other half of the wrapper-wrappable pair.
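The master and remote tracer roles described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the pseudocode from the research article and not Chrome's implementation: `makeObject`, `masterTrace`, `remoteTrace`, `pair`, the `heap` and `refs` fields, and the DOM tree edges between the Blink objects are all hypothetical, and the object graph is only loosely modeled on Figure 3.

```javascript
// Hypothetical sketch of cross-component tracing (CCT); not Chrome's code.
// Every object records the heap that owns it and the objects it references.
const makeObject = (heap, name) => ({ heap, name, refs: [] });

// Remote tracer: traverses Blink objects on request and hands references
// that lead back into V8 over to the master tracer's worklist.
function remoteTrace(start, marked, masterWorklist) {
  const worklist = [start];
  while (worklist.length > 0) {
    const obj = worklist.pop();
    if (marked.has(obj)) continue;
    marked.add(obj);
    for (const ref of obj.refs) {
      (ref.heap === "blink" ? worklist : masterWorklist).push(ref);
    }
  }
}

// Master tracer: computes the transitive closure of live objects across
// both heaps, starting from the V8 roots.
function masterTrace(v8Roots) {
  const marked = new Set();
  const worklist = [...v8Roots];
  while (worklist.length > 0) {
    const obj = worklist.pop();
    if (marked.has(obj)) continue;
    if (obj.heap === "blink") {
      remoteTrace(obj, marked, worklist);
    } else {
      marked.add(obj);
      worklist.push(...obj.refs);
    }
  }
  return marked;
}

// Wrapper and wrappable reference each other across the boundary.
function pair(wrapper, wrappable) {
  wrapper.refs.push(wrappable);
  wrappable.refs.push(wrapper);
}

const v8Root = makeObject("v8", "root");
const documentWrapper = makeObject("v8", "document");
const bodyWrapper = makeObject("v8", "body");
const contentWrapper = makeObject("v8", "content");
const loadingBarWrapper = makeObject("v8", "loadingBar");
const htmlDocument = makeObject("blink", "HTMLDocument");
const htmlBody = makeObject("blink", "HTMLBodyElement");
const contentDiv = makeObject("blink", "HTMLDivElement (content)");
const loadingBarDiv = makeObject("blink", "HTMLDivElement (loadingBar)");

pair(documentWrapper, htmlDocument);
pair(bodyWrapper, htmlBody);
pair(contentWrapper, contentDiv);
pair(loadingBarWrapper, loadingBarDiv);
v8Root.refs.push(documentWrapper);
htmlDocument.refs.push(htmlBody); // assumed DOM tree edge
htmlBody.refs.push(contentDiv); // assumed DOM tree edge
contentWrapper.refs.push(bodyWrapper); // content.parent = document.body

const v8Heap = [v8Root, documentWrapper, bodyWrapper, contentWrapper, loadingBarWrapper];
const blinkHeap = [htmlDocument, htmlBody, contentDiv, loadingBarDiv];
const namesOfUnmarked = (heap, marked) =>
  heap.filter((obj) => !marked.has(obj)).map((obj) => obj.name);

// V8 collection: CCT marks through both heaps, then V8 sweeps its own objects.
const marked = masterTrace([v8Root]);
const reclaimedByV8 = namesOfUnmarked(v8Heap, marked);

// Later Blink collection: references from surviving V8 wrappers act as roots.
const blinkMarked = new Set();
const blinkWorklist = v8Heap
  .filter((obj) => marked.has(obj))
  .flatMap((obj) => obj.refs.filter((ref) => ref.heap === "blink"));
while (blinkWorklist.length > 0) {
  const obj = blinkWorklist.pop();
  if (blinkMarked.has(obj)) continue;
  blinkMarked.add(obj);
  blinkWorklist.push(...obj.refs.filter((ref) => ref.heap === "blink"));
}
const reclaimedByBlink = namesOfUnmarked(blinkHeap, blinkMarked);

console.log(reclaimedByV8, reclaimedByBlink);
```

In this model the V8 collection reclaims only the loadingBar wrapper, and the following Blink collection no longer sees a root for its HTMLDivElement and reclaims it as well, mirroring the two-step reclamation of the wrapper-wrappable pair described in the text.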
Figure 4. Leaking the callback.

    function fetchContent(callback) {
      // Emulate network request and content creation.
      setTimeout(callback, 1000);
      fetchContent.internalState = callback;
    }

Figure 5. Retaining path of the leaking DIV element.

In Chrome, CCT replaced its predecessor, called object grouping, in version 57. Object grouping was based on over-approximating liveness across component boundaries by keeping all wrappers and wrappables alive in a given DOM tree as long as a single wrapper was held alive through JavaScript. This assumption was reasonable at the time it was implemented, when modification of the DOM from wrappers occurred infrequently.

However, the over-approximation had two major shortcomings: It kept more memory alive than needed, which in times of ever-growing Web applications increased already strong memory pressure in the browser; and, the original algorithm was not designed for incremental processing, which, compared with CCT, resulted in longer garbage-collection pause times.

Incremental CCT as implemented today in Chrome eliminates those problems by providing a much better approximation by computing liveness of objects through reachability and by enabling incremental processing. The detailed performance analysis can be found in the main research paper. [3] We are currently working on concurrent marking of the Blink C++ heap and on integrating CCT into such a scheme.
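To illustrate the incremental processing mentioned above, the following sketch combines step-wise marking with a Dijkstra-style write barrier that marks every stored value. It is a simplified model under stated assumptions: the `IncrementalMarker` class, its `step`, `markValue`, and `write` methods, and the `fields` layout are hypothetical names for this example and are not V8 or Blink APIs.

```javascript
// Hypothetical sketch of incremental marking with a Dijkstra-style write
// barrier; the class and method names are illustrative only.
class IncrementalMarker {
  constructor(roots) {
    this.marked = new Set(roots); // objects that are queued or fully marked
    this.worklist = [...roots]; // queued objects awaiting processing
  }

  // Process at most `budget` objects, then yield back to the application.
  step(budget) {
    while (budget-- > 0 && this.worklist.length > 0) {
      const obj = this.worklist.pop();
      for (const ref of Object.values(obj.fields)) this.markValue(ref);
    }
    return this.worklist.length === 0; // true once marking is finished
  }

  markValue(value) {
    if (value && !this.marked.has(value)) {
      this.marked.add(value);
      this.worklist.push(value);
    }
  }

  // Write barrier: every reference store also marks the stored value, so a
  // fully marked object never points to an unmarked one.
  write(obj, field, value) {
    obj.fields[field] = value;
    this.markValue(value);
  }
}

const node = (name) => ({ name, fields: {} });
const rootObj = node("root");
const holder = node("holder");
const hidden = node("hidden");
rootObj.fields.holder = holder;
holder.fields.hidden = hidden;

const marker = new IncrementalMarker([rootObj]);
marker.step(1); // root is now fully marked; holder is queued

// Between steps the application moves the only reference to `hidden`
// from the still unprocessed `holder` into the already processed `root`.
marker.write(rootObj, "moved", hidden);
delete holder.fields.hidden;

while (!marker.step(1)) {
  // keep marking in small steps until the worklist is empty
}
console.log(marker.marked.has(hidden)); // true: the barrier kept it alive
```

Without the call to `markValue` inside `write`, the marker would never revisit the already processed root and `hidden` would be collected prematurely, which is the situation the marking invariant rules out.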

Debugging

Memory-leak bugs are a widespread problem haunting Web applications today. [7] Powerful language constructs such as closures make it easy for a Web developer to accidentally extend the lifetimes of JavaScript and DOM objects, resulting in higher memory usage than necessary. As a concrete example,


let's assume that the fetchContent function from Figure 1 keeps, perhaps because of a bug, an internal reference to the provided callback, as shown in Figure 4.

Without knowing the implementation of the fetchContent function, a Web developer observes that the loadingBar element from the previous example is not reclaimed by the garbage collector. Can debugging tools help track down why the element is leaking?

The tracing infrastructure needed for cross-component garbage collection can be applied to improve memory debugging. Chrome DevTools uses the infrastructure to capture and visualize the object graph spanning JavaScript and DOM objects. The tool allows Web developers to query why a particular object is not reclaimed by the garbage collector. It presents the answer in the form of a retaining path, which runs from the object to the garbage-collection root. Figure 5 shows the retaining path for the leaking loadingBar element. The path shows that the leaking DOM element is captured by the loadingBar variable in the environment (called context in V8) of an anonymous closure, which is retained by the internalState field of the fetchContent function. By inspecting each node of the path, the Web developer can pinpoint the source of the leak. Thanks to the cross-component tracing, the path seamlessly crosses the DOM and JavaScript boundary. [4]

Reclaiming Memory in Other Heterogeneous Systems

Web browsers are particularly interesting systems, as all major browser engines separate DOM and JavaScript objects in a similar way (that is, by providing different heaps for those objects). Similar to Blink and V8, all those browsers encode their DOM in C++ and must rely on a custom object model for JavaScript. All Blink-derived systems (for example, Chrome, Opera, and Electron) rely on CCT to handle cross-component references.

The Gecko rendering engine that powers Firefox uses reference counting to manage DOM objects. An additional incremental cycle collector [1] that wakes up periodically ensures that such cycles are eventually collected. WebKit, the engine running inside Safari, uses reference counting for the C++ DOM with an additional system that computes liveness across the wrapper/wrappable boundary in the final pause of a garbage-collection cycle. Unsurprisingly, all major browsers have mechanisms to deal with these kinds of cycles, as memory leaks in longer-running websites would otherwise be inevitable and would observably impact browser performance.

More interestingly, though, we are not aware of other sophisticated systems integrating VMs that provide cross-component memory management. While VMs often provide bridges for integration in other systems, such as Java Native Interface (JNI) and NativeScript, cross-component references require manual management in all of them. Developers using those systems must manually create and destroy links that can form cycles. This is error prone and can lead to the aforementioned problems.

Conclusion

Cross-component tracing is a way to solve the problem of reference cycles across component boundaries. This problem appears as soon as components can form arbitrary object graphs with nontrivial ownership across API boundaries. An incremental version of CCT is implemented in V8 and Blink, enabling effective and efficient reclamation of memory in a safe manner, without introducing dangling pointers that could lead to program crashes or security vulnerabilities in Chrome or Chromium-derived browsers. The same tracing system is reused by Chrome DevTools to visualize retaining paths of objects independent of whether they are managed in C++ or JavaScript.

Note, however, that CCT comes with significant implementation overhead, as it requires implementations of tracers in each component. Ultimately, implementers need to weigh the effort of either avoiding cycles by enforcing restrictions on their systems or implementing a mechanism to reclaim cycles, such as CCT. Chrome was already equipped with garbage collectors in V8 and Blink, and thus we chose to implement a generic solution such as CCT that allows the systems on top to stay as flexible as needed.

CCT is implemented not only in Chrome, but also in other software systems that use V8 and Chrome, such as the popular Opera Web browser and Electron. Cobalt, a high-performance, small-footprint platform providing a subset of HTML5, CSS, and JavaScript used for embedded devices such as TVs, implemented cross-component tracing inspired by our system to manage its memory.

Related articles on queue.acm.org

Idle-Time Garbage-Collection Scheduling
Ulan Degenbaev et al.
https://queue.acm.org/detail.cfm?id=2977741

Real-time Garbage Collection
David F. Bacon
https://queue.acm.org/detail.cfm?id=1217268

Leaking Space
Neil Mitchell
https://queue.acm.org/detail.cfm?id=2538488

References
1. Bacon, D.F. and Rajan, V.T. Concurrent cycle collection in reference counted systems. In Proceedings of the 15th European Conf. Object-Oriented Programming. Springer-Verlag, London, U.K., 2001, 207–235; https://doi.org/10.1007/3-540-45337-7_12.
2. Chambers, C., Ungar, D. and Lee, E. An efficient implementation of SELF, a dynamically-typed object-oriented language based on prototypes. In Proceedings of the Conf. Object-Oriented Programming Systems, Languages and Applications. ACM SIGPLAN, 1989, 49–70; https://dl.acm.org/citation.cfm?doid=74877.74884.
3. Degenbaev, U. et al. Cross-component garbage collection. In Proceedings of the ACM on Programming Languages 2, OOPSLA Article 151, 2018; https://dl.acm.org/citation.cfm?doid=3288538.3276521.
4. Degenbaev, U., Filippov, A., Lippautz, M. and Payer, H. Tracing from JS to the DOM and back again. V8, 2018; https://v8.dev/blog/tracing-js-dom.
5. Degenbaev, U., Lippautz, M. and Payer, H. Concurrent marking in V8. V8, 2018; https://v8.dev/blog/concurrent-marking.
6. Dijkstra, E.W., Lamport, L., Martin, A.J., Scholten, C.S. and Steffens, E.F.M. On-the-fly garbage collection: An exercise in cooperation. Commun. ACM 21, 11 (Nov. 1978), 966–975; https://dl.acm.org/citation.cfm?doid=359642.359655.
7. Hablich, M. and Payer, H. Lessons learned from the memory roadshow; https://bit.ly/2018-memory-roadshow.
8. Jones, R., Hosking, A. and Moss, E. The Garbage Collection Handbook: The Art of Automatic Memory Management. Chapman & Hall, 2012.

Ulan Degenbaev is a software engineer at Google, working on the garbage collector of the V8 JavaScript engine.

Michael Lippautz is a software engineer at Google, where he works on garbage collection for the V8 JavaScript virtual machine and the Blink rendering engine. Previously, he worked on Google's Dart virtual machine.

Hannes Payer is a software engineer at Google, where he works on the V8 JavaScript virtual machine. Previously, he worked on Google's Dart virtual machine and various Java virtual machines.

Copyright held by authors/owners.

practice
DOI:10.1145/3316778

Article development led by queue.acm.org

How to Create a Great Team Culture (and Why It Matters)

Build safety, share vulnerability, and establish purpose.

BY KATE MATSUDAIRA

In my career leading teams, I have worked with large organizations (more than 1,000 people) and super-small teams (a startup with just two people). I have seen that the best teams have one thing in common: a strong team culture.

We all know what it is like to be a part of a great team, when you enjoy coming together and the energy is electric. There is something special that happens when the team becomes greater than the sum of the individuals.

I was really inspired by this topic recently when I read Daniel Coyle's book The Culture Code. [1] The author shares a lot of research (and I do love data) about what makes a great team. He boils it down to a few key elements:

• Build safety. Create an environment where people feel safe and secure.
• Share vulnerability. When people are willing to take risks, it can drive cooperation and build trust.
• Establish purpose. The team should align around common goals and values, with a clear path forward.

The book is filled with many examples and ideas, but in my experience, I have seen that what works for one team will not work for another. That is one of the reasons leadership is complex and difficult.

You are always working with different variables: different teams, different companies, different goals. And yet team culture is one part of the job that great leaders never ignore. So, how do the best leaders create team culture wherever they go?

See the Role You Play in Team Culture

As a leader, it is your responsibility to set the culture for the team. I am sure you have heard the phrase "lead by example," and that is because when people aren't sure what is acceptable, they look to their leaders for guidance.

You have surely been in the situation where you have seen your manager staying late at the office, and as a result, you might have stayed just a little longer. On the other hand, if you frequently saw your boss taking two-hour lunches, you might not be in such a hurry to get back to the office when your friend stops by to go to lunch.

Every day, people are looking for signals in their environment about what is the norm. As a leader, it is part of your job to set the example for those around you.

You want to create a culture where people are engaged, cooperative, and excited. To do this, you need to be deliberate in your actions. For example, if you want to create a culture of psychological safety, where people can speak up and take risks, it is important that you do not accept or participate in negativity. Research has shown that one bad apple or toxic employee can bring



down the whole team. [2] As a result, if you see or hear someone acting in opposition to the attitude and environment you are trying to create, you should do your best to defuse the situation and address the individual quickly.

If you ignore your team culture or think it's not an important part of your job, some type of culture will still develop. That is just what happens when humans work together and share a space. The leader's job is to cultivate the type of culture that will lead to success.

Creating Connections

I was once on a leadership team that had a very negative dynamic: the manager had a tendency to play favorites, so many of my peers were always trying to win favor by badmouthing or undermining the others.

I remember being very frustrated that this was allowed to happen and decided that whenever someone came to me to complain about another team member, instead of sharing that feedback, I would coach the complainer to tell the other person directly. If the complaint was about how a peer was managing a project, I would talk about how that person could improve and come up with a plan for the complainer to help that other person succeed.

At that time, I was reacting to the problems I saw, but in retrospect, I realized my actions ended up creating a cohesive team, one where people were encouraged to help one another, and if they disagreed, would always try to sort it out themselves.

A leader should help create mutual vulnerability among team members. This can be done in a few ways:

• Force collaboration. Have team members work together to solve a problem or complete a project.
• Encourage people to talk to one another directly. Foster an environment of peer coaching and resist being a proxy for critical feedback.
• Reward and recognize collaboration. Drive for shared outcomes that celebrate team successes.
• Build trust. Create opportunities (for example, summits or meetings) for people to build trust with one another and get to know one another as people, not just as coworkers.

There are many more strategies, but the key is to create opportunities where people can connect. Help build trust among team members by allowing them to resolve their own conflicts without you being the mediator. Over time, this will create a group of people who can trust one another, which, in turn, will create a cohesive team culture.

Define Your Culture by Knowing Its Value

Defining a team culture is not an easy job. Once a culture has been established, it is easy to see its signposts. But when you are new to a team or working to develop a positive team culture, it can be challenging to know exactly what it takes to build a culture.

Before you start picking rituals and values you think "sound" good, start by going back to the very beginning. Ask yourself: What is the value of this team? Really think about it. Why does this team exist? Think beyond the function that the team serves for the organization as a whole (for example, coding or design). Why are we a team? The answer to this question is not always clear.


Other questions to consider: What are the advantages of being a team? What can we do because we are working together? Maybe it is so team members can learn from each other and do better work; there is more learning when there is collaboration. Maybe it is decision making; we make smarter decisions when we have multiple viewpoints. Maybe it is about resources; we can do more with combined skills and time.

How to Create Cultural Touchpoints Around Your Values

Once you know the value of your team, then you can start building in elements that support its values.

Let's say you decide that one of the values of your team is peer mentoring; in other words, one of the reasons your team exists is so that its members can do better work by learning from one another. Now, how do you make that happen on your team?

You cannot just say "We learn from each other" during a meeting and make it happen. You have to institute processes that make this a simple part of daily life on your team. Think about which kinds of forums you can set up to help people learn from one another. Do you want to encourage questions in team chat? Set up a code-review process? Establish a cadence of brown bags to share lessons learned? Read whitepapers and discuss them as a group? What makes sense for your team?

Now let's say that another value of your team is decision-making; your team exists because everyone's input helps to make smarter decisions about what to work on and how to work on it. We all know that the more people are involved in a decision, the more complicated it becomes to get a clear answer. So, how do you benefit from getting everyone's input without becoming a team that can get nothing done because no one can agree?

The solution might be to have all decision-making done in the same way. For example, you might institute a format for all reports. That way, the input from various individuals or departments will all come to you in the exact same way, so you can quickly parse the information and make a decision.

Team Structure Becomes Team Culture

Team culture is not just wearing the same t-shirts at the company picnic. The way you do things every day is what builds your culture.

So, while streamlining the way your team formats their reports might not feel like it has much to do with team culture, it does. It is a way of steering your team to work together, by prioritizing the values that your team supports.

Culture is in the everyday. It is the small actions that you and everyone on your team takes on a daily basis (the way they speak to each other, the way decisions get made, the way they run meetings) that make up your team culture.

I have seen many amazing examples of culture-building throughout my career. Here are just a few more ideas that might inspire you as you build your team culture:

• Weekly demo meetings. Have someone from the team share a recent accomplishment. This could be a big thing, or even something as small as changing a button color on the website. This creates a culture of sharing work so that people feel more collaborative even outside the meeting setting, since they know what other people are working on.
• Teaching slots for every team member. At every team meeting, have people sign up to share something they've learned recently or teach something to the team. This is a great way for people who have been to conferences or read interesting books to share that knowledge, and it helps give everyone on the team a voice (even those who don't normally speak up during meetings).
• Cupcakes for launches. Mark every team win by bringing people together: literally together, around a plate of cupcakes, instead of just via email. This creates a culture of celebration, where people's successes are noticed and rewarded, and where the whole team celebrates together.

Some of the decisions you make will feel big and some will feel small. But whether it's as huge as developing an online help tool for your team or as small as sharing cupcakes after a big win, the effect is the same. Culture comes from shared experiences. The what doesn't matter nearly as much as the why.

How can you make people feel like they are valued and important parts of the team?

Let the Culture Expand From the Top Down

As leader of the team, you have significant influence over your team's culture. You can institute policies and procedures that help make your team happy and productive, monitor team successes, and continually improve the team.

Another important part of team culture, however, is helping people feel they are a part of creating it. How can you expand the job of creating a culture to other team members?

Look for opportunities to delegate whenever you can. If a holiday is coming up, maybe you could ask a team member to help organize a team dinner. Look for people with unique perspectives (who maybe aren't heard from as often as others) and give them a platform to share.

This is where truly great leadership comes from. You establish a culture that enables your team to be the best it can be, and then you allow the team to take that culture and run with it.

How amazing could your team be with just a few adjustments?

Related articles on queue.acm.org

High-Performance Team
Philip Beevers
https://queue.acm.org/detail.cfm?id=1117402

Culture Surprises in Remote Software Development Teams
Judith S. Olson and Gary M. Olson
https://queue.acm.org/detail.cfm?id=966804

Stand and Deliver: Why I Hate Stand-Up Meetings
Phillip A. Laplante
https://queue.acm.org/detail.cfm?id=957730

References
1. Coyle, D. The Culture Code. Bantam, 2018; http://danielcoyle.com/the-culture-code/.
2. Felps, W., Mitchell, T., and Byington, E. How, when, and why bad apples spoil the barrel: Negative group members and dysfunctional groups. Research in Organizational Behavior 27 (2006), 175–222.

Kate Matsudaira (katemats.com) is an experienced technology leader. She has worked at Microsoft and Amazon and successful startups before starting her own company, Popforms, which was acquired by Safari Books.

Copyright held by author/owner. Publication rights licensed to ACM.

44 COMMUNICATIONS OF THE ACM | JUNE 2019 | VOL. 62 | NO. 6


DOI:10.1145/3316774

Article development led by queue.acm.org

Some ML papers suffer from flaws that could mislead the public and stymie future research.

BY ZACHARY C. LIPTON AND JACOB STEINHARDT

Research for Practice: Troubling Trends in Machine-Learning Scholarship
Collectively, machine learning (ML) researchers are engaged in the creation and dissemination of knowledge about data-driven algorithms. In a given paper, researchers might aspire to any subset of the following goals, among others: to theoretically characterize what is learnable; to obtain understanding through empirically rigorous experiments; or to build a working system that has high predictive accuracy. While determining which knowledge warrants inquiry may be subjective, once the topic is fixed, papers are most valuable to the community when they act in service of the reader, creating foundational knowledge and communicating as clearly as possible.

What sorts of papers best serve their readers? Ideally, papers should accomplish the following: provide intuition to aid the reader’s understanding but clearly distinguish it from stronger conclusions supported by evidence; describe empirical investigations that consider and rule out alternative hypotheses; make clear the relationship between theoretical analysis and intuitive or empirical claims; and use language to empower the reader, choosing terminology to avoid misleading or unproven connotations, collisions with other definitions, or conflation with other related but distinct concepts.

Recent progress in machine learning comes despite frequent departures from these ideals. This installment of Research for Practice focuses
practice

on the following four patterns that appear to be trending in ML scholarship:
• Failure to distinguish between explanation and speculation.
• Failure to identify the sources of empirical gains (for example, emphasizing unnecessary modifications to neural architectures when gains actually stem from hyperparameter tuning).
• “Mathiness”—the use of mathematics that obfuscates or impresses rather than clarifies (for example, by confusing technical and nontechnical concepts).
• Misuse of language (for example, by choosing terms of art with colloquial connotations or by overloading established technical terms).

While the causes of these patterns are uncertain, possibilities include the rapid expansion of the community, the consequent thinness of the reviewer pool, and the often-misaligned incentives between scholarship and short-term measures of success (for example, bibliometrics, attention, and entrepreneurial opportunity). While each pattern offers a corresponding remedy (don’t do it), this article also makes suggestions on how the community might combat these troubling trends.

As the impact of machine learning widens, and the audience for research papers increasingly includes students, journalists, and policymakers, these considerations apply to this wider audience as well. By communicating more precise information with greater clarity, better ML scholarship could accelerate the pace of research, reduce the on-boarding time for new researchers, and play a more constructive role in public discourse.

Flawed scholarship threatens to mislead the public and stymie future research by compromising ML’s intellectual foundations. Indeed, many of these problems have recurred cyclically throughout the history of AI (artificial intelligence) and, more broadly, in scientific research. In 1976, Drew McDermott26 chastised the AI community for abandoning self-discipline, warning prophetically “if we can’t criticize ourselves, someone else will save us the trouble.” Similar discussions recurred throughout the 1980s, 1990s, and 2000s. In other fields, such as psychology, poor experimental standards have eroded trust in the discipline’s authority.33 The current strength of machine learning owes to a large body of rigorous research to date, both theoretical and empirical. By promoting clear scientific thinking and communication, our community can sustain the trust and investment it currently enjoys.

Disclaimers. This article aims to instigate discussion, answering a call for papers from the International Conference on Machine Learning (ICML) Machine Learning Debates workshop. While we stand by the points represented here, we do not purport to offer a full or balanced viewpoint or to discuss the overall quality of science in ML. In many aspects, such as reproducibility, the community has advanced standards far beyond what sufficed a decade ago.

Note that these arguments are made by us, against us—insiders offering a critical introspective look—not as sniping outsiders. The ills identified here are not specific to any individual or institution. We have fallen into these patterns ourselves, and likely will again in the future. Exhibiting one of these patterns doesn’t make a paper bad, nor does it indict the paper’s authors; however, all papers could be made stronger by avoiding these patterns.

While we provide concrete examples, our guiding principles are to implicate ourselves; and to select preferentially from the work of better-established researchers and institutions that we admire, to avoid singling out junior students for whom inclusion in this discussion might have consequences and who lack the opportunity to reply symmetrically. We are grateful to belong to a community that provides sufficient intellectual freedom to allow the expression of critical perspectives.

Troubling Trends
Each subsection that follows describes a trend; provides several examples (as well as positive examples that resist the trend); and explains the consequences. Pointing to weaknesses in individual papers can be a sensitive topic. To minimize this, the examples are short and specific.

Explanation vs. speculation. Research into new areas often involves exploration predicated on intuitions that have yet to coalesce into crisp formal representations. Speculation is a way for authors to impart intuitions that may not yet withstand the full weight of scientific scrutiny. Papers often offer speculation in the guise of explanations, however, which are then interpreted as authoritative because of the trappings of a scientific paper and the presumed expertise of the authors.

For instance, in a 2015 paper, Ioffe and Szegedy18 form an intuitive theory around a concept called internal covariate shift. The exposition on internal covariate shift, starting from the abstract, appears to state technical facts. Key terms are not made crisp enough, however, to assume a truth value conclusively. For example, the paper states that batch normalization offers improvements by reducing changes in the distribution of hidden activations over the course of training. By which divergence measure is this change quantified? The paper never clarifies, and some work suggests that this explanation of batch normalization may be off the mark.37 Nevertheless, the speculative explanation given by Ioffe and Szegedy has been repeated as fact—for example, in a 2015 paper by Noh, Hong, and Han,31 which states, “It is well known that a deep neural network is very hard to optimize due to the internal-covariate-shift problem.”

We have been equally guilty of speculation disguised as explanation. In a 2017 paper with Koh and Liang,42 I (Jacob Steinhardt) wrote that “the high dimensionality and abundance of irrelevant features … give the attacker more room to construct attacks,” without conducting any experiments to measure the effect of dimensionality on attackability. In another paper with Liang from 2015,41 I (Steinhardt) introduced the intuitive notion of coverage without defining it, and used it as a form of explanation (for example, “Recall that one symptom of a lack of coverage is poor estimates of uncertainty and the inability to generate high-precision predictions.”). Looking back, we desired to communicate insufficiently fleshed-out intuitions that were material to the work described in the paper and
were reticent to label a core part of the argument as speculative.

In contrast to these examples, Srivastava et al.39 separate speculation from fact. While this 2014 paper, which introduced dropout regularization, speculates at length on connections between dropout and sexual reproduction, a designated “Motivation” section clearly quarantines this discussion. This practice avoids confusing readers while allowing authors to express informal ideas.

In another positive example, Yoshua Bengio2 presents practical guidelines for training neural networks. Here, the author carefully conveys uncertainty. Instead of presenting the guidelines as authoritative, the paper states: “Although such recommendations come … from years of experimentation and to some extent mathematical justification, they should be challenged. They constitute a good starting point … but very often have not been formally validated, leaving open many questions that can be answered either by theoretical analysis or by solid comparative experimental work.”

Failure to identify the sources of empirical gains. The ML peer-review process places a premium on technical novelty. Perhaps to satisfy reviewers, many papers emphasize both complex models (addressed here) and fancy mathematics (to be discussed in the “Mathiness” section). While complex models are sometimes justified, empirical advances often come about in other ways: through clever problem formulations, scientific experiments, optimization heuristics, data-preprocessing techniques, extensive hyperparameter tuning, or applying existing methods to interesting new tasks. Sometimes a number of proposed techniques together achieve a significant empirical result. In these cases, it serves the reader to elucidate which techniques are necessary to realize the reported gains.

Too frequently, authors propose many tweaks absent proper ablation studies, obscuring the source of empirical gains. Sometimes, just one of the changes is actually responsible for the improved results. This can give the false impression that the authors did more work (by proposing several improvements), when in fact they did not do enough (by not performing proper ablations). Moreover, this practice misleads readers to believe that all of the proposed changes are necessary.

In 2018, Melis, Dyer, and Blunsom27 demonstrated that a series of published improvements in language modeling, originally attributed to complex innovations in network architectures, were actually the result of better hyperparameter tuning. On equal footing, vanilla long short-term memory (LSTM) networks, hardly modified since 1997, topped the leaderboard. The community might have benefited more by learning the details of the hyperparameter tuning without the distractions. Similar evaluation issues have been observed for deep reinforcement learning17 and generative adversarial networks.24 See Sculley et al.38 for more discussion of lapses in empirical rigor and resulting consequences.

In contrast, many papers perform good ablation analyses, and even retrospective attempts to isolate the source of gains can lead to new discoveries. Furthermore, ablation is neither necessary nor sufficient for understanding a method, and can even be impractical given computational constraints. Understanding can also come from robustness checks (as in Cotterell et al.,9 which discovers that existing language models handle inflectional morphology poorly), as well as qualitative error analysis.

Empirical study aimed at understanding can be illuminating even absent a new algorithm. For example, probing the behavior of neural networks led to identifying their susceptibility to adversarial perturbations.44 Careful study also often reveals limitations of challenge datasets while yielding stronger baselines. A 2016 paper by Chen, Bolton, and Manning6 studied a task designed for reading comprehension of news passages and found that 73% of the questions can be answered by looking at a single sentence, while only 2% required looking at multiple sentences (the remaining 25% of examples were either ambiguous or contained coreference errors). In addition, simpler neural networks and linear classifiers outperformed complicated neural architectures that
had previously been evaluated on this task. In the same spirit, Zellers et al.45 analyzed and constructed a strong baseline for the Visual Genome Scene Graphs dataset in their 2018 paper.

Mathiness. When writing a paper early in my Ph.D. program, I (Zachary Lipton) received feedback from an experienced post-doc that the paper needed more equations. The post-doc wasn’t endorsing the system but rather communicating a sober view of how reviewing works. More equations, even when difficult to decipher, tend to convince reviewers of a paper’s technical depth.

Mathematics is an essential tool for scientific communication, imparting precision and clarity when used correctly. Not all ideas and claims are amenable to precise mathematical description, however, and natural language is an equally indispensable tool for communicating, especially about intuitive or empirical claims.

When mathematical and natural-language statements are mixed without a clear accounting of their relationship, both the prose and the theory can suffer: problems in the theory can be concealed by vague definitions, while weak arguments in the prose can be bolstered by the appearance of technical depth. We refer to this tangling of formal and informal claims as mathiness, following economist Paul Romer, who described the pattern like this: “Like mathematical theory, mathiness uses a mixture of words and symbols, but instead of making tight links, it leaves ample room for slippage between statements in natural language versus formal language.”36

Mathiness manifests in several ways. First, some papers abuse mathematics to convey technical depth—to bulldoze rather than to clarify. Spurious theorems are common culprits, inserted into papers to lend authoritativeness to empirical results, even when the theorem’s conclusions do not actually support the main claims of the paper. I (Steinhardt) was guilty of this in a 2015 paper with Percy Liang,40 where a discussion of “staged strong Doeblin chains” had limited relevance to the proposed learning algorithm but might confer a sense of theoretical depth to readers.

The ubiquity of this issue is evidenced by the paper introducing the Adam optimizer.19 In the course of introducing an optimizer with strong empirical performance, it also offers a theorem regarding convergence in the convex case, which is perhaps unnecessary in an applied paper focusing on non-convex optimization. The proof was later shown to be incorrect.35

A second mathiness issue is putting forth claims that are neither clearly formal nor clearly informal. For example, Dauphin et al.11 argued that the difficulty in optimizing neural networks stems not from local minima but from saddle points. As one piece of evidence, the work cites a statistical physics paper by Bray and Dean5 on Gaussian random fields and states that in high dimensions “all local minima [of Gaussian random fields] are likely to have an error very close to that of the global minimum.” (A similar statement appears in the related work of Choromanska et al.7) This appears to be a formal claim, but absent a specific theorem it is difficult to verify the claimed result or to determine its precise content. Our understanding is that it is partially a numerical claim that the gap is small for typical settings of the problem parameters, as opposed to a claim that the gap vanishes in high dimensions. A formal statement would help clarify this. Note that the broader interesting point in Dauphin et al. that minima tend to have lower loss than saddle points is more clearly stated and empirically tested.

Finally, some papers invoke theory in overly broad ways or make passing references to theorems with dubious pertinence. For example, the no-free-lunch theorem is commonly invoked as a justification for using heuristic methods without guarantees, even though the theorem does not formally preclude guaranteed learning procedures.

While the best remedy for mathiness is to avoid it, some papers go further with exemplary exposition. A 2013 paper by Bottou et al.4 on counterfactual reasoning covered a large amount of mathematical ground in a down-to-earth manner, with numerous clear connections to applied empirical problems. This tutorial, written in clear service to the reader, has helped to spur work in the burgeoning
community studying counterfactual reasoning for ML.

Misuse of language. There are three common avenues of language misuse in machine learning: suggestive definitions, overloaded terminology, and suitcase words.

Suggestive definitions. In the first avenue, a new technical term is coined that has a suggestive colloquial meaning, thus sneaking in connotations without the need to argue for them. This often manifests in anthropomorphic characterizations of tasks (reading comprehension and music composition) and techniques (curiosity and fear—I (Zachary) am responsible for the latter). A number of papers name components of proposed models in a manner suggestive of human cognition (for example, thought vectors and the consciousness prior). Our goal is not to rid the academic literature of all such language; when properly qualified, these connections might communicate a fruitful source of inspiration. When a suggestive term is assigned technical meaning, however, each subsequent paper has no choice but to confuse its readers, either by embracing the term or by replacing it.

Describing empirical results with loose claims of “human-level” performance can also portray a false sense of current capabilities. Take, for example, the “dermatologist-level classification of skin cancer” reported in a 2017 paper by Esteva et al.12 The comparison with dermatologists concealed the fact that classifiers and dermatologists perform fundamentally different tasks. Real dermatologists encounter a wide variety of circumstances and must perform their jobs despite unpredictable changes. The machine classifier, however, achieved low error only on independent, identically distributed (IID) test data.

In contrast, claims of human-level performance in work by He et al.16 are better qualified to refer to the ImageNet classification task (rather than object recognition more broadly). Even in this case, one careful paper (among many less careful) was insufficient to put the public discourse back on track. Popular articles continue to characterize modern image classifiers as “surpassing human abilities and effectively proving that bigger data leads to better decisions,” as explained by Dave Gershgorn,13 despite demonstrations that these networks rely on spurious correlations (for example, misclassifying “Asians dressed in red” as ping-pong balls, reported by Stock and Cisse43).

Deep-learning papers are not the sole offenders; misuse of language plagues many subfields of ML. Lipton, Chouldechova, and McAuley23 discuss how the recent literature on fairness in ML often overloads terminology borrowed from complex legal doctrine, such as disparate impact, to name simple equations expressing particular notions of statistical parity. This has resulted in a literature where “fairness,” “opportunity,” and “discrimination” denote simple statistics of predictive models, confusing researchers who become oblivious to the difference and policymakers who become misinformed about the ease of incorporating ethical desiderata into ML.

Overloading technical terminology. A second avenue of language misuse consists of taking a term that holds precise technical meaning and using it in an imprecise or contradictory way. Consider the case of deconvolution, which formally describes the process of reversing a convolution, but is now used in the deep-learning literature to refer to transpose convolutions (also called upconvolutions) as commonly found in auto-encoders and generative adversarial networks. This term first took root in deep learning in a paper that does address deconvolution but was later overgeneralized to refer to any neural architecture using upconvolutions. Such overloading of terminology can create lasting confusion. New ML papers referring to deconvolution might be invoking its original meaning, describing upconvolution, or attempting to resolve the confusion, as in a paper by Hazirbas, Leal-Taixé, and Cremers,15 which awkwardly refers to “upconvolution (deconvolution).”

As another example, generative models are traditionally models of either the input distribution p(x) or the joint distribution p(x,y). In contrast, discriminative models address the conditional distribution p(y|x) of the label given the inputs. In recent works, however, generative model imprecisely refers to any model that produces realistic-looking structured data. On the surface, this may seem consistent with the p(x) definition, but it obscures several shortcomings—for example, the inability of GANs (generative adversarial networks) or VAEs (variational autoencoders) to perform conditional inference (for example, sampling from p(x2|x1) where x1 and x2 are two distinct input features). Bending the term further, some discriminative models are now referred to as generative models on account of producing structured outputs, a mistake that I (Lipton), too, have made. Seeking to resolve the confusion and provide historical context, Mohamed and Lakshminarayanan30 distinguish between prescribed and implicit generative models.

Revisiting batch normalization, Ioffe and Szegedy18 described covariate shift as a change in the distribution of model inputs. In fact, covariate shift refers to a specific type of shift, where although the input distribution p(x) might change, the labeling function p(y|x) does not. Moreover, as a result of the influence of Ioffe and Szegedy, Google Scholar lists batch normalization as the first reference on searches for “covariate shift.”

Among the consequences of misusing language is the possibility (as with generative models) of concealing lack of progress by redefining an unsolved task to refer to something easier. This often combines with suggestive definitions via anthropomorphic naming. Language understanding and reading comprehension, once grand challenges of AI, now refer to making accurate predictions on specific datasets.

Suitcase words. Finally, ML papers tend to overuse suitcase words. Coined by Marvin Minsky in the 2007 book The Emotion Machine,29 suitcase words pack together a variety of meanings. Minsky described mental processes such as consciousness, thinking, attention, emotion, and feeling that may not share “a single cause or origin.” Many terms in ML fall into this category. For example, I (Lipton) noted in a 2016 paper that interpretability holds no universally agreed-upon meaning and often references disjoint methods and desiderata.22 As
a consequence, even papers that appear to be in dialogue with each other may have different concepts in mind.

As another example, generalization has both a specific technical meaning (generalizing from training to testing) and a more colloquial meaning that is closer to the notion of transfer (generalizing from one population to another) or of external validity (generalizing from an experimental setting to the real world). Conflating these notions leads to overestimating the capabilities of current systems.

Suggestive definitions and overloaded terminology can contribute to the creation of new suitcase words. In the fairness literature, where legal, philosophical, and statistical language are often overloaded, terms such as bias become suitcase words that must be subsequently unpacked.

In common speech and as aspirational terms, suitcase words can serve a useful purpose. Sometimes a suitcase word might reflect an overarching aspiration that unites the various meanings. For example, artificial intelligence might be well suited as an aspirational name to organize an academic department. On the other hand, using suitcase words in technical arguments can lead to confusion. For example, in his 2017 book, Superintelligence,3 Nick Bostrom wrote an equation (Box 4) involving the terms intelligence and optimization power, implicitly assuming these suitcase words can be quantified with a one-dimensional scalar.

Speculation on Causes Behind the Trends
Do the patterns mentioned here represent a trend, and if so, what are the underlying causes? We speculate that these patterns are on the rise and suspect several possible causal factors: complacency in the face of progress, the rapid expansion of the community, the consequent thinness of the reviewer pool, and misaligned incentives of scholarship vs. short-term measures of success.

Complacency in the face of progress. The apparent rapid progress in ML has at times engendered an attitude that strong results excuse weak arguments. Authors with strong results may feel licensed to insert arbitrary unsupported stories (see “Explanation vs. Speculation”) regarding the factors driving the results; to omit experiments aimed at disentangling those factors (see “Failure to Identify the Sources of Empirical Gains”); to adopt exaggerated terminology (see “Misuse of Language”); or to take less care to avoid mathiness (see “Mathiness”).

At the same time, the single-round nature of the reviewing process may cause reviewers to feel they have no choice but to accept papers with strong quantitative findings. Indeed, even if the paper is rejected, there is no guarantee the flaws will be fixed or even noticed in the next cycle, so reviewers may conclude that accepting a flawed paper is the best option.

Growing pains. Since around 2012, the ML community has expanded rapidly because of increased popularity stemming from the success of deep-learning methods. While the rapid expansion of the community can be seen as a positive development, it can also have side effects.

To protect junior authors, we have preferentially referenced our own papers and those of established researchers. And certainly, experienced researchers exhibit these patterns. Newer researchers, however, may be even more susceptible. For example, authors unaware of previous terminology are more likely to misuse or redefine language (as discussed earlier).

Rapid growth can also thin the reviewer pool in two ways: by increasing the ratio of submitted papers to reviewers and by decreasing the fraction of experienced reviewers. Less-experienced reviewers may be more likely to demand architectural novelty, be fooled by spurious theorems, and let pass serious but subtle issues such as misuse of language, thus either incentivizing or enabling several of the trends described here. At the same time, experienced but overburdened reviewers may revert to a “checklist” mentality, rewarding more formulaic papers at the expense of more creative or intellectually ambitious work that might not fit a preconceived template. Moreover, overworked reviewers may not have enough time to fix—or even to notice—all of the issues in a submitted paper.

Misaligned incentives. Reviewers are not alone in providing poor incentives for authors. As ML research garners increased media attention and ML startups become commonplace, to some degree incentives are provided by the press (“What will they write about?”) and by investors (“What will they invest in?”). The media provides incentives for some of these trends. Anthropomorphic descriptions of ML algorithms provide fodder for popular coverage. Take, for example, a 2014 article by Cade Metz in Wired,28 that characterized an autoencoder as a “simulated brain.” Hints of human-level performance tend to be sensationalized in newspaper coverage—for example, an article in the New York Times by John Markoff described a deep-learning image-captioning system as “mimicking human levels of understanding.”25

Investors, too, have shown a strong appetite for AI research, funding startups sometimes on the basis of a single paper. In my (Lipton) experience working with investors, they are sometimes attracted to startups whose research has received media coverage, a dynamic that attaches financial incentives to media attention. Note that recent interest in chatbot startups co-occurred with anthropomorphic descriptions of dialogue systems and reinforcement learners both in papers and in the media, although it may be difficult to determine whether the lapses in scholarship caused the interest of investors or vice versa.

Suggestions. Suppose we are to intervene to counter these trends, then how? Besides merely suggesting that each author abstain from these patterns, what can we do as a community to raise the level of experimental practice, exposition, and theory? And how can we more readily distill the knowledge of the community and disabuse researchers and the wider public of misconceptions? What follows are a number of preliminary suggestions based on personal experiences and impressions.

For Authors, Publishers, and Reviewers
We encourage authors to ask “What worked?” and “Why?” rather than just “How well?” Except in extraordinary
50 COMMUNICATIONS OF THE ACM | JUNE 2019 | VOL. 62 | NO. 6


practice

cases, raw headline numbers provide limited value for scientific progress absent insight into what drives them. Insight does not necessarily mean theory. Three practices that are common in the strongest empirical papers are error analysis, ablation studies, and robustness checks (for example, choice of hyperparameters, as well as ideally the choice of dataset). Everyone can adopt these practices, and we advocate their widespread use. For some exemplar papers, consider the preceding discussion in “Failure to Identify the Sources of Empirical Gains.” Langley and Kibler [21] also provide a more detailed survey of empirical best practices.

Investors have shown a strong appetite for AI research, funding startups sometimes on the basis of a single paper.

Sound empirical inquiry need not be confined to tracing the sources of a particular algorithm’s empirical gains; it can yield new insights even when no new algorithm is proposed. Notable examples of this include a demonstration that neural networks trained by stochastic gradient descent can fit randomly assigned labels [46]. This paper questions the ability of learning-theoretic notions of model complexity to explain why neural networks can generalize to unseen data. In another example, Goodfellow, Vinyals, and Saxe [14] explored the loss surfaces of deep networks, revealing that straight-line paths in parameter space between initialized and learned parameters typically have monotonically decreasing loss.

When researchers are writing their papers, we recommend they ask the following question: Would I rely on this explanation for making predictions or for getting a system to work? This can be a good test of whether a theorem is being included to please reviewers or to convey actual insight. It also helps check whether concepts and explanations match the researcher’s own internal mental model. On mathematical writing, we point the reader to Knuth, Larrabee, and Roberts’s excellent guidebook, Mathematical Writing [20].

Finally, being clear about which problems are open and which are solved not only presents a clearer picture to readers, but also encourages follow-up work and guards against researchers neglecting questions presumed (falsely) to be resolved.

Reviewers can set better incentives by asking: “Might I have accepted this paper if the authors had done a worse job?” For example, a paper describing a simple idea that leads to improved performance, together with two negative results, should be judged more favorably than a paper that combines three ideas together (without ablation studies) yielding the same improvement.

Current literature moves fast at the expense of accepting flawed works for conference publication. One remedy could be to emphasize authoritative retrospective surveys that strip out exaggerated claims and extraneous material, change anthropomorphic names to sober alternatives, standardize notation, and so on. While venues such as Foundations and Trends in Machine Learning, a journal from Now Publishers in Hanover, MA, already provide a track for such work, there are still not enough strong papers in this genre.

Additionally, we believe (noting our conflict of interest) that critical writing ought to have a voice at ML conferences. Typical ML conference papers choose an established problem (or propose a new one), demonstrate an algorithm and/or analysis, and report experimental results. While many questions can be approached in this way, when addressing the validity of the problems or the methods of inquiry themselves, neither algorithms nor experiments are sufficient (or appropriate). We would not be alone in embracing greater critical discourse: in natural language processing (NLP), this year’s Conference on Computational Linguistics (COLING) included a call for position papers “to challenge conventional thinking.”

There are many lines of further discussion worth pursuing regarding peer review. Are the problems described here mitigated or exacerbated by open review? How do reviewer point systems align with the values that we advocate? These topics warrant their own papers and have indeed been discussed at length elsewhere.

Discussion. Folk wisdom might suggest not to intervene just as the field is heating up: you can’t argue with success! We counter these objections with the following arguments: First, many aspects of the current culture are consequences of ML’s recent success, not its causes. In fact, many of


the papers leading to the current success of deep learning were careful empirical investigations characterizing principles for training deep networks. This includes the advantage of random over sequential hyperparameter search, the behavior of different activation functions, and an understanding of unsupervised pretraining.

Second, flawed scholarship already negatively impacts the research community and broader public discourse. The “Troubling Trends” section of this article gives examples of unsupported claims being cited thousands of times, lineages of purported improvements being overturned by simple baselines, datasets that appear to test high-level semantic reasoning but actually test low-level syntactic fluency, and terminology confusion that muddles the academic dialogue. This final issue also affects public discourse. For example, the European Parliament passed a report considering regulations to apply if “robots become or are made self-aware.” [10] While ML researchers are not responsible for all misrepresentations of our work, it seems likely that anthropomorphic language in authoritative peer-reviewed papers is at least partly to blame.

Greater rigor in exposition, science, and theory is essential for both scientific progress and fostering productive discourse with the broader public. Moreover, as practitioners apply ML in critical domains such as health, law, and autonomous driving, a calibrated awareness of the abilities and limits of ML systems will help us to deploy ML responsibly.

Countervailing Considerations
There are a number of countervailing considerations to the suggestions set forth in this article. Several readers of earlier drafts of this paper noted that stochastic gradient descent tends to converge faster than gradient descent; in other words, perhaps a faster, noisier process that ignores our guidelines for producing “cleaner” papers results in a faster pace of research. For example, the breakthrough paper on ImageNet classification proposes multiple techniques without ablation studies, several of which were subsequently determined to be unnecessary. At the time, however, the results were so significant and the experiments so computationally expensive to run that waiting for ablations to complete might not have been worth the cost to the community.

A related concern is that high standards might impede the publication of original ideas, which are more likely to be unusual and speculative. In other fields, such as economics, high standards result in a publishing process that can take years for a single paper, with lengthy revision cycles consuming resources that could be deployed toward new work.

Finally, perhaps there is value in specialization: The researchers generating new conceptual ideas or building new systems need not be the same ones who carefully collate and distill knowledge.

These are valid considerations, and the standards we are putting forth here are at times exacting. In many cases, however, they are straightforward to implement, requiring only a few extra days of experiments and more careful writing. Moreover, they are being presented as strong heuristics rather than unbreakable rules: if an idea cannot be shared without violating these heuristics, the idea should be shared and the heuristics set aside.

We have almost always found attempts to adhere to these standards to be well worth the effort. In short, the research community has not achieved a Pareto optimal state on the growth-quality frontier.

Historical Antecedents
The issues discussed here are unique neither to machine learning nor to this moment in time; they instead reflect issues that recur cyclically throughout academia. As far back as 1964, the physicist John R. Platt [34] discussed related concerns in his paper on strong inference, where he identified adherence to specific empirical standards as responsible for the rapid progress of molecular biology and high-energy physics relative to other areas of science.

There have been similar discussions in AI. As noted in the introduction to this article, McDermott [26] criticized a (mostly pre-ML) AI community in 1976 on a number of issues, including suggestive definitions and a failure to separate out speculation from technical claims. In 1988, Cohen and


Howe [8] addressed an AI community that at that point “rarely publish[ed] performance evaluations” of their proposed algorithms and instead only described the systems. They suggested establishing sensible metrics for quantifying progress, and analyzing the following: “Why does it work?” “Under what circumstances won’t it work?” and “Have the design decisions been justified?” These questions continue to resonate today.

Finally, in 2009 Armstrong et al. [1] discussed the empirical rigor of information-retrieval research, noting a tendency of papers to compare against the same weak baselines, producing a long series of improvements that did not accumulate to meaningful gains.

In other fields, an unchecked decline in scholarship has led to crisis. A landmark study in 2015 suggested a significant portion of findings in the psychology literature may not be reproducible [33]. In a few historical cases, enthusiasm paired with undisciplined scholarship led entire communities down blind alleys. For example, following the discovery of X-rays, a related discipline on N-rays emerged before it was eventually debunked [32].

Concluding Remarks
The reader might rightly suggest these problems are self-correcting. We agree. However, the community self-corrects precisely through recurring debate about what constitutes reasonable standards for scholarship. We hope that this paper contributes constructively to the discussion.

Acknowledgments
We thank Asya Bergal, Kyunghyun Cho, Moustapha Cisse, Daniel Dewey, Danny Hernandez, Charles Elkan, Ian Goodfellow, Moritz Hardt, Tatsunori Hashimoto, Sergey Ioffe, Sham Kakade, David Kale, Holden Karnofsky, Pang Wei Koh, Lisha Li, Percy Liang, Julian McAuley, Robert Nishihara, Noah Smith, Balakrishnan “Murali” Narayanaswamy, Ali Rahimi, Christopher Ré, and Byron Wallace. We also thank the ICML Debates organizers.

References
1. Armstrong, T.G., Moffat, A., Webber, W. and Zobel, J. Improvements that don’t add up: ad-hoc retrieval results since 1998. In Proceedings of the 18th ACM Conf. Information and Knowledge Management, 2009, 601–610.
2. Bengio, Y. Practical recommendations for gradient-based training of deep architectures. Neural Networks: Tricks of the Trade. G. Montavon, G.B. Orr, K.-R. Müller, eds. LNCS 7700 (2012). Springer, Berlin, Heidelberg, 437–78.
3. Bostrom, N. Superintelligence. Dunod, Paris, France, 2017.
4. Bottou, L. et al. Counterfactual reasoning and learning systems: The example of computational advertising. J. Machine Learning Research 14, 1 (2013), 3207–3260.
5. Bray, A.J. and Dean, D.S. Statistics of critical points of Gaussian fields on large-dimensional spaces. Physical Review Letters 98, 15 (2007), 150201; https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.98.150201.
6. Chen, D., Bolton, J. and Manning, C.D. A thorough examination of the CNN/Daily Mail reading comprehension task. In Proceedings of the 54th Annual Meeting of Assoc. Computational Linguistics, 2016, 2358–2367.
7. Choromanska, A., Henaff, M., Mathieu, M., Arous, G.B., LeCun, Y. The loss surfaces of multilayer networks. In Proceedings of the 18th Intern. Conf. Artificial Intelligence and Statistics, 2015.
8. Cohen, P.R., Howe, A.E. How evaluation guides AI research: the message still counts more than the medium. AI Magazine 9, 4 (1988), 35.
9. Cotterell, R., Mielke, S.J., Eisner, J. and Roark, B. Are all languages equally hard to language-model? In Proceedings of Conf. North American Chapt. Assoc. Computational Linguistics: Human Language Technologies, Vol. 2, 2018.
10. Council of the European Union. Motion for a European Parliament Resolution with Recommendations to the Commission on Civil Law Rules on Robotics, 2016; https://bit.ly/285CBjM.
11. Dauphin, Y.N. et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. Advances in Neural Information Processing Systems, 2014, 2933–2941.
12. Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 7639 (2017), 115–118.
13. Gershgorn, D. The data that transformed AI research - and possibly the world. Quartz, 2017; https://bit.ly/2uwyb8R.
14. Goodfellow, I.J., Vinyals, O. and Saxe, A.M. Qualitatively characterizing neural network optimization problems. In Proceedings of the Intern. Conf. Learning Representations, 2015.
15. Hazirbas, C., Leal-Taixé, L. and Cremers, D. Deep depth from focus. arXiv Preprint, 2017; arXiv:1704.01085.
16. He, K., Zhang, X., Ren, S. and Sun, J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE Intern. Conf. Computer Vision, 2015, 1026–1034.
17. Henderson, P. et al. Deep reinforcement learning that matters. In Proceedings of the 32nd Assoc. Advancement of Artificial Intelligence Conf., 2018.
18. Ioffe, S. and Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd Intern. Conf. Machine Learning 37, 2015; http://proceedings.mlr.press/v37/ioffe15.pdf.
19. Kingma, D.P. and Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd Intern. Conf. Learning Representations, 2015.
20. Knuth, D.E., Larrabee, T. and Roberts, P.M. Mathematical writing, 1987; https://bit.ly/2TmxyNq.
21. Langley, P. and Kibler, D. The experimental study of machine learning, 1991; http://www.isle.org/~langley/papers/mlexp.ps.
22. Lipton, Z.C. The mythos of model interpretability. Intern. Conf. Machine Learning Workshop on Human Interpretability, 2016.
23. Lipton, Z.C., Chouldechova, A. and McAuley, J. Does mitigating ML’s impact disparity require treatment disparity? Advances in Neural Inform. Process. Syst. 2017, 8136–8146. arXiv Preprint arXiv:1711.07076.
24. Lucic, M., Kurach, K., Michalski, M., Gelly, S., Bousquet, O. Are GANs created equal? A large-scale study. In Proceedings of the 32nd Conf. Neural Information Processing Syst. arXiv Preprint 2017; arXiv:1711.10337.
25. Markoff, J. Researchers announce advance in image-recognition software. NYT (Nov. 17, 2014); https://nyti.ms/2HfcmSe.
26. McDermott, D. Artificial intelligence meets natural stupidity. ACM SIGART Bulletin 57 (1976), 4–9.
27. Melis, G., Dyer, C. and Blunsom, P. On the state of the art of evaluation in neural language models. In Proceedings of the Intern. Conf. Learning Representations, 2018.
28. Metz, C. You don’t have to be Google to build an artificial brain. Wired (Sept. 26, 2014); https://www.wired.com/2014/09/google-artificial-brain/.
29. Minsky, M. The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind. Simon & Schuster, New York, NY, 2006.
30. Mohamed, S., Lakshminarayanan, B. Learning in implicit generative models. arXiv Preprint, 2016; arXiv:1610.03483.
31. Noh, H., Hong, S. and Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the Intern. Conf. Computer Vision, 2015, 1520–1528.
32. Nye, M.J. N-rays: An episode in the history and psychology of science. Historical Studies in the Physical Sciences 11, 1 (1980), 125–56.
33. Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349, 6251 (2015), aac4716.
34. Platt, J.R. Strong inference. Science 146, 3642 (1964), 347–353.
35. Reddi, S.J., Kale, S. and Kumar, S. On the convergence of Adam and beyond. In Proceedings of the Intern. Conf. Learning Representations, 2018.
36. Romer, P.M. Mathiness in the theory of economic growth. Amer. Econ. Rev. 105, 5 (2015), 89–93.
37. Santurkar, S., Tsipras, D., Ilyas, A. and Madry, A. How does batch normalization help optimization? (No, it is not about internal covariate shift). In Proceedings of the 32nd Conf. Neural Information Processing Systems, 2018; https://papers.nips.cc/paper/7515-how-does-batch-normalization-help-optimization.pdf.
38. Sculley, D., Snoek, J., Wiltschko, A. and Rahimi, A. Winner’s curse? On pace, progress, and empirical rigor. In Proceedings of the 6th Intern. Conf. Learning Representations, Workshop Track, 2018.
39. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Machine Learning Research 15, 1 (2014), 1929–1958; https://dl.acm.org/citation.cfm?id=2670313.
40. Steinhardt, J. and Liang, P. Learning fast-mixing models for structured prediction. In Proceedings of the 32nd Intern. Conf. Machine Learning 37 (2015), 1063–1072; http://proceedings.mlr.press/v37/steinhardtb15.html.
41. Steinhardt, J. and Liang, P. Reified context models. In Proceedings of the 32nd Intern. Conf. Machine Learning 37 (2015), 1043–1052; https://dl.acm.org/citation.cfm?id=3045230.
42. Steinhardt, J., Koh, P.W. and Liang, P.S. Certified defenses for data poisoning attacks. In Proceedings of the 31st Conf. Neural Information Processing Systems, 2017; https://papers.nips.cc/paper/6943-certified-defenses-for-data-poisoning-attacks.pdf.
43. Stock, P. and Cisse, M. ConvNets and ImageNet beyond accuracy: Explanations, bias detection, adversarial examples and model criticism. arXiv Preprint, 2017, arXiv:1711.11443.
44. Szegedy, C. et al. Intriguing properties of neural networks. Intern. Conf. Learning Representations. arXiv Preprint, 2013, arXiv:1312.6199.
45. Zellers, R., Yatskar, M., Thomson, S. and Choi, Y. Neural motifs: Scene graph parsing with global context. In Proceedings of the IEEE Conf. Computer Vision and Pattern Recognition, 2018, 5831–5840.
46. Zhang, C., Bengio, S., Hardt, M., Recht, B. and Vinyals, O. Understanding deep learning requires rethinking generalization. In Proceedings of the Intern. Conf. Learning Representations, 2017.

Zachary C. Lipton is an assistant professor at Carnegie Mellon University in the Tepper School of Business with appointments in the Machine Learning Department and the Heinz School of Public Policy. He also collaborates with Amazon, where he helped to grow AWS’ Amazon AI team and contributed to the Apache MXNet deep learning framework. Find him at zacklipton.com, Twitter @zacharylipton, or GitHub @zackchase.

Jacob Steinhardt will be joining UC Berkeley as an assistant professor of statistics. He is a technical advisor for the Open Philanthropy Project and has collaborated with policy researchers to understand and avoid potential misuses of machine learning.

Copyright held by owners/authors. Publication rights licensed to ACM.

contributed articles
DOI:10.1145/3286588

Programmable Solid-State Storage in Future Cloud Datacenters

Programmable software-defined solid-state drives can move computing functions closer to storage.

BY JAEYOUNG DO, SUDIPTA SENGUPTA, AND STEVEN SWANSON

key insights
- A fully programmable storage substrate in cloud datacenters opens up new opportunities to innovate the storage infrastructure at cloud speed.
- In-storage programming is becoming increasingly easy with powerful processing capabilities and highly flexible development environments.
- New value propositions with the programmable storage substrate can be realized, such as customizing the storage interface, moving compute close to data, and performing secure computations.

There is a major disconnect today in cloud datacenters concerning the speed of innovation between application/operating system (OS) and storage infrastructures. Application/OS software is patched with new/improved functionality every few weeks at “cloud speed,” while storage devices are off-limits for such sustained innovation during their hardware life cycle of three to five years in datacenters. Since the software inside the storage device is written by storage vendors as proprietary firmware not open for general application developers to modify, the developers are stuck with a device whose functionality and capabilities are frozen in time, even as many of them are modifiable in software. A period of five years is almost eternal in the cloud computing industry where new features, platforms, and application program interfaces (APIs) are evolving every couple of months and application-demanded requirements from the storage system grow quickly over time. This notable lag in the adaptability and velocity of movement of the storage infrastructure may ultimately affect the ability to innovate throughout the cloud world.

In this article, we advocate creating a software-defined storage substrate of solid-state drives (SSDs) that are as programmable, agile, and flexible as the applications/OS accessing them from servers in cloud datacenters. A fully programmable storage substrate promises opportunities to better bridge the gap between application/OS needs and storage capabilities/limitations, while allowing application developers to innovate in-house at cloud speed.

The move toward software-defined control for IO devices and co-processors has played out before in the datacenter. Both GPUs and network interface cards (NICs) started as black-box devices that provide acceleration for CPU-intensive operations (such as graphics and packet processing). Internally, they implemented acceleration features with a combination of specialized hardware and proprietary firmware. As customers demanded greater flexibility, vendors slowly exposed programmability to the rest of the system, unleashing the vast processing power available from GPUs and a new level of agility in how systems can manage networks for enhanced functionality like more granular traffic management, security, and



deep-network telemetry.

Storage is at the cusp of a similar transformation. Modern SSDs rely on sophisticated processing engines running complex firmware, and vendors already provide customized firmware builds for cloud operators. Exposing this programmability through easily accessible interfaces will let storage systems in the cloud datacenters adapt to rapidly changing requirements on the fly.

Image by Andrij Borys Associates, using photo from Shutterstock.com

Storage Trends
The amount of data being generated daily is growing exponentially, placing more and more processing demand on datacenters. According to a 2017 marketing-trend report from IBM [a], 90% of the data in the world in 2016 has been created in the last 12 months of 2015. Such large-scale datasets (which generally range from tens of terabytes to multiple petabytes) present challenges of extreme scale while achieving very fast and efficient data processing: a high-performance storage infrastructure in terms of throughput and latency is necessary. This trend has resulted in growing interest in the aggressive use of SSDs that, compared with traditional spinning hard disk drives (HDDs), provide orders-of-magnitude lower latency and higher throughput. In addition to these performance benefits, the advent of new technologies (such as 3D NAND enabling much denser chips and quad-level-cell, or QLC, for bulk storage) allows SSDs to continue to significantly scale in capacity and to yield a huge reduction in price.

[a] https://ibm.co/2XNvHPk

There are two key components in SSDs [4], as shown in Figure 1: an SSD controller and flash storage media. The controller, which is most commonly implemented as a system-on-a-chip (SoC), is designed to manage the underlying storage media. For example, SSDs built using NAND flash memory have unique characteristics in that data can be written only to an empty memory location (no in-place updates are allowed) and memory can endure only a limited number of writes before it can no longer be read. Therefore, the controller must be able to perform some background management tasks, such as garbage collection (to reclaim flash blocks containing invalid data and create available space) and wear leveling (to distribute writes evenly across all flash blocks and so extend the SSD life). These tasks are, in general, implemented by proprietary firmware running on one or more embedded processor cores in


Figure 1. Internal architecture of a modern flash SSD. [Diagram labels: SSD controller (host interface controller, embedded processor, SRAM, DRAM controller, flash controllers), DRAM, flash channels, flash storage media.]

of 16 or 32 flash channels, as outlined in Figure 2. Since each flash channel can keep up with ~500MB/sec, internally each SSD can be up to ~500MB/sec per channel × 32 channels = ~16GB/sec (see Figure 2d), and the total aggregated in-SSD performance would be ~16GB/sec per SSD × 64 SSDs = ~1TB/sec (see Figure 2c), a 66x gap. Making SSDs programmable would thus allow systems to fully leverage this abundant bandwidth.
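
The throughput gaps quoted in the text around Figure 2 follow from simple multiplication. The following is a minimal Python sketch of that arithmetic, not code from the authors; the constant and function names are illustrative assumptions, and the inputs are the rounded figures given in the text (a ~16GB/sec host link, ~2GB/sec per SSD, ~500MB/sec per flash channel, 32 channels per SSD, and 64 SSDs).

```python
# Rounded figures quoted in the text; all names here are illustrative.
HOST_LINK_GBPS = 16.0       # 16 lanes of PCIe Gen3 (Figure 2a)
SSD_READ_GBPS = 2.0         # sequential read of one commodity PCIe SSD
CHANNEL_GBPS = 0.5          # one flash channel, ~500MB/sec
CHANNELS_PER_SSD = 32
NUM_SSDS = 64


def throughput_gaps(host_link=HOST_LINK_GBPS, ssd_read=SSD_READ_GBPS,
                    channel=CHANNEL_GBPS, channels_per_ssd=CHANNELS_PER_SSD,
                    num_ssds=NUM_SSDS):
    """Return aggregate bandwidths (GB/sec) and their ratios to the host link."""
    aggregate_ssd = ssd_read * num_ssds                # Figure 2b
    internal_per_ssd = channel * channels_per_ssd      # Figure 2d
    aggregate_internal = internal_per_ssd * num_ssds   # Figure 2c
    return {
        "aggregate_ssd_gbps": aggregate_ssd,
        "internal_per_ssd_gbps": internal_per_ssd,
        "aggregate_internal_gbps": aggregate_internal,
        "host_vs_ssd_gap": aggregate_ssd / host_link,
        "host_vs_internal_gap": aggregate_internal / host_link,
    }


if __name__ == "__main__":
    for name, value in throughput_gaps().items():
        print(f"{name}: {value:g}")
```

With these rounded inputs the first ratio is 8x, matching the text. The second comes out at 64x rather than the quoted 66x; the difference presumably reflects less heavily rounded inputs (for example, a host link slightly below 16GB/sec), which is an assumption on our part.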
the controller. In enterprise SSDs, large SRAM is often used for executing the SSD firmware, and both user data and internal SSD metadata are cached in external DRAM.

Interestingly, SSDs generally have a far larger aggregate internal bandwidth than the bandwidth supported by host I/O interfaces (such as SAS and PCIe). Figure 2 outlines an example of a conventional storage system that leverages a plurality of NVM Express (NVMe) [b] SSDs; 64 of them are connected to 16 PCIe switches that are mounted to a host machine via 16 lanes of PCIe Gen3. While this storage architecture provides a commodity solution for high-capacity storage server at low cost (compared to building a specialized server to directly attach all SSDs on the motherboard of the host), the maximum throughput is limited to 16-lane PCIe interface speed (see Figure 2a), which is approximately 16GB/sec, regardless of the number of SSDs accessed in parallel. There is thus an 8x throughput gap between the host interface and the total aggregated SSD bandwidth that could be up to roughly ~2GB/sec per SSD [c] × 64 SSDs = ~128GB/sec (see Figure 2b). More interestingly, this gap would grow further if the internal SSD performance is considered. A modern enterprise-level SSD usually consists

[b] A device interface for accessing non-volatile memory attached via a PCI Express (PCIe) bus.
[c] Practical sequential-read bandwidth of a commodity PCIe SSD.

Figure 2. Example conventional storage server architecture with multiple NVMe SSDs. [Diagram: a CPU root complex with DRAM connects over (a) 16 lanes of PCIe (~16 GB/s) to PCIe switches serving (b) 64 flash SSDs (64 SSDs × ~2 GB/s = ~128 GB/s, a throughput gap of 8x); inside each SSD, (d) 32 channels × ~500 MB/s = ~16 GB/s, so (c) 64 SSDs × ~16 GB/s = ~1 TB/s, a throughput gap of 66x.]

In-Storage Programming
Modern SSDs combine processing (embedded processor) and storage components (SRAM, DRAM, and flash memory) to carry out routine functions required for managing the SSD. These computing resources present interesting opportunities to run general user-defined programs. In 2013, Do et al. [6, 17] explored such opportunities for the first time in the context of running selected database operations inside a Samsung SAS flash SSD. They wrote simple selection and aggregation operators that were compiled into the SSD firmware and extended the execution framework of Microsoft SQL Server 2012 to develop a working prototype in which simple selection and aggregation queries could be run end-to-end.

That work demonstrated a several-fold improvement in performance and energy efficiency by offloading database operations onto the SSD and highlighted a number of challenges that would need to be overcome to broadly adopt programmable SSDs:

First, the computing capabilities available inside the SSD are limited by design. The low-performance embedded processor inside the SSD without L1/L2 caches and high latency to the in-SSD DRAM require extra careful programming to run user code in the SSD without producing a performance bottleneck.

Moreover, the embedded software-development process is complex and makes programming and debugging very challenging. To maximize performance, Do et al. had to carefully plan the layout of data structures used by the code running inside the SSD to avoid spilling out of the SRAM. Likewise, Do et al. used a hardware-debugging tool to debug programs running inside the SSD that is far more primitive than regular debugging tools (such as Microsoft Visual Studio) available to general application developers. Worse, the device-side processing code (selection and aggregation) had to be compiled into the SSD firmware in the prototype, meaning application developers would need to worry about not only the target application itself but also complex internal structures and algorithms in the SSD firmware.

On top of this, the consequences of an error can be quite severe, which could result in corrupted data or an unusable drive. Workaday application programmers are unlikely to accept the additional complexity, and cloud providers are unlikely to let untrusted code run in such a fragile environment.

Application developers need a flexible and general programming model that allows easily running user code written in a high-level programming language (such as C/C++) inside an SSD. The programming model must also support the concurrent execution of multiple in-SSD applications while ensuring that malicious applications do not adversely affect the overall SSD operation or violate protection guarantees provided by the operating and file system.

In 2014, Seshadri et al. [20] proposed Willow, an SSD that made programmability a central feature of the SSD

available inside the SSD are increasingly powerful, with abundant compute and bandwidth resources. Emerging SSDs include software-programmable controllers with multi-core processors, built-in hardware accelerators to offload compute-intensive tasks from the processors, multiple GBs of DRAM, and tens of independent channels to the underlying storage media, allowing several GB/s of internal data throughput. Even more interesting and useful, programming SSDs is becoming easier, with the trend away from proprietary architectures and software runtimes and toward commodity operating systems (such as Linux) running on top of general-purpose processors (such as ARM and RISC-V). This trend enables general application developers to fully leverage existing tools, libraries, and expertise, allowing them to focus on their own core competencies rather than spending many hours getting used to the low-level, embedded development process. This also allows application developers to easily port large applications already running on host operating systems to the device with minimal code changes.

All in all, the programmability evolution in SSDs presents a unique opportunity to embrace the SSDs as a first-class programmable platform

software-hardware innovation inside the SSD. Moreover, going beyond the packaged SSD, because the two major components inside the SSD are each manufactured by multiple vendors [d], it is conceivable that SSDs could be custom designed and provided in partnership with component vendors [e] (just like how today’s datacenter servers are built and deployed), and even contribute back some of the designs to the community (via forums like the Open Compute project, https://www.opencompute.org). For example, the industry is already moving in this direction with introduction of the Open-Channel SSD technology [2, 8] [f] that moves much of the SSD firmware functionalities out of the black box and into the operating system or userspace, giving applications better control over the device. In an open source project called Denali [g] in 2018, Microsoft proposed a scheme

[d] Several vendors manufacture each type of component in flash SSDs. For example: flash controller manufactured by Marvell, PMC (acquired by Microsemi), Sandforce (acquired by Seagate), Indilinx (acquired by OCZ), and flash memory manufactured by Samsung, Toshiba, and Micron.
[e] Many large-scale datacenter operators (such as Google [19] and Baidu [16]) build their own SSDs that are fully optimized for their own application requirements.
[f] The Linux Open-Channel SSD subsystem was introduced in the Linux kernel version 4.4.
interface, allowing ordinary developers in the cloud datacenters, enabling g https://bit.ly/2GCuIum
to safely augment and extend the SSD
semantics with application-specific Figure 3. Disruptive trends in the flash storage industry toward abundant resources and
increased ease of programmability inside the SSD.
functions without compromising file
system protections. In their model, host
and in-SSD applications communicate
via PCIe using a simple, generic—not
offload, DRAM, # flash channels and capacity)

Abundant
storage-centric—remote procedure call resources Programmable
(CPU #cores/clock speed, hardware

(RPC) mechanism. In 2016, Gu et al.7 ex- inside SSD SSD


plored a flow-based programming mod-
d
en
el where an in-SSD application can be
constructed from tasks and data pipes
e Tr
connecting the tasks. These program- tiv
ming models provide great flexibility in r up
terms of programmability but are still Dis
far from “general purpose.” There is
a risk that existing large applications Frugal
Today’s
resources SSD
might still need significant redesigns
inside SSD
to exploit each model’s capabilities, re-
quiring much time and effort.
Fortunately, winds of change can Embedded CPU, proprietary General purpose CPU,
firmware OS server-like OS (Linux)
disrupt the industry and help applica-
tion developers explore SSD program- (Ease of programmability inside SSD)
ming in a better way, as illustrated in
Figure 3. The processing capabilities


that splits the monolithic components of an SSD into two different modules: one standardized part dealing with new storage media and a software interface to handle application-specific tasks (such as garbage collection and wear leveling). In this way, SSD suppliers can build simpler products for datacenters and deliver them to market more quickly while per-application tuning remains possible for datacenter operators.

The component-based ecosystem also opens up entirely new opportunities for integrating powerful heterogeneous programming elements (such as field-programmable gate array (FPGA)[13,21] and GPU,[h] with storage media) and flash and other emerging non-volatile memories (such as 3D XPoint, ReRAM, STT-RAM, and PCM) that provide persistent storage at DRAM latencies to deliver high-performance gains. This approach would present the greatest flexibility to take advantage of advances in the underlying storage device to optimize performance for multiple cloud applications. In the near future, the software-hardware innovation inside the SSD can proceed much like the PC, networking hardware, and GPU ecosystems have in the past. This is an opportunity to rethink datacenter architecture with efficient use of heterogeneous, energy-efficient hardware, which is the way forward for higher performance at lower power.

Value Propositions
Here, we summarize three value propositions that demonstrate future directions in programmable storage (see Figure 4):

Agile, flexible storage interface (see Figure 4a). Full programmability will allow the storage interface and feature set to evolve at cloud speed, without having to persuade standardization bodies to bless them or persuade device manufacturers to implement them in the next-generation hardware roadmap, both usually involving years of delay. A richer, customizable storage interface will allow application developers to stay focused on their application, without having to work around storage constraints, quirks, or peculiarities, thus improving developer productivity.

Figure 4. Programmable SSD value proposition. [Three panels, each showing a host server attached to a programmable SSD containing DRAM and a processor with HW offload: (a) Agile, flexible storage interface leveraging programmability within SSD; (b) Moving compute inside SSD to leverage low latency, high bandwidth, and access to data; (c) Trusted domain for secure computation (cleartext not allowed to egress the SSD boundary).]

As an example of the need for such an interface, consider how stream writes are handled in the SSD today. Because the SSD cannot differentiate between incoming data from multiple streams, it could pack data from different streams onto the same flash erase block, the smallest unit that can be erased from flash at once. When a portion of the stream data is deleted, it leaves blocks with holes of invalid data. To reclaim these blocks, the garbage-collection activity inside the SSD must copy around the valid data, slowing the device and increasing write amplification, thus reducing device lifetime.

If application developers had control over the software inside the SSD, they could handle streams much more efficiently. For instance, incoming writes could be tagged with stream IDs and the device could use this information to fill a block with data from the same stream. When data from that stream is deleted, the entire data block could be reclaimed without copying around data. Such stream awareness has been shown to double device lifetime and significantly increase read performance.[14] At Microsoft, the need to support multiple streams in the SSD was identified in 2014, but NVMe incorporated the fea-

[h] https://bit.ly/2L8LfM4


ture only in late 2017.[i] Moreover, large-scale deployment in Microsoft datacenters might take at least another year and be very expensive, since new SSDs must be purchased to essentially get a new version of the firmware. Waiting five years for a change to a system software component is completely out of step with how quickly computer systems are evolving today. A programmable storage platform would reduce this delay to months and allow rapid iteration and refinement of the feature, not to mention the ability to “tweak” the implementation to match specific use cases.

Moving compute close to data (see Figure 4b). The need to analyze and glean intelligence from big data imposes a shift from the traditional compute-centric model to a data-centric model. In many big data scenarios, application performance and responsiveness (demanded by interactive usage) are dominated not by the execution of arithmetic and logic instructions but instead by the requirement to handle huge volumes of data and the cost of moving this data to the location(s) where compute is performed. When this is the case, moving the compute closer to the data can reap huge benefits in terms of increased throughput, lower latency, and reduced energy usage.

Big data analytics running inside an SSD can have access to the stored data with tens of GB/sec bandwidth (rivaling DRAM bandwidth), and with latency comparable to accessing raw non-volatile memory. In addition, large energy savings can be achieved because processors inside the SSD are more energy efficient compared to the host-server CPU (such as Intel Xeon), and data does not need to be hauled over large distances from storage all the way up to the host via network, which is more energy-expensive than processing it.

Processors inside the SSD are clearly not as powerful as host processors, but together with in-storage hardware offload engines, a broad range of data processing tasks can be competitively performed inside the SSD. As an example, consider how data analytic queries are processed in general: When an analytic query is given, compressed data required to answer the query is first loaded to host, uncompressed, and then executed using host resources. Such a fundamental data analytics primitive can be processed inside the SSD by accessing data with high internal bandwidth and by offloading decompression to the dedicated engine. Subsequent stages of the query-processing pipeline (such as filtering out unnecessary data and performing the aggregation) can execute inside the SSD, resulting in greatly reduced network traffic and saved host CPU/memory resources for other important jobs. Further, performance and bandwidth together can be scaled by adding more SSDs to the system if the application requires higher data rates.

Figure 5. A prototype programmable SSD developed for research purposes. [Two photographs: (a) Device with a storage board with an embedded storage controller and DIMM slots for flash or other forms of NVM; (b) Device with a storage board where M.2 SSDs can be plugged into.]

Secure computation in the cloud (see Figure 4c). Recent security breach events related to personal, private information (financial and otherwise) have exposed the vulnerability of data infrastructures to hackers and attackers. Also, a new type of malicious software called “encryption ransomware” attacks machines by stealthily encrypting data files and demanding a ransom to provide access to these files.[10] Security is often among the topmost concerns enterprise chief information officers have when they move to the cloud, as cloud providers are unwilling to take on full liability for the impact of such breaches. Development of a secure cloud is not just a feature requirement but also an absolute foundational capability necessary for the future of the cloud computing model and its business success as an industry.

To realize the vision of a trusted cloud, data must be encrypted while stored at rest, which, however, limits the kind of computation that can be performed on encrypted data without decryption. To facilitate arbitrary (legitimate) computation on stored data, it needs to be decrypted before computing on it. This requires decrypted cleartext data to be present (at least temporarily) in various portions of the datacenter infrastructure vulnerable to security attacks. Application developers need a way to facilitate secure computation on the cloud by fencing in well-defined, narrow, trusted domains that can preserve the ability to perform arbitrary com-

[i] Note the multi-stream technology for SCSI/SAS was standardized in T10 on May 20, 2015.
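The multi-stream idea discussed in this part of the article can be illustrated with a small simulation. The Python sketch below is an illustration only, not firmware from any real SSD and not the NVMe or T10 stream interface: the four-page erase block, the function names, and the two-stream workload are all assumptions chosen to keep the example small. It contrasts packing writes in arrival order with keeping one open erase block per stream ID, then counts how many valid pages garbage collection would have to copy when one stream is deleted.

```python
from collections import defaultdict

PAGES_PER_BLOCK = 4  # hypothetical erase-block size, in pages


def pack_mixed(writes):
    """Fill erase blocks in arrival order, ignoring stream IDs."""
    blocks, current = [], []
    for stream_id in writes:
        current.append(stream_id)
        if len(current) == PAGES_PER_BLOCK:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def pack_by_stream(writes):
    """Keep one open erase block per stream, so each block holds one stream."""
    blocks, open_blocks = [], defaultdict(list)
    for stream_id in writes:
        open_blocks[stream_id].append(stream_id)
        if len(open_blocks[stream_id]) == PAGES_PER_BLOCK:
            blocks.append(open_blocks.pop(stream_id))
    # Flush any partially filled blocks left at the end of the workload.
    blocks.extend(block for block in open_blocks.values() if block)
    return blocks


def pages_copied_on_delete(blocks, deleted_stream):
    """Valid pages that must be relocated to reclaim every block
    containing data from the deleted stream."""
    return sum(
        sum(1 for page in block if page != deleted_stream)
        for block in blocks
        if deleted_stream in block
    )


# Two streams writing alternately: A, B, A, B, ... (16 pages in total).
writes = ["A", "B"] * 8
mixed = pack_mixed(writes)
aware = pack_by_stream(writes)
print(pages_copied_on_delete(mixed, "A"))  # 8
print(pages_copied_on_delete(aware, "A"))  # 0
```

Under these assumptions, deleting stream A forces eight valid pages to be copied when blocks mix streams and none when blocks are filled per stream, which is the write-amplification effect the stream-write example describes.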


putation on the data.

SSDs with their powerful compute capabilities can form a trusted domain for doing secure computation on encrypted data, leveraging their internal hardware cryptographic engine and secure boot mechanisms for this purpose. Cryptographic keys can be stored inside the SSD, allowing arbitrary compute to be carried out on the stored data (after decryption if needed) while enforcing that data cannot leave the device in cleartext form. This allows a new, flexible, easily programmable, near-data realization of trusted hardware in the cloud. Compared to currently proposed solutions like Intel Enclaves[j] that are protected, isolated areas of execution in the host server memory, this solution protects orders of magnitude more data.

Programmable SSDs
While the concept of in-storage processing on SSDs was proposed more than six years ago,[6] experimenting with SSD programming has been limited by the availability of real hardware on which a prototype can be built to demonstrate what is possible. The recent emergence of prototyping boards available for both research and commercial purposes has opened new opportunities for application developers to take ideas from conception to action.

Figure 5 shows one such prototype device, called Dragon Fire Card (DFC),[k],[3,5] designed and manufactured by Dell EMC and NXP for research. The card is powerful and flexible with enterprise-level capabilities and resources. It comprises a main board and a storage board. The main board contains an ARMv8 processor, 16GB of RAM, and various on-chip hardware accelerators (such as 20Gbps compression/decompression, 20Gbps SEC-crypto, and 10Gbps RegEx engines). It also provides NVMe connectivity via four PCIe Gen3 lanes, and 4x10Gbps Ethernet that supports the remote direct memory access (RDMA) over Converged Ethernet (RoCE) protocol. It supports two different storage boards that connect via 2x4 PCIe Gen3 lanes: One type of board (see Figure 5a) includes an embedded storage controller and four memory slots where flash or other forms of NVM can be installed; and the second (see Figure 5b) is an adapter that hosts two M.2 SSDs.

The ARM SoC inside the board runs a full-fledged Ubuntu Linux, so programming the board is very similar to programming any other Linux device. For instance, software can leverage the Linux container technology (such as Docker) to provide isolated environments inside the board. To create applications running on the board, a software development kit (SDK) containing GNU tools to build applications for ARM and user/kernel mode libraries to use the on-chip hardware accelerators is provided, allowing a high level of programmability. The DFC can also serve as a block device, just like regular SSDs. For this purpose, the device is shipped with a flash translation layer (FTL) that runs on the main board.

The SSD industry is also moving toward bringing compute to SSDs so data can be processed without leaving the place where it is originally stored. For instance, in 2017 NGD Systems[l] announced an SSD called Catalina2[1] capable of running applications directly on the device. Catalina2 uses TLC 3D NAND flash (up to 24TB), which is connected to the onboard ARM SoC that runs an embedded Linux and modules for error-correcting code (ECC) and FTL. On the host server, a tunnel agent (with C/C++ libraries) runs to talk to the device through the NVMe protocol. As another example, ScaleFlux[m] uses a Xilinx FPGA (combined with terabytes of TLC 3D NAND flash) to compute data for data-intensive applications. The host server runs a software module, providing API access to the device while being responsible for FTL and flash-management functionalities.

Academia and industry are working to establish a compelling value proposition by demonstrating application scenarios for each of the three pillars outlined in Figure 4. Among them we are initially focused on exploring the benefits and challenges of moving compute closer to storage (see Figure 4b) in the context of big data analytics, examining large amounts of data to uncover hidden patterns and insights.

Figure 6. Preliminary results using a programmable SSD yield approximately 5x speedups for full scans of ZLIB-compressed ORC files within the device, compared to native ORC readers running on x86 architecture. [Bar chart of throughput in million rows/second: X86 (Intel Xeon @2.3GHz), 55.4; Programmable SSD (ARM @1.8GHz + Decompression offload), 272.47.]

Big data analytics within a programmable SSD. To demonstrate our approach, we have implemented a C++ reader that runs on a DFC card (see Figure 5) for Apache Optimized Row Columnar (ORC) files. The ORC file format is designed for fast processing and high storage efficiency of big data analytic workloads, and has been widely adopted in the open source community and industry. The reader running inside the SSD reads large chunks of ORC streams, decompresses them, and then evaluates query predicates to find only necessary values. Due to the server-like development environment (Ubuntu and a general-purpose ARM processor) we easily ported a reference implementation of the ORC reader[n] to the ARM SoC environment (with only a few lines of code changes) and incorporat-

[j] https://software.intel.com/en-us/sgx
[k] https://github.com/DFC-OpenSource
[l] http://www.ngdsystems.com
[m] http://www.scaleflux.com
[n] https://github.com/apache/orc
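As a rough illustration of the scan path described for the in-device reader (read a chunk, decompress it, evaluate a predicate), the Python sketch below works on zlib-compressed chunks of 64-bit integers. It is a stand-in under stated assumptions, not the ORC encoding and not the C++ reader from the article: the helper names make_chunks and scan, the chunk size, and the predicate are hypothetical. The last lines simply recompute the speedup from the two throughput values shown in Figure 6.

```python
import struct
import zlib


def make_chunks(values, rows_per_chunk):
    """Pack integers as little-endian int64 and zlib-compress each chunk.
    This only imitates compressed column streams; it is not the ORC format."""
    chunks = []
    for start in range(0, len(values), rows_per_chunk):
        part = values[start:start + rows_per_chunk]
        raw = struct.pack(f"<{len(part)}q", *part)
        chunks.append(zlib.compress(raw))
    return chunks


def scan(chunks, predicate):
    """Decompress chunk by chunk and yield only values matching the predicate."""
    for chunk in chunks:
        raw = zlib.decompress(chunk)
        for (value,) in struct.iter_unpack("<q", raw):
            if predicate(value):
                yield value


column = list(range(1000))                      # toy single-column integer dataset
chunks = make_chunks(column, rows_per_chunk=256)
matches = list(scan(chunks, lambda v: v % 250 == 0))
print(matches)  # [0, 250, 500, 750]

# Throughput values read from Figure 6 (million rows/second).
speedup = 272.47 / 55.4
print(round(speedup, 2))  # 4.92
```

Only the matching values leave the scan loop, which mirrors the point that filtering next to the data reduces what has to be shipped to the host. The computed ratio of about 4.92 is consistent with the "approximately 5x" stated in the figure caption.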


ed library APIs into the reader, enabling reading data from flash and offloading the decompression work to the ARM SoC hardware accelerator.

Figure 6 shows preliminary bandwidth results of scanning a ZLIB-compressed, single-column integer dataset (one billion rows) through the C++ ORC reader running on a host x86 server vs. inside the DFC card, respectively.[o] As shown in the figure, we achieved approximately 5x faster scan performance inside the device compared to running on the host server. Given that this is the performance of a single device, we should be able to achieve much better performance improvements by increasing the number of programmable SSDs that are used in parallel.

In addition to scanning, filtering and aggregating large volumes of data at high throughput by offloading part of the computation directly to the storage has been explored as well. In 2016 Jo et al.[12] built a prototype that performs very early filtering of data through a combination of ARM and a hardware pattern-matching engine available inside a programmable SSD equipped with a flow-based programming model described by Gu et al.[7] When a query is given, the query planner determines whether early filtering is beneficial for the query and chooses a candidate table as the target if the estimated filtering ratio is sufficiently high. Early filtering is then performed against the target table inside the device, and only filtered data is then fetched to the host for residual computation. This early filtering inside the device turns out to be highly effective for analytic queries; when running all 22 TPC-H queries on a MariaDB server with the programmable device prototyped on a commodity NVMe SSD, a 3.6x speedup was achieved by Jo et al.[12] compared to a system with the same SSD without the programmability.

Alternatively, an FPGA-based prototype design for near-data processing inside a storage node for database engines was studied by István et al.[11] in 2017. In this prototype, each storage node that could be accessed over the network through a simple key-value store interface provided fault tolerance through replication and application-specific processing (such as predicate evaluations, substring matching, and decompression) at line rate.

Datacenter Realization
Each application running in cloud datacenters has its own, unique requirements, making it difficult to design server nodes with the proper balance of compute, memory, and storage. To cope with such complexity, an approach of physically decoupling resources was proposed by Han et al.[9] in 2013 to allow replacing, upgrading, or adding individual resources instead of the entire node. With the availability of fast interconnect technologies (such as InfiniBand, RDMA, and RoCE), it is already common in today’s large-scale cloud datacenters to disaggregate storage from compute, significantly reducing the total cost of ownership and improving the efficiency of the storage utilization. However, storage disaggregation is a challenge[15] as storage-media access latencies are heading toward single-digit microsecond level[p] compared to a disk’s millisecond latency, which is much larger than the fast network overhead. It is likely that, in the next few years, the network latency will become a bottleneck as new, emerging non-volatile memories with extremely low latencies become available.

This challenge of storage disaggregation can be overcome by using programmable storage, enabling a fully programmable storage substrate that is decoupled from the host substrate as outlined in Figure 7. This view of storage as a programmable substrate allows application developers not only to leverage very low, storage-medium access latency by running programs inside the storage device but also to access any remote storage device without involving the remote host server where the device is physically attached (see Figure 7) by

[o] Note, to effectively compare data-processing capability in each case (Intel Xeon in x86 vs. ARM + decompression accelerator in the device), only a single core for each processor was used.
[p] For example, the access latency of 3D XPoint can take 5–10 µsec, while NVMe SSD and disk take ~50–100 µsec and 10 msec, respectively.[8]
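The latency figures in footnote p suggest why the network becomes the bottleneck as media get faster. The back-of-the-envelope Python sketch below makes that arithmetic explicit. The media latencies come from the footnote (midpoints are used where it gives a range); the 10-microsecond network overhead is purely an assumption for illustration and does not come from the article, and the function name network_share is hypothetical.

```python
# Media access latencies in microseconds, taken from footnote p
# (midpoints are used where the footnote quotes a range).
MEDIA_LATENCY_US = {
    "disk": 10_000.0,    # 10 msec
    "NVMe SSD": 75.0,    # midpoint of ~50-100 microseconds
    "3D XPoint": 7.5,    # midpoint of 5-10 microseconds
}

# Hypothetical network overhead per remote access; an assumption, not a source figure.
NETWORK_OVERHEAD_US = 10.0


def network_share(media_latency_us, network_us=NETWORK_OVERHEAD_US):
    """Fraction of a remote access spent on the network rather than on the medium."""
    return network_us / (network_us + media_latency_us)


for name, latency_us in MEDIA_LATENCY_US.items():
    share = network_share(latency_us)
    print(f"{name}: network is {share:.1%} of a remote access")
```

With the assumed overhead, the network is a negligible part of a remote disk access but more than half of a remote access to a medium in the single-digit microsecond range. A different overhead value shifts the percentages but not the ordering.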


using NVMe over Fabrics (NVMe-oF)[q] with RDMA.

Figure 7. Enabling a programmable storage substrate decoupled from the host substrate. [Diagram: server nodes, each containing hosts (the programmable host substrate), SSDs (the programmable storage substrate), and NICs joined by a network interconnect. Direct traffic flows between programmable storage devices (with a network interface) without involving a remote host.]

With the programmable storage substrate, we can think of going beyond the single-device block interface. For example, a micro server inside storage can expose a richer interface like a distributed key-value store or distributed streams. Or the storage infrastructure can be managed as a fabric, not as individual devices. The programmable storage substrate can also provide high-level datacenter capabilities (such as backup, data snapshot, replication, de-duplication, and tiering), which are typically supported in a datacenter server environment where compute and storage are separated. This means the programmable storage substrate can be viewed as a hyper-converged infrastructure where storage, networking, and compute are tightly coupled for low-latency, high-throughput access, while still providing availability.

[q] A technology specification designed for non-volatile memories to transfer data between a host and a target system/device over a network. Approximately 90% of the NVMe-oF protocol is the same as the NVMe protocol.

Conclusion
In this article, we have presented our vision of a fully programmable storage substrate in cloud datacenters, allowing application developers to innovate the storage infrastructure at cloud speed like the software application/OS infrastructure. The programmability evolution in SSDs provides opportunities for embracing them as a first-class programmable platform in cloud datacenters, enabling software-hardware innovation that could bridge the gap between application/OS needs and storage capabilities/limitations. We hope to shed light on the future of software-defined storage and help chart a direction for designing, building, deploying, and leveraging a software-defined storage architecture for cloud datacenters.

Acknowledgments
This work was supported in part by National Science Foundation Award 1629395.

References
1. Alves, V. In-situ processing. Flash Memory Summit (Santa Clara, CA, Aug. 8–10), 2017.
2. Bjørling, M., González, J., and Bonnet, P. Lightnvm: The Linux open-channel SSD subsystem. In Proceedings of the 15th USENIX Conference on File and Storage Technologies (Santa Clara, CA, Feb. 27–Mar. 2). USENIX Association, Berkeley, CA, 2017, 359–374.
3. Bonnet, P. What’s up with the storage hierarchy? In Proceedings of the 8th Biennial Conference on Innovative Data Systems Research (Chaminade, CA, Jan. 8–11), 2017.
4. Cornwell, M. Anatomy of a solid-state drive. Commun. ACM 55, 12 (Dec. 2012), 59–63.
5. Do, J. Softflash: Programmable storage in future data centers. In Proceedings of the 20th SNIA Storage Developer Conference (Santa Clara, CA, Sep. 11–14), 2017.
6. Do, J., Kee, Y.-S., Patel, J.M., Park, C., Park, K., and DeWitt, D.J. Query processing on smart SSDs: Opportunities and challenges. In Proceedings of the ACM SIGMOD International Conference on Management of Data (New York, NY, Jun. 22–27). ACM Press, New York, 2013, 1221–1230.
7. Gu, B., Yoon, A.S., Bae, D.-H., Jo, I., Lee, J., Yoon, J., Kang, J.-U., Kwon, M., Yoon, C., Cho, S., et al. Biscuit: A framework for near data processing of big data workloads. In Proceedings of the ACM/IEEE 43rd Annual International Symposium on Computer Architecture (Seoul, S. Korea, Jun. 18–22). IEEE, 2016, 153–165.
8. Hady, F. Wicked fast storage and beyond. In Proceedings of the 7th Non Volatile Memory Workshop (San Diego, CA, Mar. 6–8). Keynote, 2016.
9. Han, S., Egi, N., Panda, A., Ratnasamy, S., Shi, G., and Shenker, S. Network support for resource disaggregation in next-generation datacenters. In Proceedings of the 12th ACM Workshop on Hot Topics in Networks (College Park, MD, Nov. 21–22). ACM Press, New York, 2013, 10.
10. Huang, J., Xu, J., Xing, X., Liu, P., and Qureshi, M.K. Flashguard: Leveraging intrinsic flash properties to defend against encryption ransomware. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (Dallas, TX, Oct. 30–Nov. 3). ACM Press, New York, 2017, 2231–2244.
11. István, Z., Sidler, D., and Alonso, G. Caribou: Intelligent distributed storage. In Proceedings of the VLDB Endowment 10, 11 (Aug. 2017), 1202–1213.
12. Jo, I., Bae, D.-H., Yoon, A.S., Kang, J.-U., Cho, S., Lee, D.D., and Jeong, J. YourSQL: A high-performance database system leveraging in storage computing. In Proceedings of the VLDB Endowment 9, 12 (Aug. 2016), 924–935.
13. Jun, S.-W., Liu, M., Lee, S., Hicks, J., Ankcorn, J., King, M., Xu, S., et al. BlueDBM: An appliance for big data analytics. In Proceedings of the ACM/IEEE 42nd Annual International Symposium on Computer Architecture (Portland, OR, Jun. 13–17). IEEE, 2015, 1–13.
14. Kang, J.-U., Hyun, J., Maeng, H., and Cho, S. The multi-streamed solid-state drive. In Proceedings of the 6th USENIX Workshop on Hot Topics in Storage and File Systems (Philadelphia, PA, Jun. 17–18). USENIX Association, Berkeley, CA, 2014.
15. Klimovic, A., Kozyrakis, C., Thereska, E., John, B., and Kumar, S. Flash storage disaggregation. In Proceedings of the 11th European Conference on Computer Systems (London, U.K., Apr. 18–21). ACM Press, New York, 2016, 29.
16. Ouyang, J., Lin, S., Jiang, S., Hou, Z., Wang, Y., and Wang, Y. SDF: Software-defined flash for web-scale Internet storage systems. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (Salt Lake City, UT, Mar. 1–5). ACM Press, New York, 2014, 471–484.
17. Park, K., Kee, Y.-S., Patel, J.M., Do, J., Park, C., and DeWitt, D.J. Query processing on smart SSDs. IEEE Data Engineering Bulletin 37, 2 (Jun. 2014), 19–26.
18. Picoli, I.L., Pasco, C.V., Jónsson, B.Þ., Bouganim, L., and Bonnet, P. uFLIP-OC: Understanding flash I/O patterns on open-channel solid state drives. In Proceedings of the 8th Asia-Pacific Workshop on Systems (Mumbai, India, Sep. 2–3). ACM Press, New York, 2017, 20.
19. Schroeder, B., Lagisetty, R., and Merchant, A. Flash reliability in production: The expected and the unexpected. In Proceedings of the 14th USENIX Conference on File and Storage Technologies (Santa Clara, CA, Feb. 22–25). USENIX Association, Berkeley, CA, 2016, 67–80.
20. Seshadri, S., Gahagan, M., Bhaskaran, S., Bunker, T., De, A., Jin, Y., Liu, Y., and Swanson, S. Willow: A user-programmable SSD. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (Broomfield, CO, Oct. 6–8). USENIX Association, Berkeley, CA, 2014, 67–80.
21. Woods, L., István, Z., and Alonso, G. Ibex: An intelligent storage engine with support for advanced SQL offloading. In Proceedings of the VLDB Endowment 7, 11 (Jul. 2014), 963–974.

Jaeyoung Do (jaedo@microsoft.com) is a researcher at Microsoft Research, Redmond, WA, USA. He is leading a project, SoftFlash, which aims to use programmable SSDs in cloud datacenters.

Sudipta Sengupta (sudipta@amazon.com) is leading new initiatives in artificial intelligence/deep learning at Amazon AWS, Seattle, WA, USA; the research reported in this article was done while he was at Microsoft Research, Redmond, WA, USA.

Steven Swanson (swanson@cs.ucsd.edu) is a professor in the Department of Computer Science and Engineering at the University of California, San Diego, USA.

Copyright held by authors/owners. Publication rights licensed to ACM. $15.00.



DOI:10.1145/3282487

Cybersecurity design reduces the risk of system failure from cyberattack, aiming to maximize mission effectiveness.

BY O. SAMI SAYDJARI

Engineering Trustworthy Systems: A Principled Approach to Cybersecurity

key insights
Cybersecurity must be practiced as a principled engineering discipline.
Many principles derive from insight into the nature of how cyberattacks succeed.
Defense in depth and breadth is required to cover the spectrum of cyberattack classes.

CYBERATTACKS ARE INCREASING in frequency, severity, and sophistication. Target systems are becoming increasingly complex with a multitude of subtle dependencies. Designs and implementations continue to exhibit flaws that could be avoided with well-known computer-science and engineering techniques. Cybersecurity technology is advancing, but too slowly to keep pace with the threat. In short, cybersecurity is losing the escalation battle with cyberattack. The results include mounting damages in the hundreds of billions of dollars,[4] erosion of trust in conducting business and collaboration in cyberspace, and risk of a series of catastrophic events that could cause crippling damage to companies and even entire countries. Cyberspace is unsafe and is becoming less safe every day.

The cybersecurity discipline has created useful technology against aspects of the expansive space of possible cyberattacks. Through many real-life engagements between cyberattackers and defenders, both sides have learned a great deal about how to


design attacks and defenses. It is now ernments and ways of life through what
time to begin abstracting and codify- is sometimes known by the military as
ing this knowledge into principles of influence operations{24.09}.6
cybersecurity engineering. Such prin- Before launching into the princi-
ciples offer an opportunity to multiply
the effectiveness of existing technol- Students of ples, one more important point needs
to be made: Engineers are responsible
ogy and mature the discipline so that
new knowledge has a solid foundation
cybersecurity for the safety and security of the sys-
tems they build {19.13}. In a conver-
on which to build. must be students sation with my mentor’s mentor, I
Engineering Trustworthy Systems8
contains 223 principles organized into
of cyberattacks once made the mistake of using the
word customer to refer to those using
25 chapters. This article will address and adversarial the cybersecurity systems we were de-
10 of the most fundamental principles
that span several important categories
behavior. signing. I will always remember him
sharply cutting me off and telling me
and will offer rationale and some guid- that they were “clients, not custom-
ance on application of those principles ers.” He said, “Used-car salesmen
to design. Under each primary princi- have customers; we have clients.”
ple, related principles are also includ- Like doctors and lawyers, engineers
ed as part of the discussion. have a solemn and high moral respon-
For those so inclined to read more in sibility to do the right thing and keep
Engineering Trustworthy Systems, after those who use our systems safe from
each stated principle is a reference of the harm to the maximum extent possi-
form “{x.y}” where x is the chapter num- ble, while informing them of the risks
ber in which it appears and y is the y-th they take when using our systems.
principle listed in that chapter (which In The Thin Book of Naming Ele-
are not explicitly numbered in the book). phants,5 the authors describe how the
National Aeronautics and Space Ad-
Motivation ministration (NASA) shuttle-engineer-
Society has reached a point where it is ing culture slowly and unintentionally
inexorably dependent on trustworthy transmogrified from that adhering to a
systems. Just-in-time manufacturing, policy of “safety first” to “better, faster,
while achieving great efficiencies, cheaper.” This change discouraged
creates great fragility to cyberattack, engineers from telling truth to power,
amplifying risk by allowing effects including estimating the actual proba-
to propagate to multiple systems bility of shuttle-launch failure. Manage-
{01.06}. This means that the potential ment needed the probability of launch
harm from a cyberattack is increasing failure to be less than 1 in 100,000 to
and now poses existential threat to in- allow launch. Any other answer was an
stitutions. Cybersecurity is no longer annoyance and interfered with on-time
the exclusive realm of the geeks and and on-schedule launches. In an inde-
nerds, but now must be considered as pendent assessment, Richard Feyn-
an essential risk to manage alongside man found that when engineers were
other major risks to the existence of allowed to speak freely, they calculated
those institutions. the actual failure probability to be 1 in
The need for trustworthy systems 100.5 The engineering cultural failure
extends well beyond pure technology. killed many great and brave souls in
Virtually everything is a system from two separate shuttle accidents.
some perspective. In particular, essen- I wrote Engineering Trustworthy Sys-
tial societal functions such as the mili- tems and this article to help enable and
tary, law enforcement, courts, societal encourage engineers to take full charge
safety nets, and the election process of explicitly and intentionally manag-
are all systems. People and their beliefs ing system risk, from the ground up,
are systems and form a component of in partnership with management and
larger societal systems, such as voting. other key stakeholders.
In 2016, the world saw cyberattacks
transcend technology targets to that of Principles
wetware—human beliefs and propen- It was no easy task to choose only 5%
sity to action. The notion of hacking of the principles to discuss. When in
democracy itself came into light,10 pos- doubt, I chose principles that may be
ing an existential threat to entire gov- less obvious to the reader, to pique cu-

64 COMMUNICATIO NS O F TH E AC M | J U NE 201 9 | VO L . 62 | NO. 6


contributed articles

riosity and to attract more computer simply the probability of cyberattacks tant yet subtle aspects of an engineer-
scientists and engineers to this impor- occurring multiplied by the potential ing discipline is understanding how to
tant problem area. The ordering here is damages that would result if they actu- think about it—the underlying attitude
completely different than in the book ally occurred. Estimating both of these that feeds insight. In the same way that
so as to provide a logical flow of the pre- quantities is challenging, but possible. failure motivates and informs depend-
sented subset. Rationale. Engineering disciplines ability principles, cyberattack moti-
Each primary principle includes a require metrics to: “characterize the vates and informs cybersecurity princi-
description of what the principle en- nature of what is and why it is that ples. Ideas on how to effectively defend
tails, a rationale for the creation of the way, evaluate the quality of a system, a system, both during design and oper-
principle, and a brief discussion of the predict system performance under ation, must come from an understand-
implications on the cybersecurity dis- a variety of environments and situa- ing of how cyberattacks succeed.
cipline and its practice. tions, and compare and improve sys- Rationale. How does one prevent at-
˲˲ Cybersecurity’s goal is to optimize tems continuously.”7 Without a met- tacks if one does not know the mecha-
mission effectiveness {03.01}. ric, it is not possible to decide whether nism by which attacks succeed? How
Description. Systems have a primary one system is better than another. does one detect attacks without know-
purpose or mission—to sell widgets, Many fellow cybersecurity engineers ing how attacks manifest? It is not pos-
manage money, control chemical complain that risk is difficult to mea- sible. Thus, students of cybersecurity
plants, manufacture parts, connect peo- sure and especially difficult to quan- must be students of cyberattacks and
ple, defend countries, fly airplanes, and tify, but proceeding without a metric adversarial behavior.
so on. Systems generate mission value is impossible. Thus, doing the hard Implications. Cybersecurity engi-
at a rate that is affected by the probabil- work required to measure risk, with neers and practitioners should take
ity of failure from a multitude of causes, a reasonable uncertainty interval, is courses and read books on ethical
including cyberattack. The purpose of an essential part of the cybersecurity hacking. They should study cyberat-
cybersecurity design is to reduce the discipline. Sometimes, it seems that tack and particularly the post-attack
probability of failure from cyberattack the cybersecurity community spends analysis performed by experts and
so as maximize mission effectiveness. more energy complaining how diffi- published or spoken about at confer-
Rationale. Some cybersecurity en- cult metrics are to create and measure ences such as Black Hat and DEF CON.
gineers mistakenly believe that their accurately, than getting on with creat- They should perform attacks within
goal is to maximize cybersecurity under ing and measuring them. lab environments designed specifi-
a given budget constraint. This exces- Implications. With risk as the pri- cally to allow for safe experimenta-
sively narrow view misapprehends the mary metric, risk-reduction becomes tion. Lastly, when successful attacks
nature of the engineering trade-offs the primary value and benefit from any do occur, cybersecurity analysts must
with other aspects of system design and cybersecurity measure—technological closely study them for root causes and
causes significant frustration among or otherwise. Total cost of cybersecu- the implications to improved com-
the cybersecurity designers, stakehold- rity, on the other hand, is calculated in ponent design, improved operations,
ers in the mission system, and senior terms of the direct cost of procuring, improved architecture, and improved
management (who must often adjudi- deploying, and maintaining the cyber- policy. “Understanding failure is the
cate disputes between these teams). In security mechanism as well as the in- key to success” {07.04}. For example,
reality, all teams are trying to optimize direct costs of mission impacts such the five-whys analysis technique used
mission effectiveness. This realization as performance degradation, delay to by the National Transportation Safety
places them in a collegial rather than market, capacity reductions, and us- Board (NTSB) to investigate aviation
an adversarial relationship. ability. With risk-reduction as a benefit accidents9 is useful to replicate and
Implications. Cybersecurity is always metric and an understanding of total adapt to mining all the useful hard-
in a trade-off with mission functional- costs, one can then reasonably compare earned defense information from the
ity, performance, cost, ease-of-use and alternate cybersecurity approaches in pain of a successful cyberattack.
many other important factors. These terms of risk-reduction return on in- ˲˲ Espionage, sabotage, and influence
trade-offs must be intentionally and vestment. For example, it is often the are goals underlying cyberattack {06.02}.
explicitly managed. It is only in con- case that there are no-brainer actions Description. Understanding adver-
sideration of the bigger picture of op- such as properly configuring existing saries requires understanding their
timizing mission that these trade-offs security mechanisms (for example, fire- motivations and strategic goals. Ad-
can be made in a reasoned manner. walls and intrusion detection systems) versaries have three basic categories
˲˲ Cybersecurity is about understand- that cost very little but significantly re- of goals: espionage—stealing secrets
ing and mitigating risk {02.01}. duce the probability of successful cy- to gain an unearned value or to de-
Description. Risk is the primary met- berattack. Picking such low-hanging stroy value by revealing stolen secrets;
ric of cybersecurity. Therefore, under- fruit should be the first step that any sabotage—hampering operations to
standing the nature and source of risk is organization takes to improving their slow progress, provide competitive ad-
key to applying and advancing the disci- operational cybersecurity posture. vantage, or to destroy for ideological
pline. Risk measurement is foundation- ˲˲ Theories of security come from purposes; and, influence—affecting
al to improving cybersecurity {17.04}. theories of insecurity {02.03}. decisions and outcomes to favor an ad-
Conceptually, cybersecurity risk is Description. One of the most impor- versary’s interests and goals, usually at


the expense of those of the defender.
Rationale. Understanding the strategic goals of adversaries illuminates their value system. A value system suggests which attack goals a potential adversary might invest in most heavily, and perhaps gives insight into how they will pursue those goals. Different adversaries will place different weights on different goals within each of the three categories. Each will also be willing to spend different amounts to achieve their goals. Clearly, a nation-state intelligence organization, a transnational terrorist group, organized crime, a hacktivist, and a misguided teenager trying to learn more about cyberattacks all have very different profiles with respect to these goals and their investment levels. These differences affect their respective behaviors with respect to different cybersecurity architectures.
Implications. In addition to informing the cybersecurity designer and operator (one who monitors status and controls the cybersecurity subsystem in real time), understanding attacker goals allows cybersecurity analysts to construct goal-oriented attack trees that are extraordinarily useful in guiding design and operation because they give insight into attack probability and attack sequencing. Attack sequencing, in turn, gives insight into getting ahead of attackers at interdiction points within the attack step sequencing {23.18}.

▶ Assume your adversary knows your system well and is inside it {06.05}.
Description. Secrecy is fleeting and thus should never be depended upon more than is absolutely necessary {03.05}. This is true of data but applies even more strongly with respect to the system itself {05.11}. It is unwise to make rash and unfounded assumptions that cannot be proven with regard to what a potential adversary may or may not know. It is much safer to assume they know at least as much as the designer does about the system. Beyond adversary knowledge of the system, a good designer makes the stronger assumption that an adversary has managed to co-opt at least part of the system sometime during its life cycle. It must be assumed that an adversary changed a component to have some degree of control over its function so as to operate as the adversary’s inside agent.
Rationale. First, there are many opportunities for a system design and implementation to be exposed and subverted along its entire life cycle. Early development work is rarely protected very carefully. System components are often reused from previous projects or open source. Malicious changes can easily escape notice during system integration and testing because of the complexity of the software and hardware in modern systems. The maintenance and update phases are also vulnerable to both espionage and sabotage. The adversary also has an opportunity to stealthily study a system during operation by infiltrating and observing the system, learning how the system works in reality, not just how it was intended to work by the designer (which can be significantly different, especially after an appreciable time in operation). Second, the potential failure from making too weak an assumption could be catastrophic to the system’s mission, whereas making strong assumptions could merely make the system more expensive. Clearly, both probability (driven by opportunity) and prudence suggest making the more conservative assumptions.
Implications. The implications of assuming the adversary knows the system at least as well as the designers and operators are significant. This principle means that cybersecurity designers must spend substantial resources on minimizing the probability of flaws in design and implementation through the design process itself, and on performing extensive testing, including penetration and red-team testing focused specifically on looking at the system from an adversary perspective. The principle also implies a cybersecurity engineer must understand the residual risks in terms of any known weaknesses. The design must compensate for those weaknesses through architecture (for example, specifically focusing the intrusion detection system to monitor possible exploitation of those weaknesses), as opposed to hoping the adversary does not find them because they are “buried too deep” or, worse yet, because the defender believes that the attacker is “not that sophisticated.” Underestimating the attacker is hubris. As the saying goes: pride comes before the fall {06.04}.
Assuming the attacker is (partially) inside the system requires the designer


to create virtual bulkheads in the system and to detect and thwart attacks propagating from one part of the system (where the attacker may have a toehold) to the next. This is a wise approach because many sophisticated attacks, such as worms, often propagate within the system once they find their way in (for example, through a phishing attack on an unsuspecting user who clicked on an attacker’s malicious link in an email message).

▶ Without integrity, no other cybersecurity properties matter {03.06}.
Description. Cybersecurity is sometimes characterized as having three pillars, using the mnemonic C-I-A: preserving confidentiality of data, ensuring the integrity of both the data and the system, and ensuring the availability of the system to provide the services for which it was designed. Sometimes, cybersecurity engineers become hyperfocused on one pillar to the exclusion of adequate attention to the others. This is particularly true of cybersecurity engineers who have their roots in U.S. Department of Defense (DoD) cybersecurity because confidentiality of classified data is a high-priority concern in the DoD. The reality is that all other system properties depend on system integrity, which therefore has primacy.
Rationale. System integrity is the single most important property because, without it, no other system properties are possible. No matter what properties a system may possess when deployed, they can be immediately subverted by the attacker altering the system to undo those properties and replace them with properties desirable to the attacker. This gives rise to the fundamental concept of the reference monitor {20.02}, which requires the security-critical subsystem be correct (perform the required security functions), non-bypassable (so that the attacker cannot circumvent the correct controls to access protected resources), and tamperproof (so the system cannot be altered without authorization).
Implications. This primacy-of-integrity principle means that cybersecurity engineers must focus attention on access control to the system as a first priority, including heavy monitoring of the system for any unauthorized changes. This priority extends to the earlier stages of the system life cycle such as update distribution and maintenance.

▶ An attacker’s priority target is the cybersecurity system {19.17}.
Description. Closely following from the primacy-of-integrity principle {03.06} is the criticality of the cybersecurity subsystem. To attack the mission, it is necessary first to disable any security controls that effectively defend against the adversary’s attack path, including the security controls that defend the security subsystem itself. Great care must be taken to protect and monitor the cybersecurity subsystem {23.12}.
Rationale. The security subsystem protects the mission system. Therefore, attempted attacks on the cybersecurity subsystem are harbingers of attacks on the mission system itself {22.08}. The cybersecurity system is therefore a prime target of the adversary because it is the key to attacking the mission system. Protection of the cybersecurity system is thus paramount {21.03}. For example, the integrity of the cybersecurity audit log is important because attackers attempt to alter the log to hide evidence of their cyberattack activities.
Implications. The cybersecurity system must be carefully designed to be secure itself. The cybersecurity of the cybersecurity system cannot depend on any other less secure systems. Doing so creates an indirect avenue for attack. For example, if the identity and authentication process for accessing the maintenance ports used to update the cybersecurity system relies on simple passwords over remotely accessible network ports, that becomes the weakest link of the entire system. In addition, cybersecurity engineers cannot simply use the cybersecurity mechanism that the cybersecurity system provides to protect the mission systems. In other words, the cybersecurity system cannot use itself to protect itself; that creates a circular dependency that will almost certainly create an exploitable flaw an attacker can use. Lastly, the cybersecurity mechanisms are usually hosted on operating systems and underlying hardware, which become the underbelly of the cybersecurity system. That underbelly must be secured using different cybersecurity mechanisms, and it is best if those mechanisms can be as simple as possible. Complexity is the


enemy of cybersecurity because of the difficulty of arguing that complex systems are correct {19.09}.

▶ Depth without breadth is useless; breadth without depth, weak {08.02}.
Description. Much ado has been made about the concept of defense in depth. The idea is often vaguely defined as layering cybersecurity approaches including people, diverse technology, and procedures to protect systems. Much more precision is needed for this concept to be truly useful to the cybersecurity design process. Layer how? With respect to what? The unspoken answer is the cyberattack space that covers the gamut of all possible attack classes as shown in the accompanying figure.
Rationale. One must achieve depth with respect to specified attack classes. Mechanisms that are useful against some attack classes are entirely useless against others. This focusing idea fosters an equally important companion principle: defense in breadth. If a cybersecurity designer creates excellent depth to the point of making a particular class of attack prohibitive to an adversary, the adversary may simply move to an alternative attack. Thus, one must cover the breadth of the attack space, in depth. Ideally, the depth will be such that all avenues of attack, for all attack classes, will be equally difficult, and above the cost and risk thresholds of the attackers.
Implications. This depth-and-breadth principle implies that the cybersecurity engineer must have a firm understanding of the entire spectrum of cyberattacks, not just a few attacks. More broadly, the principle suggests the cybersecurity community must develop better cyberattack taxonomies that capture the entire attack space, including hardware attacks, device controller attacks, operating system attacks, and cyberattacks used to affect the beliefs of people. Further, the principle also means that cybersecurity measures must be properly characterized in terms of their effectiveness against the various portions of the cyberattack space. Those who create or advocate for various measures or solutions will be responsible for creating specific claims about their cyberattack-space coverage, and analysts will be responsible for designing tests to thoroughly evaluate the validity of those claims. Lastly, cybersecurity architects will need to develop techniques for weaving together cybersecurity in ways that create true depth, measured by how the layers alter the probability of success an adversary will have for the targeted attack class. Put differently, the effectiveness of depth could be measured by how miserable it makes an attacker’s life.

Figure. Defense depth and breadth in a cyberattack. Legend: the attack space; attack classes within the attack space, where size corresponds to the number of attacks in the class; the subset of attack classes covered by a security control. Regions of the attack space are labeled Depth = 1, Depth = 2, and Depth = 3.

▶ Failing to plan for failure guarantees catastrophic failure {20.06}.
Description. System failures are inevitable {19.01, 19.05}. Pretending otherwise is almost always catastrophic. This principle applies to both the mission system and the cybersecurity subsystem that protects the mission system. Cybersecurity engineers must understand that their systems, like all systems, are subject to failure. It is incumbent on those engineers to understand how their systems can possibly fail, including the failure of the underlying hardware and other systems on which they depend (for example, the microprocessors, the internal system bus, the network, memory, and external storage systems). A student of cybersecurity is a student of failure {07.01} and thus a student of dependability as a closely related discipline. Security requires reliability; reliability requires security {05.09}.
Rationale. Too many cybersecurity engineers forget that cybersecurity mechanisms are not endowed with magical powers of nonfailure. Requirements can be ambiguous and poorly interpreted, designs can be flawed, and implementation errors are no less likely in security code than in other code. Indeed, security code often has to handle complex timing issues and sometimes needs to be involved in hardware control. This involves significantly more complexity than normal systems and thus requires even more attention to failure avoidance, detection, and recovery {05.10}. Yet the average cybersecurity engineer today seems inadequately schooled in this important related discipline.
Implications. Cybersecurity engineering requires design using dependability engineering principles. This means that cybersecurity engineers must understand the nature and cause of faults and how the activation of faults leads to errors, which can propagate and cause system failures.[1] They must understand this not only with respect to the cybersecurity system they design, but also with respect to all the systems on which that system depends and which depend on it, including the mission system itself.
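
As a rough illustration of the depth-and-breadth principle {08.02} discussed earlier, the following minimal Python sketch tabulates how many security controls cover each attack class. The attack classes, control names, and coverage sets are hypothetical, invented purely for illustration, and counting controls is only a crude stand-in for true depth, which should be measured by how the layers change an adversary's probability of success.

```python
from collections import Counter

# Hypothetical attack classes and controls, invented for illustration only.
ATTACK_CLASSES = ["phishing", "malware", "insider", "supply_chain", "denial_of_service"]

CONTROLS = {
    "email_filter": {"phishing", "malware"},
    "endpoint_detection": {"malware", "insider"},
    "network_firewall": {"malware", "denial_of_service"},
}


def depth_by_class(attack_classes, controls):
    """Count how many controls claim coverage of each attack class."""
    counts = Counter()
    for covered in controls.values():
        counts.update(covered)
    return {cls: counts[cls] for cls in attack_classes}


def breadth(depths):
    """Fraction of attack classes covered by at least one control."""
    return sum(1 for d in depths.values() if d > 0) / len(depths)


def uncovered(depths):
    """Attack classes with no covering control at all."""
    return [cls for cls, d in depths.items() if d == 0]


depths = depth_by_class(ATTACK_CLASSES, CONTROLS)
print("depth per class:", depths)
print("breadth:", breadth(depths))
print("uncovered classes:", uncovered(depths))
```

In this made-up configuration, one class has three layers of coverage while another has none, which is the situation the principle warns about: an adversary blocked in the well-defended class can simply move to the uncovered one.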

▶ Strategy and tactics knowledge comes from attack encounters {01.09}.
Description. As important as good cybersecurity design is, good cybersecurity operations is at least as important. Each cybersecurity mechanism is usually highly configurable with hundreds, thousands, and even millions of possible settings (for example, the rule set of firewalls denying or permitting each combination of port, protocol, source address range, and destination address range). What are the optimal settings of all of these various mechanisms? The answer depends on variations in the mission and variations in the system environment, including attack attempts that may be ongoing. The settings are part of a trade-off space for addressing the entire spectrum of attacks. The reality is there is no static optimal setting for all cyberattack scenarios under all possible conditions {22.07}. Furthermore, dynamically setting the controls leads to a complex control-feedback problem {23.11}. Where does the knowledge come from regarding how to set the security control parameters according to the particulars of the current situation? It is extracted from the information that comes from analyzing cyberattack encounters, both real and simulated, both those that happen to one’s own organization and those that happen to one’s neighbors.
Rationale. There is certainly good theory, such as game-theory-based approaches,[2] which one can develop about how to control the system effectively (for example, using standard control theory). On the other hand, practical experience plays an important role in learning how to effectively defend a system. This knowledge is called strategy (establishing high-level goals in a variety of different situations) and tactics (establishing effective near-term responses to attack steps the adversary takes).
Implications. Strategy and tactics knowledge must be actively sought, collected with intention (through analyzing real encounters, performing controlled experiments, and performing simulations {23.04}), curated, and effectively employed in the operations of a system. Cybersecurity systems must be designed to store, communicate, and use this knowledge effectively in the course of real operations. Plans based on this knowledge are sometimes called playbooks. They must be developed in advance of attacks {23.05} and must be broad enough {23.07} to handle a large variety of attack situations that are likely to occur in real-world operations. The process of thinking through responses to various cyberattack scenarios, in itself, is invaluable in the planning process {23.10}. Certain responses that may be contemplated during this process may need infrastructure (such as actuators) to execute the action accurately and quickly enough {23.15} to be effective. This insight will likely lead to design requirements for implementing such actuators as the system is improved.

The Future
Systematically extracting, presenting, and building the principles underlying trustworthy systems design is not the work of one cybersecurity engineer, not by a long shot. The task is difficult, daunting, complex, and never-ending. I mean here to present a beginning, not the last word on the matter. My goal is to encourage the formation of a community of cybersecurity and systems engineers strongly interested in maturing and advancing their discipline so that others may stand on their shoulders. This community is served by like-minded professionals sharing their thoughts, experiences, and results in papers, conferences, and over a beverage during informal gatherings. My book and this article are a call to action for this community to organize and work together toward the lofty goal of building the important underpinnings from a systems-engineering perspective.
Lastly, I will point out that cyberattack measures and cybersecurity countermeasures are in an eternal co-evolution and co-escalation {14.01}. Improvements to one discipline will inevitably create an evolutionary pressure on the other. This has at least two important implications. First, building the cybersecurity knowledge needed to build and operate trustworthy systems will require continuous and eternally vigilant attention. Second, communities on both sides need to be careful about where the co-evolution leads. Faster and faster cyberattacks will lead cybersecurity defenders to autonomic action and planning that may eventually be driven by artificial intelligence. Stronger and stronger cybersecurity measures that dynamically adapt to cyberattacks will similarly lead adversaries to more intelligent and autonomic adaptations in their cyberattacks. The road inevitably leads to machine-controlled autonomic action-counteraction and machine-driven adaptation and evolution of mechanisms. This may have surprising and potentially disastrous results for the system called humanity {25.02, 25.04}.

Acknowledgments
First and foremost, I acknowledge all of the formative conversations with my technical mentor, Brian Snow. He is a founding cybersecurity intellectual who has generously, gently, and wisely guided many minds throughout his illustrious career. Second, I thank the dozens of brilliant cybersecurity engineers and scientists with whom I have had the opportunity to work over the last three decades. Each has shone a light of insight from a different direction that helped me see the bigger picture of underlying principles.

References
1. Avizienis, A., Laprie, J.-C., and Randell, B. Fundamental concepts of dependability. In Proceedings of the 3rd IEEE Information Survivability Workshop (Boston, MA, Oct. 24–26). IEEE, 2000, 7–12.
2. Hamilton, S.N., Miller, W.L., Ott, A., and Saydjari, O.S. The role of game theory in information warfare. In Proceedings of the 4th Information Survivability Workshop. 2001.
3. Hammond, S.A. and Mayfield, A.B. The Thin Book of Naming Elephants: How to Surface Undiscussables for Greater Organizational Success. McGraw-Hill, New York, 2004, 290–292.
4. Morgan, S. Top 5 Cybersecurity Facts, Figures and Statistics for 2018. CSO Online; https://bit.ly/2KG6jJV.
5. NASA. Report of the Presidential Commission on the Space Shuttle Challenger Accident. June 6, 1986; https://history.nasa.gov/rogersrep/genindex.htm
6. Rand Corporation. Foundations of Effective Influence Operations: A Framework for Enhancing Army Capabilities. Rand Corp. 2009; https://www.rand.org/content/dam/rand/pubs/monographs/2009/RAND_MG654.pdf
7. Saydjari, O.S. Why Measure? Engineering Trustworthy Systems. McGraw-Hill, New York, 2018, 290–292.
8. Saydjari, O.S. Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time. McGraw-Hill Education, 2018.
9. Wiegmann, D. and Shappell, S.A. A Human Error Approach to Aviation Accident Analysis: The Human Factors Analysis and Classification System. Ashgate Publishing, 2003.
10. Zarate, J.C. The Cyber Attacks on Democracy. The Catalyst 8, (Fall 2017); https://bit.ly/2IXttZr

O. Sami Saydjari (ssaydjari@gmail.com) is Founder and President of the Cyber Defense Agency, Inc., Clarksville, MD, USA.

Copyright held by author/owner. Publication rights licensed to ACM. $15.00.

JU N E 2 0 1 9 | VO L. 6 2 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 69
review articles
DOI:10.1145/3282486

The Challenge of Crafting Intelligible Intelligence

To trust the behavior of complex AI algorithms, especially in mission-critical settings, they must be made intelligible.

BY DANIEL S. WELD AND GAGAN BANSAL

key insights
• There are important technical and social reasons to prefer inherently intelligible AI models (such as linear models or GA2Ms) over deep neural models; furthermore, intelligible models often have comparable accuracy.
• When an AI system is based on an inscrutable model, it may explain its decisions by mapping those decisions onto a simpler, explanatory model using techniques such as local approximation and vocabulary transformation.
• Results from psychology show that explanation is a process, best thought of as a conversation between explainer and listener. We advocate increased work on interactive explanation systems that can respond to a wide range of follow-up questions.

ARTIFICIAL INTELLIGENCE (AI) systems have reached or exceeded human performance for many circumscribed tasks. As a result, they are increasingly deployed in mission-critical roles, such as credit scoring, predicting if a bail candidate will commit another crime, selecting the news we read on social networks, and self-driving cars. Unlike other mission-critical software, extraordinarily complex AI systems are difficult to test: AI decisions are context specific and often based on thousands or millions of factors. Typically, AI behaviors are generated by searching vast action spaces or learned by the opaque optimization of mammoth neural networks operating over prodigious amounts of training data. Almost by definition, no clear-cut method can accomplish these AI tasks.

Unfortunately, much AI-produced behavior is alien, that is, it can fail in unexpected ways. This lesson is most clearly seen in the performance of the latest deep neural network image analysis systems. While their accuracy at object-recognition on naturally occurring pictures is extraordinary, imperceptible changes to input images can lead to erratic predictions, as shown in Figure 1. Why are these recognition systems so brittle, making different predictions for apparently identical images? Unintelligible behavior is not limited to machine learning; many AI programs, such as automated planning algorithms, perform search-based lookahead and inference whose complexity exceeds human abilities to verify. While some search and planning algorithms are provably complete and optimal, intelligibility is still important, because the underlying primitives (for example, search operators or action descriptions) are usually approximations.29 One can’t trust a proof that is based on (possibly) incorrect premises.

Despite intelligibility’s apparent value, it remains remarkably difficult to specify what makes a system “intelligible.” (We discuss desiderata for intelligible behavior later in this article.) In brief, we seek AI systems where it is clear what factors caused the system’s action,24 allowing the users to predict how changes to the situation would have led to alternative behaviors, and permitting effective control of



the AI by enabling interaction. As we will illustrate, there is a central tension between a concise explanation and an accurate one.

As shown in Figure 2, our survey focuses on two high-level approaches to building intelligible AI software: ensuring the underlying reasoning or learned model is inherently interpretable, for example, by learning a linear model over a small number of well-understood features, and if it is necessary to use an inscrutable model, such as complex neural networks or deep-lookahead search, then mapping this complex system to a simpler, explanatory model for understanding and control.28 Using an interpretable model provides the benefit of transparency and veracity; in theory, a user can see exactly what the model is doing. Unfortunately, interpretable methods may not perform as well as more complex ones, such as deep neural networks. Conversely, the approach of mapping to an explanatory model can apply to whichever AI technique is currently delivering the best performance, but its explanation inherently differs from the way the AI system actually operates. This yields a central conundrum: How can a user trust that such an explanation reflects the essence of the underlying decision and does not conceal important details? We posit the answer is to make the explanation system interactive so users can drill down until they are satisfied with their understanding.

The key challenge for designing intelligible AI is communicating a complex computational process to a human. This requires interdisciplinary skills, including HCI as well as AI and machine learning expertise. Furthermore, since the nature of explanation has long been studied by philosophy and psychology, these fields should also be consulted.

This article highlights key approaches and challenges for building intelligible intelligence, characterizes intelligibility, and explains why it is important even in systems with measurably high performance. We describe the benefits and limitations of GA2M, a powerful class of interpretable ML models. Then, we characterize methods for handling inscrutable models, discussing different strategies for mapping to a simpler, intelligible model appropriate for explanation and control. We sketch a vision for building interactive explanation systems, where the mapping changes in response to the user’s needs. Lastly, we argue that intelligibility is important for search-based AI systems as well as for those based on machine learning and that similar solutions may be applied.

Why Intelligibility Matters
While it has been argued that explanations are much less important than sheer performance in AI systems, there are many reasons why intelligibility is


important. We start by discussing technical reasons, but social factors are important as well.

AI may have the wrong objective. In some situations, even 100% perfect performance may be insufficient, for example, if the performance metric is flawed or incomplete due to the difficulty of specifying it explicitly. Pundits have warned that an automated factory charged with maximizing paperclip production could subgoal on killing humans, who are using resources that could otherwise be used in its task. While this example may be fanciful, it illustrates that it is remarkably difficult to balance multiple attributes of a utility function. For example, as Lipton observed,25 “An algorithm for making hiring decisions should simultaneously optimize for productivity, ethics and legality.” However, how does one express this trade-off? Other examples include balancing training error while uncovering causality in medicine and balancing accuracy and fairness in recidivism prediction.12 For the latter, a simplified objective function such as accuracy combined with historically biased training data may cause uneven performance for different groups (for example, people of color). Intelligibility empowers users to determine if an AI is right for the right reasons.

Figure 1. Adding an imperceptibly small vector to an image changes the GoogLeNet39 image recognizer’s classification of the image from “panda” to “gibbon.” Source: Goodfellow et al.9
[Panels, left to right: “panda,” 57.7% confidence; “nematode,” 8.2% confidence; “gibbon,” 99.3% confidence.]

Figure 2. Approaches for crafting intelligible AI.
[Flowchart nodes: Intelligible? (Yes / No); Use Directly; Map to Simpler Model (• Explanations, • Controls); Interact with Simpler Model.]

AI may be using inadequate features. Features are often correlated, and when one feature is included in a model, machine learning algorithms extract as much signal as possible from it, indirectly modeling other features that were not included. This can lead to problematic models, as illustrated by Figure 4b (and described later), where the ML determined that a patient’s prior history of asthma (a lung disease) was negatively correlated with death by pneumonia, presumably due to correlation with (unmodeled) variables, such as these patients receiving timely and aggressive therapy for lung problems. An intelligible model helps humans to spot these issues and correct them, for example, by adding additional features.4

Distributional drift. A deployed model may perform poorly in the wild, that is, when a difference exists between the distribution which was used during training and that encountered during deployment. Furthermore, the deployment distribution may change over time, perhaps due to feedback from the act of deployment. This is common in adversarial domains, such as spam detection, online ad pricing, and search engine optimization. Intelligibility helps users determine when models are failing to generalize.

Facilitating user control. Many AI systems induce user preferences from their actions. For example, adaptive news feeds predict which stories are likely most interesting to a user. As robots become more common and enter the home, preference learning will become ever more common. If users understand why the AI performed an undesired action, they can better issue instructions that will lead to improved future behavior.

User acceptance. Even if they do not seek to change system behavior, users have been shown to be happier with and more likely to accept algorithmic decisions if they are accompanied by an explanation.18 After being told they should have their kidney removed, it’s natural for a patient to ask the doctor why, even if they don’t fully understand the answer.

Improving human insight. While improved AI allows automation of tasks previously performed by humans, this is not its only use. In addition, scientists use machine learning to get insight from big data. Medicine offers several examples.4 Similarly, the behavior of AlphaGo35 has revolutionized human understanding of the game. Intelligible models greatly facilitate these processes.

Legal imperatives. The European Union’s GDPR legislation decrees citizens’ right to an explanation, and other nations may follow. Furthermore, assessing legal liability is a growing area of concern; a deployed model (for example, self-driving cars) may introduce new areas of liability by causing accidents unexpected from a human operator, shown as “AI-specific error” in Figure 3. Auditing such situations to assess liability requires understanding the model’s decisions.

Figure 3. The dashed blue shape indicates the space of possible mistakes humans can make. The red shape denotes the AI’s mistakes; its smaller size indicates a net reduction in the number of errors. The gray region denotes AI-specific mistakes a human would never make. Despite reducing the total number of errors, a deployed model may create new areas of liability (gray), necessitating explanations.
[Region labels: Human Errors; AI Errors; AI-Specific Errors.]

Defining Intelligibility
So far we have treated intelligibility informally. Indeed, few computing researchers have tried to formally define what makes an AI system interpretable, transparent, or intelligible,6 but one suggested criterion is human simulatability:25 Can a human user easily predict the model’s output for a given input? By this definition, sparse linear models are more interpretable than dense or non-linear ones.

Philosophers, such as Hempel and Salmon, have long debated the nature of explanation. Lewis23 summarizes: “To explain an event is to provide some information about its causal history.” But many causal explanations may exist. The fact that event C causes E is best understood relative to an imagined counterfactual scenario, where absent C, E would not have occurred; furthermore, C should be minimal, an intuition known to early scientists, such as William of Occam, and formalized by Halpern and Pearl.11

Following this logic, we suggest a better criterion than simulatability is the ability to answer counterfactuals, aka “what-if” questions. Specifically, we say that a model is intelligible to the degree that a human user can predict how a change to a feature, for example, a small increase to its value, will change the model’s output and if they can reliably modify that response curve. Note that if one can simulate the model, predicting its output, then one can predict the effect of a change, but not vice versa.

Linear models are especially interpretable under this definition because they allow the answering of counterfactuals. For example, consider a naive Bayes unigram model for sentiment analysis, whose objective is to predict the emotional polarity (positive or negative) of a textual passage. Even if the model were large, combining evidence from the presence of thousands of words, one could see the effect of a given word by looking at the sign and magnitude of the corresponding weight. This answers the question, “What if the word had been omitted?” Similarly, by comparing the weights associated with two words, one could predict the effect on the model of substituting one for the other.

Ranking intelligible models. Since one may have a choice of intelligible models, it is useful to consider what makes one preferable to another. Social science research suggests an explanation is best considered a social process, a conversation between explainer and explainee.15,30 As a result, Grice’s rules for cooperative communication10 may hold for intelligible explanations. Grice’s maxim of quality says be truthful, only relating things that are supported by evidence. The maxim of quantity says to give as much information as is needed, and no more. The maxim of relation says to only say things that are relevant to the discussion. The maxim of manner says to avoid ambiguity, being as clear as possible.

Miller summarizes decades of psychological research, noting that explanations are contrastive, that is, of the form “Why P rather than Q?” The event in question, P, is termed the fact and Q is called the foil.30 Often the foil is not explicitly stated even though it is crucially important to the explanation process. For example, consider the question, “Why did you predict the image depicts an indigo bunting?” An explanation that points to the color blue implicitly assumes the foil is another bird, such as a chickadee. But perhaps the questioner wonders why the recognizer did not predict a pair of denim pants; in this case a more precise explanation might highlight the presence of wings and a beak. Clearly, an explanation targeted to the wrong foil will be unsatisfying, but the nature and sophistication of a foil can depend on the end user’s expertise; hence, the ideal explanation will differ for different people.6 For example, to verify that an ML system is fair, an ethicist might generate more

Figure 4. A part of Figure 1 from Caruana et al.4 showing three (of 56 total) components for a GA2M model, which was trained to predict a patient’s risk of dying from pneumonia. The two line graphs depict the contribution of individual features to risk: patient’s age, and Boolean variable asthma. The y-axis denotes its contribution (log odds) to predicted risk. The heat map visualizes the contribution due to pairwise interactions between age and cancer rate.
[Panels: (a) Age; (b) Asthma; (c) Age vs. Cancer.]
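The additive structure that Figure 4 visualizes can be illustrated with a minimal Python sketch. It assumes three hypothetical shape functions (`f_age`, `f_asthma`, `f_age_cancer`) and an intercept `BETA0` whose values are invented for illustration; none of the numbers come from Caruana et al. The sketch shows how per-term contributions (log odds) sum to a prediction, how a “what-if” question is answered by changing one feature, and how a domain expert might neutralize a suspicious term by setting it to zero.

```python
import math

BETA0 = -2.0  # hypothetical intercept, in log odds


def f_age(age):
    # Hypothetical piecewise-constant shape function for age.
    if age < 20:
        return -0.2
    if age <= 67:
        return 0.1
    return 0.8


def f_asthma(has_asthma):
    # Hypothetical term mimicking the counterintuitive negative contribution.
    return -0.3 if has_asthma else 0.0


def f_age_cancer(age, has_cancer):
    # Hypothetical pairwise term: extra risk for younger cancer patients.
    return 0.4 if (has_cancer and age < 40) else 0.0


# Each term maps a patient record (a dict) to a contribution in log odds.
TERMS = {
    "age": lambda p: f_age(p["age"]),
    "asthma": lambda p: f_asthma(p["asthma"]),
    "age_x_cancer": lambda p: f_age_cancer(p["age"], p["cancer"]),
}


def contributions(patient, terms=TERMS):
    """Per-term contributions; these are what a user would inspect or plot."""
    return {name: term(patient) for name, term in terms.items()}


def log_odds(patient, terms=TERMS):
    """Additive prediction: intercept plus the sum of all term contributions."""
    return BETA0 + sum(contributions(patient, terms).values())


def risk(patient, terms=TERMS):
    """Convert log odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds(patient, terms)))


def what_if(patient, feature, new_value, terms=TERMS):
    """Counterfactual query: change in log odds if one feature were different."""
    changed = dict(patient, **{feature: new_value})
    return log_odds(changed, terms) - log_odds(patient, terms)


patient = {"age": 70, "asthma": True, "cancer": False}
print(contributions(patient))
print(what_if(patient, "asthma", False))  # about +0.3 with these made-up values

# Expert edit: replace the asthma term with a zero contribution.
edited_terms = dict(TERMS, asthma=lambda p: 0.0)
print(risk(patient), risk(patient, edited_terms))
```

Because every term is a separate function, both the what-if query and the expert edit touch only one entry of `TERMS`; this is the property the article attributes to inherently intelligible models, not a description of any particular GA2M implementation.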


complex foils than a data scientist. Most ML explanation systems have restricted their attention to elucidating the behavior of a binary classifier, that is, where there is only one possible foil choice. However, as we seek to explain multiclass systems, addressing this issue becomes essential.

Many systems are simply too complex to understand without approximation. Here, the key challenge is deciding which details to omit. After many years of study, psychologists determined that several criteria can be prioritized for inclusion in an explanation: necessary causes (vs. sufficient ones); intentional actions (vs. those taken without deliberation); proximal causes (vs. distant ones); details that distinguish between fact and foil; and abnormal features.30

According to Lombrozo, humans prefer explanations that are simpler (that is, contain fewer clauses), more general, and coherent (that is, consistent with the human’s prior beliefs).26 In particular, she observed the surprising result that humans preferred simple (one clause) explanations to conjunctive ones, even when the probability of the latter was higher than the former.26 These results raise interesting questions about the purpose of explanations in an AI system. Is an explanation’s primary purpose to convince a human to accept the computer’s conclusions (perhaps by presenting a simple, plausible, but unlikely explanation) or is it to educate the human about the most likely true situation? Tversky, Kahneman, and other psychologists have documented many cognitive biases that lead humans to incorrect conclusions; for example, people reason incorrectly about the probability of conjunctions, with a concrete and vivid scenario deemed more likely than an abstract one that strictly subsumes it.16 Should an explanation system exploit human limitations or seek to protect us from them?

Other studies raise an additional complication about how to communicate a system’s uncertain predictions to human users. Koehler found that simply presenting an explanation for a proposition makes people think that it is more likely to be true.18 Furthermore, explaining a fact in the same way as previous facts have been explained amplifies this effect.36

Inherently Intelligible Models
Several AI systems are inherently intelligible, and we previously observed that linear models support counterfactual reasoning. Unfortunately, linear models have limited utility because they often result in poor accuracy. More expressive choices may include simple decision trees and compact decision lists. To concretely illustrate the benefits of intelligibility, we focus on generalized additive models (GAMs), which are a powerful class of ML models that relate a set of features to the target using a linear combination of (potentially nonlinear) single-feature models called shape functions.27 For example, if y represents the target and {x1, ..., xn} represents the features, then a GAM model takes the form y = β0 + ∑_j f_j(x_j), where the f_j denote shape functions and the target y is computed by summing single-feature terms. Popular shape functions include non-linear functions such as splines and decision trees. With linear shape functions, GAMs reduce to linear models. GA2M models extend GAM models by including terms for pairwise interactions between features:

y = β0 + ∑_j f_j(x_j) + ∑_{i≠j} f_ij(x_i, x_j)

Caruana et al. observed that for domains containing a moderate number of semantic features, GA2M models achieve performance that is competitive with inscrutable models, such as random forests and neural networks, while remaining intelligible.4 Lou et al. observed that among methods available for learning GA2M models, the version with bagged shallow regression tree shape functions learned via gradient boosting achieves the highest accuracy.27

Both GAM and GA2M are considered interpretable because the model’s learned behavior can be easily understood by examining or visualizing the contribution of terms (individual or pairs of features) to the final prediction. For example, Figure 4 depicts a GA2M model trained to predict a patient’s risk of dying due to pneumonia, showing the contribution (log odds) to total risk for a subset of terms. A positive contribution increases risk, whereas a negative contribution decreases risk. For example, Figure 4a shows how the patient’s age affects predicted risk. While the risk is low and steady for young patients (for example, age < 20), it increases rapidly for older patients (age > 67). Interestingly, the model shows a sudden increase at age 86; perhaps a result of less aggressive care by doctors for patients “whose time has come.” Even more surprising is the sudden drop for patients over 100. This might be another social effect; once a patient reaches the magic “100,” he or she gets more aggressive care. One benefit of an interpretable model is its ability to highlight these issues, spurring deeper analysis.

Figure 4b illustrates another surprising aspect of the learned model; apparently, a history of asthma, a respiratory disease, decreases the patient’s risk of dying from pneumonia! This finding is counterintuitive to any physician, who recognizes that asthma, in fact, should in theory increase such risk. When Caruana et al. checked the data, they concluded the lower risk was likely due to correlated variables: asthma patients typically receive timely and aggressive therapy for lung issues. Therefore, although the model was highly accurate on the test set, it would likely fail, dramatically underestimating the risk to a patient with asthma who had not been previously treated for the disease.

Facilitating human control of GA2M models. A domain expert can fix such erroneous patterns learned by the model by setting the weight of the asthma term to zero. In fact, GA2Ms let users provide much more comprehensive feedback to the model by using a GUI to redraw a line graph for model terms.4 An alternative remedy might be to introduce a new feature to the model, representing whether the patient had been recently seen by a pulmonologist. After adding this feature, which is highly correlated with asthma, and retraining, the newly learned model would likely reflect that asthma (by itself) increases the risk of dying from pneumonia.

There are two more takeaways from this anecdote. First, the absence of an important feature in the data representation can cause any AI system to learn unintuitive behavior for another, correlated feature. Second, if the learner is intelligible, then this unintuitive behavior


is immediately apparent, allowing appropriate skepticism (despite high test accuracy) and easier debugging.

Recall that GA2Ms are more expressive than simple GAMs because they include pairwise terms. Figure 4c depicts such a term for the features age and cancer. This explanation indicates that among the patients who have cancer, the younger ones are at higher risk. This may be because the younger patients who develop cancer are probably critically ill. Again, since doctors can readily inspect these terms, they know if the learner develops unexpected conclusions.

Limitations. As described, GA2M models are restricted to binary classification, and so explanations are clearly contrastive: there is only one choice of foil. One could extend GA2M to handle multiple classes by training n one-vs-rest classifiers or building a hierarchy of classifiers. However, while these approaches would yield a working multi-class classifier, we don’t know if they preserve model intelligibility, nor whether a user could effectively adjust such a model by editing the shape functions.

Furthermore, recall that GA2Ms decompose their prediction into effects of individual terms, which can be visualized. However, if users are confused about what terms mean, they will not understand the model or be able to ask meaningful “what-if” questions. Moreover, if there are too many features, the model’s complexity may be overwhelming. Lipton notes that the effort required to simulate some models (such as decision trees) may grow logarithmically with the number of parameters,25 but for GA2M the number of visualizations to inspect could increase quadratically. Several methods might help users manage this complexity; for example, the terms could be ordered by importance; however, it’s not clear how to estimate importance. Possible methods include using an ablation analysis to compute influence of terms on model performance or computing the maximum contribution of terms as seen in the training samples. Alternatively, a domain expert could group terms semantically to facilitate perusal.

However, when the number of features grows into the millions (which occurs when dealing with classifiers over text, audio, image, and video data), existing intelligible models do not perform nearly as well as inscrutable methods, like deep neural networks. Since these models combine millions of features in complex, nonlinear ways, they are beyond human capacity to simulate.

Understanding Inscrutable Models
There are two ways that an AI model may be inscrutable. It may be provided as a blackbox API, such as Microsoft Cognitive Services, which uses machine learning to provide image-recognition capabilities but does not allow inspection of the underlying model. Alternatively, the model may be under the user’s control yet extremely complex, such as a deep neural network, where a user has access to myriad learned parameters but cannot reasonably interpret them. How can one best explain such models to the user?

The comprehensibility/fidelity trade-off. A good explanation of an event is both easy to understand and faithful, conveying the true cause of the event. Unfortunately, these two criteria almost always conflict. Consider the predictions of a deep neural network with millions of nodes: a complete and accurate trace of the network’s prediction would be far too complex to understand, but any simplification sacrifices faithfulness.

Finding a satisfying explanation, therefore, requires balancing the competing goals of comprehensibility and fidelity. Lakkaraju et al.22 suggest formulating an explicit optimization of this form and propose an approximation algorithm for generating global explanations in the form of compact sets of if-then rules. Ribeiro et al. describe a similar optimization algorithm that balances faithfulness and coverage in its search for summary rules.34

Indeed, all methods for rendering an inscrutable model intelligible require mapping the complex model to a simpler one.28 Several high-level approaches to mapping have been proposed.

Local explanations. One way to simplify the explanation of a learned model is to make it relative to a single input query. Such explanations, which are termed local33 or instance-based,22


are akin to a doctor explaining specific reasons for a patient’s diagnosis rather than communicating all of her medical knowledge. Contrast this approach with the global understanding of the model that one gets with a GA2M model. Mathematically, one can see a local explanation as currying: several variables in the model are fixed to specific values, allowing simplification.

Generating a local explanation is a common practice in AI systems. For example, early rule-based expert systems included explanation systems that augmented a trace of the system’s reasoning (for a particular case) with background knowledge.38 Recommender systems, one of the first deployed uses of machine learning, also induced demand for explanations of their specific recommendations; the most satisfying answers combined justifications based on the user’s previous choices, ratings of similar users, and features of the items being recommended.32

Locally approximate explanations. In many cases, however, even a local explanation can be too complex to understand without approximation. Here, the key challenge is deciding which details to omit when creating the simpler explanatory model. Human preferences, discovered by psychologists and summarized previously, should guide algorithms that construct these simplifications.

Ribeiro et al.’s LIME system33 is a good example of a system for generating a locally approximate explanatory model of an arbitrary learned model, but it sidesteps part of the question of which details to omit. Instead, LIME requires the developer to provide two additional inputs: a set of semantically meaningful features X′ that can be computed from the original features, and an interpretable learning algorithm, such as a linear classifier (or a GA2M), which it uses to generate an explanation in terms of the X′.

The insight behind LIME is shown in Figure 5. Given an instance to explain, shown as the bolded red cross, LIME randomly generates a set of similar instances and uses the blackbox classifier, f, to predict their values (shown as the red crosses and blue circles). These predictions are weighted by their similarity to the input instance (akin to locally weighted regression) and used to train a new, simpler intelligible classifier, shown on the figure as the linear decision boundary, using X′, the smaller set of semantic features. The user receives the intelligible classifier as an explanation. While this explanation model28 is likely a poor global representation of f, it is hopefully an accurate local approximation of the boundary in the vicinity of the instance being explained.

Ribeiro et al. tested LIME on several domains. For example, they explained the predictions of a convolutional neural network image classifier by converting the pixel-level features into a smaller set of “super-pixels”; to do so, they ran an off-the-shelf segmentation algorithm that identified regions in the input image and varied the color of some of these regions when generating “similar” images. While LIME provides no formal guarantees about its explanations, studies showed that LIME’s explanations helped users evaluate which of several classifiers best generalizes.

Choice of explanatory vocabulary. Ribeiro et al.’s use of presegmented image regions to explain image classification decisions illustrates the larger problem of determining an explanatory vocabulary. Clearly, it would not make sense to try to identify the exact pixel that led to the decision: pixels are too low level a representation and are not semantically meaningful to users. In fact, deep neural networks’ power comes from the very fact that their hidden layers are trained to recognize latent features in a manner that seems to perform much better than previous efforts to define such features independently. Deep networks are inscrutable exactly because we do not know what those hidden features denote.

To explain the behavior of such models, however, we must find some high-level abstraction over the input pixels that communicates the model’s essence. Ribeiro et al.’s decision to use an off-the-shelf image-segmentation system was pragmatic. The regions it selected are easily visualized and carry some semantic value. However, regions are chosen without any regard to how the classifier makes a decision. To explain a blackbox model, where there is no possible access to the classifier’s internal representation, there is likely no better option; any explanation will lack faithfulness.

However, if a user can access the classifier and tailor the explanation system to it, there are ways to choose a more meaningful vocabulary. One interesting method jointly trains a

Figure 5. The intuition guiding LIME’s method for constructing an approximate local explanation. Source: Ribeiro et al.33
“The black-box model’s complex decision function, f, (unknown to LIME) is represented by the blue/pink background, which cannot be approximated well by a linear model. The bold red cross is the instance being explained. LIME samples instances, gets predictions using f, and weighs them by the proximity to the instance being explained (represented here by size). The dashed line is the learned explanation that is locally (but not globally) faithful.”


classifier with a natural language, image-captioning system.13 The classifier uses training data labeled with the objects appearing in the image; the captioning system’s training data is labeled with English sentences describing the appearance of the image. By training these systems jointly, the variables in the hidden layers may get aligned to semantically meaningful concepts, even as they are being trained to provide discriminative power. This results in English language descriptions of images that have both high image relevance (from the captioning training data) and high class relevance (from the object recognition training data), as shown in Figure 6.

Figure 6. A visual explanation taken from Hendricks et al.13
“Visual explanations are both image relevant and class relevant. In contrast, image descriptions are image relevant, but not necessarily class relevant, and class definitions are class relevant but not necessarily image relevant.”
(Diagram axes: Image Relevance, Class Relevance. Labeled regions: Image Description, Visual Explanation, Class Definition. Example image: Laysan Albatross.)
Description: This is a large bird with a white neck and a black back in the water.
Class Definition: The Laysan Albatross is a seabird with a hooked yellow beak, black back, and white belly.
Visual Explanation: This is a Laysan Albatross because this bird has a hooked yellow beak, white neck, and black back.

While this method works well for many examples, some explanations include details that are not actually present in the image; newer approaches, such as phrase-critic methods, may create even better descriptions.14 Another approach might determine if there are hidden layers in the learned classifier that learn concepts corresponding to something meaningful. For example, Zeiler and Fergus observed that certain layers may function as edge or pattern detectors.40 Whenever a user can identify the presence of such layers, it may be preferable to use them in the explanation. Bau et al. describe an automatic mechanism for matching CNN representations with semantically meaningful concepts using a large, labeled corpus of objects, parts, and texture; furthermore, using this alignment, their method quantitatively scores CNN interpretability, potentially suggesting a way to optimize for intelligible models.

However, many obstacles remain. As one example, it is not clear there are satisfying ways to describe important, discriminative features, which are often intangible, for example, textures. An intelligible explanation may need to define new terms or combine language with other modalities, like patches of an image. Another challenge is inducing first-order, relational descriptions, which would enable descriptions such as “a spider because it has eight legs” and “full because all seats are occupied.” While quantified and relational abstractions are very natural for people, progress in statistical-relational learning has been slow and there are many open questions for neuro-symbolic learning.3

Facilitating user control with explanatory models. Generating an explanation by mapping an inscrutable model into a simpler, explanatory model is only half of the battle. In addition to answering counterfactuals about the original model, we would ideally be able to map any control actions the user takes in the explanatory model back as adjustments to the original, inscrutable model. For example, we illustrated how a user could directly edit a GA2M’s shape curve (Figure 4b) to change the model’s response to asthma. Is there a way to interpret such an action, made to an intelligible explanatory model, as a modification to the original, inscrutable model? It seems unlikely that we will discover a general method to do this for arbitrary source models, since the abstraction mapping is not invertible in general. However, there are likely methods for mapping backward to specific classes of source models or for specific types of feature-transform mappings. This is an important area for future study.

Toward Interactive Explanation
The optimal choice of explanation depends on the audience. Just as a human teacher would explain physics differently to students who know or do not yet know calculus, the technical sophistication and background knowledge of the recipient affect the suitability of a machine-generated explanation. Furthermore, the concerns of a house seeker whose mortgage application was denied due to a FICO score differ from those of a developer or data scientist debugging the system. Therefore, an ideal explainer should model the user’s background over the course of many interactions.

The HCI community has long studied mental models,31 and many intelligent tutoring systems (ITSs) build explicit models of students’ knowledge and misconceptions.2 However, the frameworks for these models are typically hand-engineered for each subject domain, so it may be difficult to adapt ITS approaches to a system that aims to explain an arbitrary black-box learner.

Even with an accurate user model, it is likely that an explanation will not answer all of a user’s concerns, because the human may have follow-up questions. We conclude that an explanation system should be interactive, supporting such questions from and actions by the user. This matches results from psychology literature, summarized earlier, and highlights Grice’s maxims, especially those pertaining to quantity and relation. It also builds on Lim and Dey’s work in ubiquitous computing, which investigated the kinds of questions users wished to ask about complex, context-aware applications.24 We envision an interactive explanation system that supports many different follow-up and


Figure 7. An example of an interactive explanatory dialog for gaining insight into a DOG/FISH image classifier.
For illustration, the questions and answers are shown in English language text, but our use of a ‘dialog’ is for illustration only. An interactive GUI, for example, building on the ideas of Krause et al.,20 would likely be a better realization.
(1) [Input image passed to the ML Classifier] C: I predict FISH.
(2) H: Why? C: See below: Green regions argue for FISH, while RED pushes toward DOG. There’s more green.
(3) H: (Hmm. Seems like it might be just recognizing anemone texture!) Which training examples are most influential to the prediction? C: These ones:
(4) H: What happens if the background anemones are removed? E.g., C: I still predict FISH, because of these green superpixels:

drill-down actions after presenting a user with an initial explanation:
• Redirecting the answer by changing the foil. “Sure, but why didn’t you predict class C?”
• Asking for more detail (that is, a more complex explanatory model), perhaps while restricting the explanation to a subregion of feature space. “I’m only concerned about women over age 50 ...”
• Asking for a decision’s rationale. “What made you believe this?” To which the system might respond by displaying the labeled training examples that were most influential in reaching that decision, for example, ones identified by influence functions19 or nearest neighbor methods.
• Querying the model’s sensitivity by asking what minimal perturbation to certain features would lead to a different output.
• Changing the vocabulary by adding (or removing) a feature in the explanatory model, either from a predefined set, by using methods from machine teaching, or with concept activation vectors.17
• Perturbing the input example to see the effect on both prediction and explanation. In addition to aiding understanding of the model (directly testing a counterfactual), this action supports an affected user who wants to contest the initial prediction: “But officer, one of those prior DUIs was overturned ...?”
• Adjusting the model. Based on new understanding, the user may wish to correct the model. Here, we expect to build on tools for interactive machine learning1 and explanatory debugging,20,21 which have explored interactions for adding new training examples, correcting erroneous labels in existing data, specifying new features, and modifying shape functions. As mentioned in the previous section, it may be challenging to map user adjustments that are made in reference to an explanatory model back into the original, inscrutable model.

To make these ideas concrete, Figure 7 presents a possible dialog as a user tries to understand the robustness of a deep neural dog/fish classifier built atop Inception v3.39 As the figure shows: (1) The computer correctly predicts the image depicts a fish. (2) The user requests an explanation, which is provided using LIME.33 (3) The user, concerned the classifier is paying more attention to the background than to the fish itself, asks to see the training data that influenced the classifier; the nearest neighbors are computed using influence functions.19 While there are anemones in those images, it also seems that the system is recognizing a clownfish. (4) To gain confidence, the user edits the input image to remove the background, resubmits it to the classifier, and checks the explanation.

Explaining Combinatorial Search
Most of the preceding discussion has focused on intelligible machine learning, which is just one type of artificial intelligence. However, the same issues also confront systems based on deep-lookahead search. While many planning algorithms have strong theoretical properties, such as soundness, they search over action models that include their own assumptions. Furthermore, goal specifications are likewise incomplete.29 If these unspoken assumptions are incorrect, then a formally correct plan may still be disastrous.

Consider a planning algorithm that has generated a sequence of actions for a remote, mobile robot. If the plan is short with a moderate number of actions, then the problem may be inherently intelligible, and a human could easily spot a problem. However, larger search spaces could be cognitively overwhelming. In these cases, local explanations offer a simplification technique that is helpful, just as it was when explaining machine learning. The vocabulary issue is likewise crucial: how does one succinctly and abstractly summarize a complete search subtree? Depending on the choice of explanatory foil, different answers are appropriate.8 Sreedharan et al. describe an algorithm for generating the minimal explanation that patches a user’s partial understanding of a domain.37 Work on mixed-initiative planning7 has demonstrated the importance of supporting interactive dialog with a planning system. Since many AI systems, for example, AlphaGo,35 combine deep search and machine learning, additional challenges will result from the need to explain interactions between combinatorics and learned models.

Final Thoughts
In order to trust deployed AI systems, we must not only improve their robustness,5 but also develop ways to make their reasoning intelligible. Intelligibility will help us spot AI that makes mistakes due to distributional drift or incomplete representations of goals and features. Intelligibility will also facilitate control by humans in increasingly common collaborative human/AI teams. Furthermore, intelligibility will help humans learn from AI. Finally, there are legal reasons to want intelligible AI, including the European GDPR and a growing need to assign liability when AI errs.

Depending on the complexity of the models involved, two approaches to enhancing understanding may be appropriate: using an inherently interpretable model, or adopting an inscrutably complex model and generating post hoc explanations by mapping it to a simpler, explanatory model through a combination of currying and local approximation. When learning a model over a medium number of human-interpretable features, one may confidently balance performance and intelligibility with approaches like GA2Ms. However, for problems with thousands or millions of features, performance requirements likely force the adoption of inscrutable methods, such as deep neural networks or boosted decision trees. In these situations, post hoc explanations may be the only way to facilitate human understanding.

Research on explanation algorithms is developing rapidly, with work on both local (instance-specific) explanations and global approximations to the learned model. A key challenge for all these approaches is the construction of an explanation vocabulary, essentially a set of features used in the approximate explanation model. Different explanatory models may be appropriate for different choices of explanatory foil, an aspect deserving more attention from systems builders. While many intelligible models can be directly edited by a user, more research is needed to determine how best to map such actions back to modify an underlying inscrutable model. Results from psychology show that explanation is a social process, best thought of as a conversation. As a result, we advocate increased work on interactive explanation systems that support a wide range of follow-up actions. To spur rapid progress in this important field, we hope to see collaboration between researchers in multiple disciplines.

Acknowledgments. We thank E. Adar, S. Ameshi, R. Calo, R. Caruana, M. Chickering, O. Etzioni, J. Heer, E. Horvitz, T. Hwang, R. Kambhamapti, E. Kamar, S. Kaplan, B. Kim, P. Simard, Mausam, C. Meek, M. Michelson, S. Minton, B. Nushi, G. Ramos, M. Ribeiro, M. Richardson, P. Simard, J. Suh, J. Teevan, T. Wu, and the anonymous reviewers for helpful conversations and comments. This work was supported in part by the Future of Life Institute grant 2015-144577 (5388) with additional support from NSF grant IIS-1420667, ONR grant N00014-15-1-2774, and the WRF/Cable Professorship.

References
1. Amershi, S., Cakmak, M., Knox, W. and Kulesza, T. Power to the people: The role of humans in interactive machine learning. AI Magazine 35, 4 (2014), 105–120.
2. Anderson, J.R., Boyle, F. and Reiser, B. Intelligent tutoring systems. Science 228, 4698 (1985), 456–462.
3. Besold, T. et al. Neural-Symbolic Learning and Reasoning: A Survey and Interpretation. CoRR abs/1711.03902 (2017). arXiv:1711.03902
4. Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M. and Elhadad, N. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In KDD, 2015.
5. Dietterich, T. Steps towards robust artificial intelligence. AI Magazine 38, 3 (2017).
6. Doshi-Velez, F. and Kim, B. Towards a rigorous science of interpretable machine learning. ArXiv (2017), arXiv:1702.08608
7. Ferguson, G. and Allen, J.F. TRIPS: An integrated intelligent problem-solving assistant. In AAAI/IAAI, 1998.
8. Fox, M., Long, D. and Magazzeni, D. Explainable Planning. In IJCAI XAI Workshop, 2017; http://arxiv.org/abs/1709.10256
9. Goodfellow, I.J., Shlens, J. and Szegedy, C. Explaining and Harnessing Adversarial Examples. ArXiv (2014), arXiv:1412.6572
10. Grice, P. Logic and Conversation, 1975, 41–58.
11. Halpern, J. and Pearl, J. Causes and explanations: A structural-model approach. Part I: Causes. The British J. Philosophy of Science 56, 4 (2005), 843–887.
12. Hardt, M., Price, E. and Srebro, N. Equality of opportunity in supervised learning. In NIPS, 2016.
13. Hendricks, L., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B. and Darrell, T. Generating visual explanations. In ECCV, 2016.
14. Hendricks, L.A., Hu, R., Darrell, T. and Akata, Z. Grounding visual explanations. ArXiv (2017), arXiv:1711.06465
15. Hilton, D. Conversational processes and causal explanation. Psychological Bulletin 107, 1 (1990), 65.
16. Kahneman, D. Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, 2011; http://a.co/hGYmXGJ
17. Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F. and Sayres, R. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors. ArXiv e-prints (Nov. 2017); arXiv:stat.ML/1711.11279
18. Koehler, D.J. Explanation, imagination, and confidence in judgment. Psychological Bulletin 110, 3 (1991), 499.
19. Koh, P. and Liang, P. Understanding black-box predictions via influence functions. In ICML, 2017.
20. Krause, J., Dasgupta, A., Swartz, J., Aphinyanaphongs, Y. and Bertini, E. A workflow for visual diagnostics of binary classifiers using instance-level explanations. In IEEE VAST, 2017.
21. Kulesza, T., Burnett, M., Wong, W. and Stumpf, S. Principles of explanatory debugging to personalize interactive machine learning. In IUI, 2015.
22. Lakkaraju, H., Kamar, E., Caruana, R. and Leskovec, J. Interpretable & explorable approximations of black box models. KDD-FATML, 2017.
23. Lewis, D. Causal explanation. Philosophical Papers 2 (1986), 214–240.
24. Lim, B.Y. and Dey, A.K. Assessing demand for intelligibility in context-aware applications. In Proceedings of the 11th International Conference on Ubiquitous Computing (2009). ACM, 195–204.
25. Lipton, Z. The Mythos of Model Interpretability. In Proceedings of ICML Workshop on Human Interpretability in ML, 2016.
26. Lombrozo, T. Simplicity and probability in causal explanation. Cognitive Psychology 55, 3 (2007), 232–257.
27. Lou, Y., Caruana, R. and Gehrke, J. Intelligible models for classification and regression. In KDD, 2012.
28. Lundberg, S. and Lee, S. A unified approach to interpreting model predictions. NIPS, 2017.
29. McCarthy, J. and Hayes, P. Some philosophical problems from the standpoint of artificial intelligence. Machine Intelligence (1969), 463–502.
30. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267 (Feb. 2018), 1–38.
31. Norman, D.A. Some observations on mental models. Mental Models, Psychology Press, 2014, 15–22.
32. Papadimitriou, A., Symeonidis, P. and Manolopoulos, Y. A generalized taxonomy of explanations styles for traditional and social recommender systems. Data Mining and Knowledge Discovery 24, 3 (2012), 555–583.
33. Ribeiro, M., Singh, S. and Guestrin, C. Why should I trust you?: Explaining the predictions of any classifier. In KDD, 2016.
34. Ribeiro, M., Singh, S. and Guestrin, C. Anchors: High-precision model-agnostic explanations. In AAAI, 2018.
35. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484–489.
36. Sloman, S. Explanatory coherence and the induction of properties. Thinking & Reasoning 3, 2 (1997), 81–110.
37. Sreedharan, S., Srivastava, S. and Kambhampati, S. Hierarchical expertise-level modeling for user specific robot-behavior explanations. ArXiv e-prints (Feb. 2018), arXiv:1802.06895
38. Swartout, W. XPLAIN: A system for creating and explaining expert consulting programs. Artificial Intelligence 21, 3 (1983), 285–325.
39. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. and Wojna, Z. Rethinking the inception architecture for computer vision. In CVPR, 2016.
40. Zeiler, M. and Fergus, R. Visualizing and understanding convolutional networks. In ECCV, 2014.

Daniel S. Weld (weld@cs.washington.edu) is Thomas J. Cable/WRF Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, Seattle, WA, USA.

Gagan Bansal (bansalg@cs.washington.edu) is a graduate student in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, Seattle, WA, USA.

Copyright held by authors/owners. Publishing rights licensed to ACM.

Watch the authors discuss this work in the exclusive Communications video. https://cacm.acm.org/videos/the-challenge-of-crafting-intelligible-intelligence
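As a rough illustration of the locally approximate explanations discussed in this article, the following sketch fits a similarity-weighted linear surrogate around one instance of a black-box function. It is not the LIME library and makes several assumptions that are not in the article: the interpretable features X′ are taken to be the original features, neighbors are drawn with Gaussian noise, similarity is an exponential kernel on squared distance, and all names (`explain_locally`, `solve`, `spread`, `kernel_width`) are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def explain_locally(f, x0, n_samples=500, spread=0.5, kernel_width=0.5, seed=0):
    """Fit a weighted linear surrogate of black box f around instance x0.

    Returns [intercept, coefficient_1, ..., coefficient_d], where the
    coefficients apply to the offsets (z - x0) of each feature.
    """
    rng = random.Random(seed)
    dim = len(x0)
    # Accumulate the weighted normal equations (A^T W A) w = A^T W y,
    # where each row of A is [1, z_1 - x0_1, ..., z_d - x0_d].
    AtWA = [[0.0] * (dim + 1) for _ in range(dim + 1)]
    AtWy = [0.0] * (dim + 1)
    for _ in range(n_samples):
        offset = [rng.gauss(0.0, spread) for _ in range(dim)]
        z = [xi + oi for xi, oi in zip(x0, offset)]
        y = f(z)  # query the black box on the perturbed instance
        dist2 = sum(o * o for o in offset)
        weight = math.exp(-dist2 / kernel_width ** 2)  # nearer samples count more
        row = [1.0] + offset
        for i in range(dim + 1):
            AtWy[i] += weight * row[i] * y
            for j in range(dim + 1):
                AtWA[i][j] += weight * row[i] * row[j]
    return solve(AtWA, AtWy)


if __name__ == "__main__":
    # Toy black box (an assumption for illustration): nonlinear in feature 0.
    black_box = lambda x: x[0] ** 2 + 3 * x[1]
    print(explain_locally(black_box, [1.0, 0.0]))
```

For this toy black box, the two feature coefficients should land near the local slopes at the chosen instance (roughly 2 and 3). A surrogate of this kind only describes the neighborhood it was fitted on, which is the faithfulness limitation the article raises.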

research highlights
Technical Perspective: Back to the Edge
By Rishiyur S. Nikhil

Heterogeneous Von Neumann/Dataflow Microprocessors
By Tony Nowatzki, Vinay Gangadhar, and Karthikeyan Sankaralingam

DOI:10.1145/3323921

Technical Perspective
Back to the Edge
By Rishiyur S. Nikhil

To view the accompanying paper, visit doi.acm.org/10.1145/3323923

“You may fire when you are ready, Gridley,” is the famous command from Commodore Dewey in the Battle of Manila Bay, 1898. He may not have realized it, but he was articulating the basic principle of dataflow computing, where an instruction can be executed as soon as its inputs are available. Dataflow has long fascinated computer architects as perhaps a more “natural” way for computation circuits to best exploit parallelism for performance.

A visiting alien may be forgiven for experiencing whiplash when shown how we treat parallelism in programs. Mathematical algorithms have abundant parallelism; the only limit is data dependency (an operator can be evaluated when its inputs are available). We code it in a mainstream programming language (C/C++, Python, among others), which has completely sequential semantics (zero parallelism) to make sense of reads and writes to memory. As illustrated in Figure 1, compilers sweat mightily to rediscover some of the lost parallelism in their internal CDFGs (control and data flow graphs), and then produce machine code that, again, is completely sequential. When we execute this on a modern von Neumann CPU, wide-issue, out-of-order circuits once again sweat mightily (burning power) to rediscover parallelism.

Figure 1. Parallelism during coding, compilation, and execution.
(Plot of Parallelism against Flow stages. Stages shown: mathematical algorithm, source code, CDFG in compiler, machine code, out-of-order von Neumann CPU.)

The 1970s through early 1990s saw several attempts to avoid these “unnecessary” sequentializations (green circles in Figure 2). Dataflow languages (mostly purely functional) and machine code (dataflow graphs) retained parallelism from the math. Instead of a program counter, each instruction directly named its successor(s) receiving its outputs. Dataflow CPUs directly executed this graph machine code. Nowadays this computation model goes by the acronym EDGE, for explicit dataflow graph execution.

Figure 2. Alternative strategies for exploiting parallelism.
(Plot of Parallelism against Flow stages. Stages shown: mathematical algorithm; dataflow language (Val, Id, Sisal, pH, …); dataflow graph machine code; dataflow CPU (EDGE); source code; CDFG in compiler; machine code; hybrid EDGE/von Neumann CPU.)

So, why aren’t we all using EDGE machines today? A short answer is that they never quite mastered spatial or temporal locality and were subpar on inherently sequential code regions. In contrast, modern von Neumann CPUs excel at this, managing efficient flow of data between circuits that are fast-and-expensive (registers, wires), medium (caches), and slow-and-cheap (DRAMs).

The following paper by Tony Nowatzki, Vinay Gangadhar, and Karthikeyan Sankaralingam describes an innovative approach to exploit both models. From the CDFG, their compiler generates both traditional sequential machine code and a data graph, each being executed on appropriate circuits (blue squares in Figure 2), with efficient hand-off mechanisms. The authors describe extensive studies to validate the viability of this approach for existing codes.

EDGE computing is undergoing a renaissance, with many researchers pursuing related ideas. There are indications that big industry players are also contemplating this direction.a

a Morgan, T.P. Intel’s Exascale dataflow engine drops x86 and von Neumann. The NEXT Platform, Aug 30, 2018.

Rishiyur S. Nikhil is Chief Technical Officer at Bluespec, Inc., a semiconductor tool design company in Framingham, MA, USA.

Copyright held by author.



DOI:10.1145/3323923

Heterogeneous Von Neumann/Dataflow Microprocessors
By Tony Nowatzki, Vinay Gangadhar, and Karthikeyan Sankaralingam

The original version of this paper is entitled “Exploring the Potential of Heterogeneous Von Neumann/Dataflow Execution Models” and was published in ISCA 2015.

Abstract
General-purpose processors (GPPs), which traditionally rely on a Von Neumann-based execution model, incur burdensome power overheads, largely due to the need to dynamically extract parallelism and maintain precise state. Further, it is extremely difficult to improve their performance without increasing energy usage. Decades-old explicit-dataflow architectures eliminate many Von Neumann overheads, but have not been successful as stand-alone alternatives because of poor performance on certain workloads, due to insufficient control speculation and communication overheads.

We observe a synergy between out-of-order (OOO) and explicit-dataflow processors, whereby dynamically switching between them according to the behavior of program phases can greatly improve performance and energy efficiency. This work studies the potential of such a paradigm of heterogeneous execution models, by developing a specialization engine for explicit-dataflow (SEED) and integrating it with a standard out-of-order (OOO) core. When integrated with a dual-issue OOO, it becomes both faster (1.33×) and dramatically more energy efficient (1.70×). Integrated with an in-order core, it becomes faster than even a dual-issue OOO, with twice the energy efficiency.

1. INTRODUCTION
As transistor scaling trends continue to worsen, power limitations make improving the performance and energy efficiency of general purpose processors (GPPs) ever more intractable. The status quo approach of scaling processor structures consumes too much power to be worth the marginal improvements in performance. On top of these challenges, a series of recent microarchitecture level vulnerabilities (Meltdown and Spectre9) exploit the underlying techniques which modern processors already rely on for exploiting instruction-level parallelism (ILP).

Fundamental to these issues is the Von Neumann execution model adopted by modern GPPs. To make the contract between the program and the hardware simple, a Von Neumann machine logically executes instructions in the order specified by the program, and dependences are implicit through the names of storage locations (registers and memory addresses). However, this has the consequence that exploiting ILP effectively requires sophisticated techniques. Specifically, it requires (1) dynamic discovery of register/memory dependences, (2) speculative execution past unresolved control flow instructions, and (3) maintenance of the precise program state at each dynamic instruction should it need to be recovered (e.g., an exception due to a context switch).

The above techniques are the heart of modern Von Neumann out-of-order (OOO) processors, and each technique requires significant hardware overhead (register renaming, instruction wakeup, reorder-buffer maintenance, speculation recovery, etc.). In addition, the instruction-by-instruction execution incurs considerable energy overheads in pipeline processing (fetch, decode, commit, etc.). As for security, the class of vulnerabilities known as Meltdown and Spectre all make use of speculative execution of one form or another, adding another reason to find an alternative.

Interestingly, there exists a well-known class of architectures, called explicit-dataflow, that mitigates much of the above (e.g., Tagged Token Dataflow,1 TRIPS,3 WaveScalar20). Figure 1 shows that the defining characteristic of this execution model is how it encodes both control and data dependences explicitly, and the dynamic instructions are ordered by these dependences rather than a total order. Thus, a precise program state is not maintained at every instruction. The benefit is extremely cheap exploitation of instruction-level parallelism in hardware, because no dynamic dependence construction is required.

Figure 1. Von Neumann vs. dataflow at a glance.
(Von Neumann Execution Model: precise instruction order maintained; instructions can be locally re-ordered after dynamically discovering dependences. Dataflow Execution Model: instructions ordered by dependences.)

However, explicit-dataflow architectures show no signs of replacing conventional GPPs for at least three reasons. First, control speculation is limited by the difficulty of implementing efficient dataflow-based squashing. Second, the latency cost of explicit data communication can be prohibitive.2 Third, compilation challenges for general workloads have proven hard to surmount.5 Although a dataflow-based execution model may help many workloads, it can also significantly hamper others.

Unexplored opportunity: What is unexplored so far is the fine-grained interleaving of explicit-dataflow with Von Neumann execution, that is, the theoretical and practical limits of being able to switch with low cost between an explicit-dataflow hardware/ISA and a Von Neumann ISA. Figure 2(a) shows a logical view of such a heterogeneous architecture, and Figure 2(b) shows the capability of this architecture to exploit fine-grain (thousands to millions of instructions) application phases. This is interesting now, as


trends mean that on-chip power is more limited than area; this creates “dark-silicon,” portions of the chip that cannot be kept active due to power constraints. The two major implications are that energy efficiency is the key to improving scalable performance, and that it becomes rational to add specialized hardware which is only in use when profitable.

Figure 2. Taking advantage of dynamic program behavior. (a) Logical architecture (cache hierarchy, OOO core, explicit-dataflow unit, vector unit, live values). (b) Architecture preference over time for App. 1, App. 2, and App. 3, with phases of thousands to millions of instructions.

With such a hardware organization, many open questions arise: Are the benefits of fine-grained interleaving of execution models significant enough? How might one build a practical and small-footprint dataflow engine capable of serving as an offload engine? Which types of GPP cores can get substantial benefits? Why are certain program region-types suitable for explicit-dataflow execution?

To answer these questions we make the following contributions. Most importantly, we identify (and quantify) the potential of switching between OOO and explicit-dataflow at a fine grain. Next, we develop a specialization engine for explicit-dataflow (SEED) by combining known dataflow-architecture techniques, and specializing the design for program characteristics where explicit-dataflow excels as well as for common, simplifying program structures (loops/nested loops). We evaluate the benefits through a design-space exploration, integrating SEED into little (in-order), medium (OOO2), and big (OOO4) cores. Our results demonstrate energy benefits of more than 1.5×, and speedups of 1.67×, 1.33×, and 1.14× across little, medium, and big cores. Finally, our analysis illuminates the relationship between workload properties and dataflow profitability: code with high memory parallelism, instruction parallelism, and control noncriticality is highly profitable for dataflow execution. These are common properties for many emerging workloads in machine learning and data processing.

2. UNDERSTANDING VON NEUMANN/DATAFLOW SYNERGY
Understanding the trade-offs between a Von Neumann machine, which reorders instructions implicitly, and a dataflow machine, which executes instructions in dependence order, can be subtle. Yet, the trade-offs have profound implications. We attempt to distill the intuition and quantitative potential of a heterogeneous core as follows.

2.1. Intuition for execution model affinity
The intuitive trade-off between the two execution models is that explicit-dataflow is more easily specializable for high issue width and instruction window size (due to lack of need to discover dependences dynamically), whereas an implicit-dataflow architecture is more easily specializable for speculation (due to its maintenance of precise state of all dynamic instructions in total program order).

Figure 3. Von Neumann and dataflow execution models. (a) Control flow graph of basic blocks. (b) Original program order. (c) Ideal schedule (annotated: control dependence removed through speculation). (d) Abstract 2-issue OOO schedule: Von Neumann enables efficient control speculation; OOO gains advantage if instructions are added to the control-critical path. (e) Abstract dataflow schedule: dataflow enables efficient instruction parallelism; dataflow gains advantage if more independent instructions are added.

The performance implications can be seen in an example in Figure 3(a), which has a single control decision labeled as if. In (b), we show the program instruction order for one iteration of this code, assuming the left branch was taken. Figure 3(c) shows the ideal schedule of these instructions on an ideal machine (one instruction per cycle). The key to the ideal execution is both the reordering of dependent instructions (c, d) before the control decision is resolved, as well as being able to execute many instructions in parallel.

A Von Neumann OOO machine has the advantage of speculative execution, but the disadvantage is the complexity of implementing hardware for issuing multiple instructions per cycle (issue width) when the dependences are determined dynamically. Therefore, (d) shows how a dual-issue OOO takes five cycles because there was not enough issue bandwidth for both d and h before the third cycle.

A dataflow processor can easily be designed for high issue width due to dependences being explicitly encoded into the program representation. However, we assume here that the dataflow processor does not perform speculation, because of the difficulty of recovering when a precise order is not maintained. Therefore, in Figure 3(e), the dataflow processor’s schedule, c and d must execute after the if.

Although the example suggests the benefits of control specialization and wide issue widths are similar, in practice, the differences can be stark, which we can demonstrate with slight modifications to the example. If we add several instructions to the critical path of the control decision (between b and if), the OOO core can hide these through control speculation. If instead we add more parallel instructions, the explicit-dataflow processor can execute these in parallel, whereas these may be serialized in the OOO Von Neumann machine. Explicit-dataflow can also be beneficial if the if is unpredictable, and the OOO core is serialized anyway.
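The trade-off in this example can be reproduced with a toy list scheduler. The following is a minimal sketch under stated assumptions, not a reconstruction of Figure 3: the dependence edges in DATA and CONTROL, the unit instruction latency, the oldest-first issue policy, and the names cycles and compare are all chosen here for illustration. It compares a dual-issue machine that speculates past the if against a wide machine that does not speculate, and then applies the two modifications described in the text.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical dependence graph; the edges are assumptions, not read off Figure 3.
ORDER: List[str] = ["a", "b", "if", "c", "d", "e", "h", "i", "j"]
DATA: Dict[str, Set[str]] = {
    "b": {"a"}, "if": {"b"},   # chain feeding the control decision
    "d": {"c"}, "e": {"c"},    # work inside the taken branch
    "i": {"h"}, "j": {"h"},    # work after the branch, independent of it
}
CONTROL: Dict[str, Set[str]] = {"c": {"if"}, "d": {"if"}, "e": {"if"}}


def cycles(order: List[str], data: Dict[str, Set[str]],
           issue_width: int, speculate: bool) -> int:
    """Oldest-first list scheduling with unit-latency instructions."""
    done: Set[str] = set()
    pending = list(order)
    count = 0
    while pending:
        ready = []
        for inst in pending:
            deps = set(data.get(inst, set()))
            if not speculate:  # without speculation, wait for the branch too
                deps |= CONTROL.get(inst, set())
            if deps <= done:
                ready.append(inst)
        if not ready:
            raise ValueError("dependence cycle in the graph")
        issued = ready[:issue_width]
        done.update(issued)
        pending = [inst for inst in pending if inst not in issued]
        count += 1
    return count


def compare(order: List[str], data: Dict[str, Set[str]]) -> Tuple[int, int]:
    """Return (dual-issue with speculation, wide issue without speculation)."""
    return cycles(order, data, 2, True), cycles(order, data, len(order), False)


ideal = cycles(ORDER, DATA, len(ORDER), True)
base = compare(ORDER, DATA)

# Variation 1: two extra serial instructions on the path from b to if.
longer_condition = compare(
    ["a", "b", "x1", "x2", "if", "c", "d", "e", "h", "i", "j"],
    {**DATA, "x1": {"b"}, "x2": {"x1"}, "if": {"x2"}},
)

# Variation 2: four extra independent instructions.
more_parallel = compare(ORDER + ["p1", "p2", "p3", "p4"], DATA)

print(ideal, base, longer_condition, more_parallel)
```

Under these assumptions the two machines tie on the base graph, the speculating dual-issue machine pulls ahead once the branch condition takes longer to compute, and the wide non-speculating machine pulls ahead once more independent work is available. The cycle counts depend entirely on the assumed graph and should not be read as the paper's measurements.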



2.2. Quantitative potential
A natural next question is how much potential benefit a heterogeneous Von Neumann/dataflow core could provide. The potential benefits of an ideal hybrid architecture (ideal dataflow + four-wide OOO) relative to a standard OOO core are shown in Figure 4(a), where each speedup bar is labeled with the percentage of execution time for which dataflow execution is profitable. Figure 4(b) shows the overall energy and performance trends for three different GPPs.

Figure 4. Ideal dataflow specialization potential. (a) Hybrid ideal-dataflow performance: per-benchmark performance improvement of Hybrid Ideal-DF over GPP-only for the Mediabench and SPECint benchmarks and their geometric mean, each bar labeled with the % of execution time in explicit-dataflow. (b) Overall trade-offs: relative energy versus relative performance for GPP types IO2, OOO2, and OOO4.

These results indicate that dataflow specialization has significant potential, up to 1.5× performance for an OOO4 GPP (2× for OOO2), as well as over 2× average energy-efficiency improvement. Furthermore, the preference for explicit-dataflow is frequent, covering around 65% of execution time, but also intermittent and application-phase dependent. The percentage of execution time in dataflow mode varies greatly, often between 20% and 80%, suggesting that phase types can exist at a fine grain inside an application.

Overall, this suggests that a heterogeneous Von Neumann/explicit-dataflow architecture with fine-granularity switching can provide significant performance improvements along with power reduction, and thus lower energy.

Remaining challenge: Although many high-performance explicit-dataflow architectures have been proposed over the last several decades, the remaining challenge is how to achieve the same benefits while avoiding a more heavyweight general-purpose explicit-dataflow engine (for example, WaveScalar [20] or TRIPS [3], see Figure 5). The approach we will take is to combine known dataflow mechanisms, while simplifying and specializing for the common workload characteristics where dataflow excels.

Figure 5. Relationship to dataflow architectures.

Are prior dataflow architectures sufficient? Two reasons motivate innovation beyond existing dataflow architectures for the heterogeneous core. First, most prior dataflow architectures have significant area and power overheads, because they are targeted at whole-program execution and must handle arbitrary code. For example, TRIPS [3] uses a dynamically routed mesh network to exploit many different forms of parallelism. WaveScalar [20] uses large hierarchical interconnects and complex tag-matching, in part because it needs to disambiguate instructions from multiple function contexts. Second, their designs do not consider the cost of low-overhead switching (for example, not relying on prediction state that requires warm-up). On the other hand, existing in-core accelerators that act as offload engines have much lower power and area, but are not general enough. None of them can offload entire loop regions in general: they offload only the computation in CCA [4] and DySER [6], or hot loop-traces in BERET [7].

How are prior dataflow techniques used? SEED is highly inspired by previous decades of dataflow research. For example, Monsoon [19] improves the efficiency of matching operands using an Explicit Token Store, a concept we borrow for SEED’s explicit operand buffer. The mechanisms that SEED uses for efficient and general dataflow-based control are derived from WaveScalar [20], the concept of a […] coprocessor [12], and the concept of efficient compound FUs from BERET [7].

3. SEED: AN ARCHITECTURE FOR FINE-GRAIN DATAFLOW SPECIALIZATION
Based on our previous analysis and insights, there are three primary requirements for a dataflow specialization engine: (1) low area and power, so integration with the GPP is feasible; (2) enough generality to target a wide variety of workloads; and (3) achieving the benefits of dataflow execution with few overheads.

The dataflow processor is only constrained by the program’s control and data-dependencies, but retains the same memory system. Note that no nonlocal program modifications are performed (no loop reordering/tiling/layout-transforms/etc.). It is also nonspeculative and incurs latency when transferring values between control regions. For energy, only functional units and caches are considered.

First, we propose that requirement 1, low area and power, can be addressed by focusing on a common, yet simplifying case: fully-inlined loops and nested loops with a limited total static instruction count. This helps limit the size of the dataflow tags and eliminates the need for an instruction cache, both of which reduce hardware complexity. In addition, ignoring recursive regions and only allowing in-flight instructions from a single context eliminates the need for tag matching hardware. Targeting nested-loops also satisfies requirement 2: these regions can cover a majority of real applications’ dynamic instructions.

For low-overhead dataflow execution, requirement 3, communication must be lowered while maintaining parallelism. For this, we first use a distributed-issue architecture, which enables high instruction throughput with low-ported RAM structures. Second, we use a multibus network for sustaining instruction communication throughput at low latency. Third, we use compound instructions to reduce communication overhead. The proposed design is SEED: specialization engine for explicit-dataflow, shown at a high level in Figure 6, and explained next.


3.1. Von Neumann core integration
Adaptive execution: To adaptively apply explicit-dataflow specialization, we use a technique similar to big.LITTLE, except that we restrict the entry points of specializable regions to fully-inlined loops or nested loops. This simplifies integration with a different ISA. Targeting longer nested-loop regions reduces the cost of configuration and GPP core synchronization.

GPP integration: SEED uses the same cache hierarchy as the GPP, which facilitates fast switching (no data-copying through memory) and preserves cache coherence. SEED adds architectural state, which must be maintained at context switches. Lastly, functional units (FUs) could be shared with the GPP to save area (by adding bypass paths); this work considers stand-alone FUs.

3.2. Dataflow execution model
SEED’s execution model closely resembles prior dataflow architectures, but is restricted to loops/nested-loops, and adds the use of compound instructions.
We use a running example to aid explanation: a simple linked-list traversal where a conditional computation is performed at each node. Figure 7(a) shows the original program, (b) the Von Neumann control flow graph (CFG) representation, and (c) SEED’s explicit-dataflow representation.

Figure 6. High-level SEED integration and organization (IMU: instruction management unit; CFU: compound functional unit; ODU: output distribution unit). Blocks shown: OOO GPP with ICache and DCache, L1 cache, configuration and initialization, bus arbiter, store buffer, and SEED units 1 through 8, each with an IMU, a CFU, and an ODU.

Data-dependence: Similar to other dataflow representations, SEED programs follow the dataflow firing rule: instructions execute when their operands are ready. To initiate computation, live-in values are sent from the host. During dataflow execution, each instruction forwards its outputs to dependent instructions, either in the same iteration (solid line in Figure 7(c)), or in a subsequent iteration (dotted line). For example, the a_next value loaded from memory is passed on to the next iteration for address computation.
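The firing rule can be illustrated with a small token-driven interpreter. This is a minimal sketch, assuming a software model that the paper does not describe: the Instr class, the run function, the pair instruction, the NEXT and V2 tables, and the max_iterations bound (used here in place of real loop-exit control) are all hypothetical. Each destination carries an iteration offset, 0 for a same-iteration edge and 1 for an edge into the next iteration.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Dest = Tuple[str, int, int]          # (consumer, operand slot, iteration offset)
Token = Tuple[str, int, int, object]  # (consumer, operand slot, iteration, value)


class Instr:
    def __init__(self, op: Callable, n_inputs: int, dests: List[Dest]):
        self.op, self.n_inputs, self.dests = op, n_inputs, dests


def run(graph: Dict[str, Instr], live_ins: List[Token], max_iterations: int):
    operands: Dict[Tuple[str, int], Dict[int, object]] = {}
    fired = []                        # (instruction, iteration, result)
    tokens: Deque[Token] = deque(live_ins)
    while tokens:
        name, slot, iteration, value = tokens.popleft()
        if iteration >= max_iterations:
            continue                  # drop tokens beyond the assumed loop bound
        slots = operands.setdefault((name, iteration), {})
        slots[slot] = value
        instr = graph[name]
        if len(slots) == instr.n_inputs:          # firing rule: all operands ready
            result = instr.op(*(slots[s] for s in range(instr.n_inputs)))
            fired.append((name, iteration, result))
            for dest, dest_slot, offset in instr.dests:
                tokens.append((dest, dest_slot, iteration + offset, result))
    return fired


# Hypothetical memory image of a linked list: node address -> next / v2.
NEXT = {10: 20, 20: 30, 30: 0}
V2 = {20: -5, 30: 7}

graph = {
    # a_next feeds this iteration's consumers and the next iteration's load.
    "ld_next": Instr(lambda a: NEXT[a], 1,
                     [("ld_v2", 0, 0), ("pair", 0, 0), ("ld_next", 0, 1)]),
    "ld_v2": Instr(lambda a: V2[a], 1, [("pair", 1, 0)]),
    # Two-operand instruction: fires only once both tokens have arrived.
    "pair": Instr(lambda a, v: (a, v), 2, []),
}

fired = run(graph, [("ld_next", 0, 0, 10)], max_iterations=2)
for entry in fired:
    print(entry)
```

In this run the second iteration's ld_next fires before the first iteration's pair, which shows how forwarding a value to the next iteration lets iterations overlap without any program-order bookkeeping.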
Control-flow strategy: Control dependencies between instructions are converted into data dependencies. SEED uses a switch instruction, which forwards values to one of two possible destinations depending on the input control signal. In the example, depending on the n_val comparison, v2 is forwarded to either the if or else branch. This strategy enables control-equivalent regions to execute in parallel.
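The switch-based strategy can be sketched for the running linked-list example. The code below is a sequential emulation under assumptions, not SEED's hardware behavior: the switch function, the BRANCH_OPS table, the traverse function, and the dictionary layout of nodes are hypothetical names introduced for illustration. A decision value selects which consumer receives the token, so no branch instruction is needed inside the loop body.

```python
from typing import Callable, Dict, Tuple


def switch(control: bool, value: int) -> Tuple[str, int]:
    """Forward value to the TRUE or FALSE destination, chosen by control."""
    return ("true" if control else "false", value)


# Consumers on each side of the n_val switch (the if and else computations).
BRANCH_OPS: Dict[str, Callable[[int], int]] = {
    "true": lambda v: -v,       # a->v2 = -n_val
    "false": lambda v: v + 1,   # a->v2 = n_val + 1
}


def traverse(nodes: Dict[int, Dict[str, int]], head: int) -> None:
    a = head
    while True:
        a_next = nodes[a]["next"]                     # LD a->next
        side, forwarded = switch(a_next != 0, a_next)  # loop-control switch
        if side == "false":
            return                                    # token leaves as a live-out
        a = forwarded
        n_val = nodes[a]["v2"]                        # LD a->v2
        side, token = switch(n_val < 0, n_val)        # decision unit feeds switch
        nodes[a]["v2"] = BRANCH_OPS[side](token)      # ST to a->v2


# Hypothetical list: node 1 -> node 2 -> node 3 -> end (next == 0).
nodes = {
    1: {"v2": 0, "next": 2},
    2: {"v2": -4, "next": 3},
    3: {"v2": 6, "next": 0},
}
traverse(nodes, 1)
print([nodes[k]["v2"] for k in (1, 2, 3)])
```

As in the C loop of Figure 7(a), the head node's v2 is never rewritten, because the loop advances to the next node before loading v2.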

Figure 7. (a) Example C loop; (b) control flow graph (CFG); (c) SEED program representation.

(a) Example C loop:

    struct A {
      int v1, v2;
      A* next;
    };
    …
    A* a = …
    while (a->next != 0) {
      a = a->next;
      int n_val = a->v2;
      if (n_val < 0) {
        a->v2 = -n_val;
      }
      else {
        a->v2 = n_val + 1;
      }
    }

(b) Control flow graph blocks: [struct A a = …; LD a_next = a->next; if (a_next != 0)], [a = a_next; n_val = LD a->v2; if (n_val < 0)], [n_val = -n_val; ST n_val, a->v2], [n_val = n_val + 1; ST n_val, a->v2], with an edge to the next loop iteration and an exit block [… = a] reached when a_next = 0.

(c) SEED program representation. Legend: Switch (S) forwards a control or a data value to one of the two destinations; Decision unit generates a decision value based on the input; Memory units (LD, ST) interface with the memory sub-system to load or store values; ALUs are functional units which perform a primitive computation (add, multiply, shift, etc.). Line colors: black, data line; purple, control line; blue, data from the switch in the TRUE path; red, data from the switch in the FALSE path; grey, token values passed for the next iteration. Mapping: subgraphs 1 and 2 are mapped to CFU 1; subgraphs 3 and 4 are mapped to CFU 2.



Enforcing memory-ordering: SEED uses a primarily software approach to enforce memory-ordering. When the compiler identifies dependent (or aliasing) instructions, the program serializes these through explicit tokens. In this example, the stores of n_val can conflict with the load from the next iteration (e.g., when the linked list contains a loop), and therefore, memory dependence edges are added.

Executing compound instructions: To mitigate communication overheads, the compiler groups primitive instructions (e.g., adds, shifts, switches, etc.) into subgraphs and executes them on compound functional units (CFUs). These are logically executed atomically. The example program contains four subgraphs, mapped to two CFUs.

3.3. SEED microarchitecture
SEED achieves high instruction parallelism and simplicity by using eight distributed computation units. Each of these SEED units is organized around one CFU, and units communicate over a network, as shown in Figure 6.

Compound functional unit (CFU): CFUs are composed of a fixed network of primitive FUs (adders, multipliers, logical units, switch units, etc.), where unused portions of the CFU are bypassed when not in use. Long latency instructions (e.g., loads) can be buffered and passed by subsequent instructions. Our design uses the CFU mix from existing work [7], where CFUs contain 2–5 operations. CFUs which have memory units will issue load and store requests to the host’s memory management unit. Load requests access a store buffer for store-to-load forwarding.

Instruction management unit (IMU): The IMU has three responsibilities. First, it stores up to 32 compound instructions, each with a maximum of four operands, for up to four dynamic loop iterations (equivalent to a 1024-entry instruction window). Second, it selects instructions with ready operands for execution on the CFU, giving priority to the oldest instruction. Third, the IMU routes incoming values from the network to appropriate storage locations based on the incoming instruction tag.

Communication: The ODU is responsible for distributing the output values and destination packets (SEED unit + instruction location + iteration offset) to the bus network, and buffering them during bus conflicts. A bus interconnect forwards output packets from the ODU to the IMUs of the SEED units that use the corresponding operands. Therefore, dependent instructions communicating over the bus cannot execute in back-to-back cycles, a limitation of distributed dataflow.

4. SEED COMPILER DESIGN
The two main responsibilities of the compiler are determining which regions to specialize and scheduling instructions into CFUs inside SEED regions.

Region selection: The compiler must find or create fully-inlined nested-loop regions, which are small enough to match SEED’s operand/instruction storage. Also, the inner loop should be unrolled for instruction parallelism. An Amdahl-tree based approach can be used to select regions [16]. Also, we should avoid regions where the OOO core (through control speculation) or the SIMD units would have performed better. One approach is to use simple heuristics, for example, avoid control-critical regions. A dynamic approach can be more flexible; for example, training online predictors to give a runtime performance estimate based on per-region statistics. Related work explores this in detail [16, 18], and this work simply uses a static oracle scheduler (see Section 5).

Instruction scheduling: The instruction scheduler forms compound instructions and assigns them to units. Its job is to balance communication cost by creating large compound instructions, while also ensuring that combining instructions does not artificially increase the critical path length. To solve this, we use integer linear programming, specifically extending a general scheduling framework for spatial architectures [17] with the ability to model instruction bundling.

5. EVALUATION METHODOLOGY
For evaluating SEED, OOO core specialization techniques, and the other designs we compare to, we employ a TDG-based modeling methodology [15]. We use McPAT [11] with 22nm technology to estimate power and area. Von Neumann core configurations are given in Table 1.

Table 1. Von Neumann core configurations.

GPP             Characteristics
Little (IO2)    Dual issue, 1 load/store port.
Medium (OOO2)   64 entry ROB, 32 entry IW, LSQ: 16 ld/20 st, 1 ld/st port, speculative scheduling.
Big (OOO4)      168 entry ROB, 48 entry IW, LSQ: 64 ld/36 st, 2 ld/st ports, speculative scheduling.
Common          x86 ISA, 256-bit SIMD, 2-way 32KiB I$, 64KiB L1D$ (4 cycle latency), 8-way 2MB L2$ (22 cycle hit latency), 2GHz.

The benchmarks we chose were from SPECint and Mediabench [10], representing a variety of control and memory irregularity, as well as some regular benchmarks. To eliminate compiler/runtime heuristics on when to use which architecture, we use an oracle scheduler, which uses previous runs to decide when to use the OOO core, SEED, or SIMD.

6. EVALUATING DATAFLOW SPECIALIZATION POTENTIAL
To understand the potentials and trade-offs of dataflow specialization, we explore the prevalence of required program structure, per-region performance, and overall heterogeneous core benefits.

6.1. Program structure
Nested loop prevalence: Figure 8 shows cumulative distributions of dynamic instruction coverage with varying dynamic region granularity, assuming a maximum static region size of 1024 instructions. Considering regions with a duration of 8K dynamic instructions or longer (x-axis), nested loops can cover 60% of total instructions, whereas inner loops cover only 20%. Nested loops also greatly increase the region duration for a given percentage coverage (1K–64K for 40% coverage).

Compound instruction prevalence: Figure 9 is a histogram of per-benchmark compound instruction sizes which the compiler created, showing on average 2–3 instructions. This


is relatively high considering that compound instructions cannot cross control regions. Some singletons are necessary, however, either because control regions lack dependent computation, or because combining certain instructions would create additional critical-path dependencies.

Figure 8. Cumulative % contribution for decreasing dynamic region lengths. Static region size ≤ 1024 insts. (y-axis: % dynamic insts covered; x-axis: minimum allowed region duration in dynamic insts, from 32M down to 2; series: nested loops and inner loops.)

Figure 9. Compound instruction size histogram. (y-axis: % of dynamic compound instructions; one bar per benchmark, broken down by compound instruction size from 1 to 5.)

6.2. Per-region performance analysis
First, we compare the speedups of SEED to our most aggressive design (OOO4) on a per-region basis. Figure 10 shows SEED’s speedup for the recurring nested-loop program regions (each >1% total insts), where the region with the highest contribution to the execution time of the original program is shown in red. Overall, speedup varies dramatically due to the significant differences in program characteristics. Around 3×–5× speedup is possible, and many regions show significant speedup. We next examine the reasons for performance differences of the highest-contribution regions in several categories, as follows.

Figure 10. Per-region SEED speedups. Highest-contributing region shown in red. (y-axis: speedup; one group of bars per benchmark.)

Performance and energy benefit regions: Compared to the OOO4 core, SEED can provide high speedups by exploiting ILP in compute-intensive regions (e.g., jpg2000dec, cjpeg, and djpeg) and from using an effectively larger instruction window for achieving higher memory parallelism (e.g., 181.mcf and 429.mcf). The latter have high cache miss rates and clog the instruction window of the OOO processor. Across these regions, indirect memory access is common, which precludes SIMD vectorization.

Energy benefit-only regions: These regions have similar performance to the OOO4, but are more energy efficient by 2×–3×. Here, ILP tends to be lower, but control is mostly off the critical path, allowing dataflow to compete (e.g., djpeg-1 and h264dec). Although gsmencode and 164.gzip actually have high potential ILP, they are burdened by communication between SEED units. In contrast, 473.astar and jpg2000enc have significant control, but still perform close to the OOO core. These benchmarks make up for the lack of speculation by avoiding branch mispredictions and relying on the dataflow-based control.

Performance loss regions: The most common reason for performance loss is communication latency on the critical path (e.g., 403.gcc, mpeg2dec, and mpeg2enc), as well as predictable data-dependent control (e.g., 401.bzip2). These are fundamental dataflow limitations. In two cases, configuration overhead was burdensome (464.h264ref and 197.parser). Finally, some of these regions are vectorized on the GPP, and SEED is not optimized to exploit data-parallelism. In practice, the above regions would not be executed on SEED.

In summary, speedups come from exploiting higher memory parallelism and instruction parallelism, and avoiding misspeculation on unpredictable branches. Slowdowns come from the extra latency cost on more serialized computations.

6.3. Overall performance/energy trade-offs
Finally, we consider the overall performance when integrated with a little, medium, and big core, and compare explicit-dataflow specialization with existing techniques for targeting irregular codes. In-place loop execution, similar to Revolver [8], locks looping instructions into the instruction window to prevent redundant pipeline overheads such as fetch/decode/execute, but does not otherwise change the OOO execution model. Conservation cores [21] use software-defined region-specific accelerators, but they do not exploit dataflow-based control (only in-order control and memory). Figure 11 shows the relative performance and energy benefits, normalized to the in-order core alone.

SEED improves performance and energy efficiency across GPP core types, significantly more than existing accelerator



and microarchitectural approaches. For the little, medium, and big cores, SEED provides 1.65×, 1.33×, and 1.14× speedup, and 1.64×, 1.7×, and 1.53× energy efficiency, respectively. The energy benefits come primarily from the prevalence of regions where dataflow execution can match the host core’s performance; this occurs 71%, 64%, and 42% of the time, for the little, medium, and big Von Neumann cores, respectively.

Figure 11. Overall performance and energy benefit. (x-axis: relative performance; y-axis: relative energy. Designs: host GPP core, +in-place loop, +Cons-Cores, +SEED. Core types: little (IO2), medium (OOO2), big (OOO4). Annotations: 1.14× perf. with 1.54× en. eff.; 0.85× perf. with 2.3× en. eff.)

Understanding disruptive trade-offs: Perhaps more interesting are the disruptive changes that explicit-dataflow specialization introduces for computer architects. First, the OOO2+SEED is actually reasonably close in performance to an OOO4 processor on average, within 15%, while reducing energy 2.3×. Additionally, our estimates suggest that an OOO2+SEED occupies less area than an OOO4 GPP core. Therefore, a hybrid dataflow system introduces an interesting path toward a high-performance, low-energy microprocessor: start with an easier-to-engineer modest OOO core, and add a simple, nongeneral-purpose dataflow engine.

An equally interesting trade-off is to add a hybrid dataflow unit to a larger OOO core: SEED+OOO4 has much higher energy efficiency (1.54×) with additional performance improvements of 1.14×. This is a significant leap for energy-efficiency, especially considering the difficulty of improving the efficiency for complex, irregular workloads such as SPECint.

Overall, all cores can achieve significant energy benefits, little and medium cores can achieve significant speedup, and big cores receive modest performance improvement.

7. DISCUSSION
Dataflow specialization is a broadly applicable principle for both general-purpose processors and accelerators. We outline our view on the potentially disruptive implications in these areas as well as potential future directions.

7.1. General purpose cores
In this work, we showed how a dataflow processor can more efficiently take over and execute certain phases of application workloads, based on their properties. This can be viewed visually, as shown in Figure 12, where we show architecture affinity for programs along dimensions of control and memory regularity. Figure 12(a) shows how prior programmable specialization techniques only focus on a narrow range of workloads; for example, SIMD can speed up only highly regular program phases (region 1).

Figure 12(b) shows how dataflow specialization further cuts into the space of programs that traditional architectures are best at. Specifically, when the OOO processor’s issue width and instruction window size limit the achievable ILP (region 3), explicit-dataflow processors can exploit this through distributed dataflow, as well as more efficient execution under control unpredictability (region 4). Beyond these region types, dataflow specialization can be applied to create engines that target other behaviors, such as repeatable control (region 5), or to further improve highly regular regions by combining dataflow with vector-communication (region 1).

Future directions: The disruptive potential of exploiting common program phase behavior using a heterogeneous dataflow execution model can have significant implications leading to several important directions:

• Reduced importance of aggressive out-of-order: Dataflow engines which can exploit high ILP phases can reduce the need for aggressive and power-inefficient out-of-order cores. As a corollary, the design of modest-complexity loosely coupled cores should in principle be less design effort than a complex OOO core. This could lower the cost-of-entry into the general-purpose core market, increasing competition and spurring innovation.
• Radical departure from status quo: The simple and modular integration of engines targeting different behaviors, combined with microarchitecture-level dynamic compilation for dataflow ISAs [22], can enable such designs to be practical. This opens the potential of exploring designs with radically different microarchitectures and software interfaces, ultimately opening a larger and more exciting design space.
• An alternative secure processor: An open question is how to build future secure processors that are immune to attacks such as Meltdown and Spectre [9]. One approach is to simply avoid speculation; this work shows that an in-order core plus SEED may only lose on average around 20% performance with respect to an OOO core alone, at much lower energy.

7.2. Accelerators
In contrast to general-purpose processors, accelerators are purpose-built chips integrated at a coarse grain with computing systems, for workloads important enough to the market to justify their design and manufacturing cost. A persistent challenge facing accelerator design is that in order to achieve desired performance and energy efficiency, accelerators often sacrifice generality and programmability, using application or domain-specific software interfaces. Their architecture and microarchitecture are narrowly tailored to the particular domain and problem being solved.

The principle of heterogeneous Von Neumann/dataflow architectures can help to create a highly efficient accelerator without having to give up on domain-generality. Inspired by the insights here, we demonstrated that domain-specific accelerators rely on fundamentally common specialization principles: specialization of computation, communication,


Figure 12. Program phase affinity by application characteristics. Memory ranges from regular and data-independent, to irregular and
data-dependent but with parallelism, to latency bound with no parallelism. Control can range from noncritical or not present, critical but
repeating, not repeating but predictable, to unpredictable and data-dependent.

[Figure 12 graphic: two panels, (a) Prior Specialization Techniques and (b) Enabled Specialization Techniques. Each panel plots Memory Regularity (Regular Access, Irregular Access, Latency Bound) against Control Regularity (Unpredictable, Predictable, Repeating, Non-Critical). Region labels in the graphic: Vector SIMD/GPU, Higher ILP, Out-of-Order, Simple Core (energy benefit), Explicit-Dataflow (perf+energy benefit; energy benefit), Streaming (perf+energy benefit).]

concurrency, data-reuse, and coordination.14 A dataflow model of computation is especially suitable for exploiting the first three principles for massive parallel computation, whereas a Von Neumann model excels at the coordination of control decisions and ordering. We further addressed programmable specialization by proposing a Von Neumann/dataflow architecture called stream-dataflow,13 which specifies memory access and communication as streams, enabling effective specialization of data-reuse in caches and scratchpad memories.
Future directions: The promise of dataflow specialization in the accelerator context is to enable freedom from application-specific hardware development, leading to two important future directions.

• An accelerator architecture: The high energy and area-efficiency of a Von Neumann/dataflow accelerator, coupled with a well-defined hardware/software interface, enables the almost paradoxical concept of an accelerator architecture. We envision that a dataflow-specialized ISA such as stream-dataflow, along with essential hardware specialization principles, can serve as the basis for future innovation for specialization architectures. Its high efficiency makes it an excellent baseline comparison design for new accelerators, and the ease of modifying its hardware/software interface can enable integration of novel forms of computation and memory specialization for challenging workload domains.
• Compilation: How a given program leverages Von Neumann and dataflow mechanisms can have tremendous influence on attainable efficiency, and some methodology is required to navigate this design space. The fundamental compiler problem remains extracting and expressing parallelism and locality. The execution model and application domains make these problems easier to address. Applications amenable to accelerators are generally well-behaved (keeping pointers to a minimum or avoiding them, etc.). The execution model and architecture provide interfaces to cleanly expose the application's parallelism and locality to the hardware. This opens up exciting opportunities in compiler and programming languages research to target accelerators.

8. CONCLUSION
This article observed a synergy between Von Neumann and dataflow processors due to variance in program behaviors at a fine grain and used this insight to build a practical processor, SEED. It enables potentially disruptive performance and energy efficiency trade-offs for general-purpose processors, pushing the boundary of what is possible given only a modestly complex core. This approach of specializing for program behaviors using heterogeneous dataflow architectures could open a new design space, ultimately reducing the importance of aggressive OOO designs and leading to greater opportunity for radical architecture innovation.

References
1. Arvind, K., Nikhil, R.S. Executing a program on the MIT tagged-token dataflow architecture. IEEE Trans. Comput. 39, 3 (1990), 300–318.
2. Budiu, M., Artigas, P.V., Goldstein, S.C. Dataflow: A complement to superscalar. In ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (March 20–22, 2005), IEEE Computer Society, Washington, DC, USA, 177–186.
3. Burger, D., Keckler, S.W., McKinley, K.S., Dahlin, M., John, L.K., Lin, C., Moore, C.R., Burrill, J., McDonald, R.G., Yoder, W., Team, T.T. Scaling to the end of silicon with edge architectures. Computer 37, 7 (July 2004), 44–55.
4. Clark, N., Kudlur, M., Park, H., Mahlke, S., Flautner, K. Application-specific processing on a general-purpose core via transparent instruction set customization. In MICRO 37 Proceedings of the 37th Annual IEEE/ACM International Symposium on Microarchitecture (Portland, Oregon, December 04–08, 2004), IEEE Computer Society, Washington, DC, USA, 30–40.
5. Gebhart, M., Maher, B.A., Coons, K.E., Diamond, J., Gratz, P., Marino, M., Ranganathan, N., Robatmili, B., Smith, A., Burrill, J., Keckler, S.W., Burger, D., McKinley, K.S. An evaluation of the trips computer system. In ASPLOS XIV Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (Washington, DC, USA, March 07–11, 2009), ACM, New York, NY, USA, 1–12.
6. Govindaraju, V., Ho, C.-H., Nowatzki, T., Chhugani, J., Satish, N., Sankaralingam, K., Kim, C. DYSER: Unifying functionality and parallelism specialization for energy-efficient computing. IEEE Micro 32, 5 (Sept. 2012), 38–51.
7. Gupta, S., Feng, S., Ansari, A., Mahlke, S., August, D. Bundled execution of recurring traces for energy-efficient general purpose processing. In ASPLOS XIV Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (Washington, DC, USA, March 07–11, 2009), ACM, New York, NY, USA, 1–12.
8. Hayenga, M., Naresh, V., Lipasti, M. Revolver: Processor architecture for power efficient loop execution. In 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA) (Orlando, FL, USA, 2014), IEEE, 591–602.
9. Kocher, P., Horn, J., Fogh, A., Genkin, D., Gruss, D., Haas, W., Hamburg, M., Lipp, M., Mangard, S., Prescher, T., Schwarz, M., Yarom, Y. Spectre attacks: Exploiting speculative execution. In 40th IEEE Symposium on Security and Privacy (S&P'19) (IEEE Computer Society, 2019).
10. Lee, C., Potkonjak, M., Mangione-Smith, W. MediaBench: A tool for evaluating and synthesizing multimedia and communications systems. In MICRO 30 Proceedings of the 30th Annual ACM/IEEE International Symposium on Microarchitecture (Research Triangle Park, North Carolina, USA, December 01–03, 1997), 330–335.
11. Li, S., Ahn, J.H., Strong, R.D., Brockman, J.B., Tullsen, D.M., Jouppi, N.P. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In MICRO 42 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (New York, New York, December 12–16, 2009), 469–480.
12. Liu, Y., Furber, S. A low power embedded dataflow coprocessor. In ISVLSI '05 Proceedings of the IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design (May 11–12, 2005), 246–247.
13. Nowatzki, T., Gangadhar, V., Ardalani, N., Sankaralingam, K. Stream-dataflow acceleration. In ISCA '17 Proceedings of the 44th Annual International Symposium on Computer Architecture (Toronto, ON, Canada, June 24–28, 2017), 416–429.
14. Nowatzki, T., Gangadhar, V., Sankaralingam, K., Wright, G. Pushing the limits of accelerator efficiency while retaining programmability. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA) (March 12–16, 2016), 27–39.
15. Nowatzki, T., Govindaraju, V., Sankaralingam, K. A graph-based program representation for analyzing hardware specialization approaches. Comput. Archit. Lett. 14, 2 (July-Dec 2015), 94–98.
16. Nowatzki, T., Sankaralingam, K. Analyzing behavior specialized acceleration. In ASPLOS '16 Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (Atlanta, Georgia, USA, April 02–06, 2016), ACM, New York, NY, USA, 697–711.
17. Nowatzki, T., Sartin-Tarm, M., De Carli, L., Sankaralingam, K., Estan, C., Robatmili, B. A general constraint-centric scheduling framework for spatial architectures. In PLDI '13 Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (Seattle, Washington, USA, June 16–19, 2013), ACM, New York, NY, USA, 495–506.
18. Padmanabha, S., Lukefahr, A., Das, R., Mahlke, S.A. Trace based phase prediction for tightly-coupled heterogeneous cores. In MICRO-46 Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (Davis, California, December 07–11, 2013), ACM, New York, NY, USA, 445–456.
19. Papadopoulos, G.M. Monsoon: An explicit token-store architecture. In ISCA '90 Proceedings of the 17th Annual International Symposium on Computer Architecture (Seattle, Washington, USA, May 28–31, 1990), ACM, New York, NY, USA, 82–91.
20. Swanson, S., Michelson, K., Schwerin, A., Oskin, M. WaveScalar. In MICRO 36 Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (December 03–05, 2003), IEEE Computer Society, Washington, DC, USA, 291.
21. Venkatesh, G., Sampson, J., Goulding, N., Garcia, S., Bryksin, V., Lugo-Martinez, J., Swanson, S., Taylor, M.B. Conservation cores: Reducing the energy of mature computations. In ASPLOS XV Proceedings of the Fifteenth Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems (Pittsburgh, Pennsylvania, USA, March 13–17, 2010), ACM, New York, NY, USA, 205–218.
22. Watkins, M.A., Nowatzki, T., Carno, A. Software transparent dynamic binary translation for coarse-grain reconfigurable architectures. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA) (March 12–16, 2016), 138–150.

Tony Nowatzki ({tjn}@cs.ucla.edu), University of California, Los Angeles, Los Angeles, CA, USA. Some work performed at University of Wisconsin - Madison.

Vinay Gangadhar and Karthikeyan Sankaralingam ({vinay, karu}@cs.wisc.edu), University of Wisconsin - Madison, Madison, WI, USA.

© 2019 ACM 0001-0782/19/6 $15.00

CAREERS

University of Central Missouri
Assistant Professor in Computer Science - Multiple Positions

The School of Computer Science and Mathematics at the University of Central Missouri is accepting applications for three non tenure-track positions in Computer Science at the rank of Assistant Professor. The appointment will begin August 2019. We are looking for faculty excited by the prospect of shaping our school’s future and contributing to its sustained excellence.

The Position: Duties will include teaching undergraduate and graduate courses in computer science, cybersecurity and/or software engineering, and developing new courses depending upon the expertise of the applicant and school needs, program accreditation and assessment. Faculty are expected to assist with school and university committee work and service activities, and advising majors.

Required Qualifications:
• Ph.D. in Computer Science, Cybersecurity or Software Engineering
• Demonstrated ability to teach existing courses at the undergraduate and/or graduate levels
• Excellent verbal and written communication skills

The Application Process: To apply online, go to https://jobs.ucmo.edu. Apply to position #997335, #997819 or #998560. The following items should be attached: a letter of interest, a curriculum vitae, copies of transcripts, and a list of at least three professional references including their names, addresses, telephone numbers and email addresses. Official transcripts and three letters of recommendation will be requested for candidates invited for on-campus interview. For more information, contact:

Dr. Songlin Tian, Search Committee Chair
School of Computer Science and Mathematics
University of Central Missouri
Warrensburg, MO 64093
(660) 543-4930
tian@ucmo.edu

Initial screening of applications begins May 1, 2019, and continues until position is filled. AA/EEO/ADA. Women and minorities are encouraged to apply.
UCM is located in Warrensburg, MO, which is 35 miles southeast of the Kansas City metropolitan area. It is a public comprehensive university with about 13,000 students. The School of Computer Science and Mathematics offers undergraduate and graduate programs in Computer Science, Cybersecurity and Software Engineering with over 700 students. The undergraduate Computer Science and Cybersecurity programs are accredited by the Computing Accreditation Commission of ABET.

Superintendent
Information Technology Division
www.nrl.navy.mil

Senior Executive Service Career Opportunity – Tier 2

ES-0180/0854/0855/1310/1520/1550: $126,148 - $189,600 per annum* (2019 salary)
*Actual salary may vary depending on the scope and complexity of the position and the qualifications and current compensation of the selectee.
Become a member of an elite research and development community involved in basic and applied scientific research and advanced technological development for tomorrow’s Navy and for the Nation.
The Superintendent of the Information Technology Division is responsible for the conception, planning and formulation of the scientific program of the Division in pursuance of the military defense of the United States. He/she provides the technical and administrative leadership required to ensure that significant and productive accomplishments flow from that program.
As the Superintendent, you will:
• Establish priorities by considering the relative importance of the work to other work carried on by the Division, its utility and adaptability to the national defense, its usefulness in other fields of research and development, and its potential value in relation to the increase of scientific knowledge.
• Assign problems to the operating Branches, approve their general plans, stimulate interest and activity on the part of Division personnel, and review progress and completed work for soundness of approach, validity of conclusions and advisability of recommendations.
• Coordinate the research and development of the Division with that performed in other Divisions at the NRL, in other US and allied Government laboratories, universities and industrial laboratories.
• Be a recognized authority in the field of computer science, communications, cyber security, artificial intelligence and allied disciplines.
• Plan and direct the effective administration of the Division, which includes administering budget matters that fall within the province of the Division’s operations.
Applicants should be recognized as national/international authorities and should have planned and executed difficult programs of national significance or specialized programs that show outstanding attainment in their field of research.
For more information and specific instructions on how to apply, visit www.usajobs.gov, log in and enter the following announcement number: DE-10493155-19-JS. The announcement closes 28 June 2019. Contact Lesley Renfro at Lesley.renfro@nrl.navy.mil for more information. E-mailed resumes cannot be accepted.
NRL is an Equal Opportunity Employer
NRL – 4555 Overlook Ave SW, Washington DC 20375

ADVERTISING IN CAREER OPPORTUNITIES

How to Submit a Classified Line Ad: Send an e-mail to acmmediasales@acm.org. Please include text, and indicate the issue or issues where the ad will appear, and a contact name and number.
Estimates: An insertion order will then be e-mailed back to you. The ad will be typeset according to CACM guidelines. NO PROOFS can be sent. Classified line ads are NOT commissionable.
Deadlines: 20th of the month/2 months prior to issue date. For latest deadline info, please contact: acmmediasales@acm.org
Career Opportunities Online: Classified and recruitment display ads receive a free duplicate listing on our website at: http://jobs.acm.org
Ads are listed for a period of 30 days.
For More Information Contact: ACM Media Sales at 212-626-0686 or acmmediasales@acm.org



The Hasso Plattner Institute (HPI) is Germany’s university excellence center for digital engineering. The Faculty of Digital Engineering, established jointly by HPI and the University of Potsdam, offers an especially practical and engineering-oriented study program in computer science that is unique throughout Germany.

Annually, HPI’s Research Schools grant 18 Ph.D. and Postdoctoral Scholarships.

With their interdisciplinary and international structure, HPI’s two Research Schools on “Service-oriented Systems Engineering” and “Data Science” interconnect HPI’s research groups as well as its branches at the University of Cape Town, Technion, and Nanjing University.

HPI RESEARCH GROUPS
Algorithm Engineering, Prof. Dr. Tobias Friedrich
Business Process Technology, Prof. Dr. Mathias Weske
Computer Graphics Systems, Prof. Dr. Jürgen Döllner
Data Engineering Systems, Prof. Dr. Tilmann Rabl
Digital Health, Prof. Dr. Erwin Böttinger, Prof. Dr. Bert Arnrich, Prof. Dr. Christoph Lippert
Enterprise Platform and Integration Concepts, Prof. Dr. h.c. Hasso Plattner
Human Computer Interaction, Prof. Dr. Patrick Baudisch
Information Systems, Prof. Dr. Felix Naumann
Internet Technologies and Systems, Prof. Dr. Christoph Meinel
Operating Systems and Middleware, Prof. Dr. Andreas Polze
Software Architecture, Prof. Dr. Robert Hirschfeld
System Analysis and Modeling, Prof. Dr. Holger Giese

Applications must be submitted by August 15 of the respective year.
For more information on HPI’s Research Schools please visit: www.hpi.de/research-school

Finnish Center for Artificial Intelligence FCAI has launched – join us in creating the next generation of AI!

Are you a promising young researcher looking for an outstanding postdoc position?
Or an experienced professor wishing to spend a sabbatical in one of the most dynamic AI hubs in Europe?
Or perhaps you are a superb software developer who wishes to dive into the world of fundamental AI research?

We are launching the Finnish Center for Artificial Intelligence FCAI (https://fcai.fi) – a center striving for scientific breakthroughs in the field of AI while producing high-quality societal and economic impact. FCAI is built on a long tradition and track record of decades of pioneering machine learning research in Helsinki. Come and join our vibrant community of leading scientists and companies in creating the next generation of artificial intelligence that is data-efficient, trustworthy and understandable!

With a total budget of €250M over the next 8 years, FCAI is opening a range of research positions for academics and ICT professionals in different levels of their careers. Read more on what FCAI can offer to you: https://fcai.fi/open-positions.

last byte

PHOTO BY ALEXANDER BERG

[CONTINUED FROM P. 96] keep the connection alive with people who are trying to understand how the brain works.
HINTON: That said, neuroscientists are now taking it seriously. For many years, neuroscientists said, “artificial neural networks are so unlike the real brain, and they’re not going to tell us anything about how the brain works.” Now, neuroscientists are taking seriously the possibility that something like backpropagation is going on in the brain, and that’s a very exciting area.
LECUN: Almost all the studies now of human and animal vision use convolutional nets as the standard conceptual model. That wasn’t the case until relatively recently.
HINTON: I think it’s also going to have a huge impact, slowly, on the social sciences, because it’s going to change our view of what people are. We used to think of people as rational beings, and what was special about people was that they used reasoning to derive conclusions. Now we understand much better that people are basically massive analogy-making machines. They develop these representations quite slowly, and then the representations they develop determine the kinds of analogies they can make. Of course, we can do reasoning, and we wouldn’t have mathematics without it, but it’s not the fundamental way we think.

For pioneering researchers, you seem unusually unwilling to rest on your laurels.
HINTON: I think there’s something special about people who invented techniques that are now standard. There was nothing God-given about them, and there could well be other techniques that are better. Whereas people who come to a field when there’s already a standard way of doing things don’t understand quite how arbitrary that standard way is.
BENGIO: Students sometimes talk about neural nets as if they were describing the Bible.
LECUN: It creates a generation of dogmatism. Nevertheless, it’s very likely that some of the most innovative ideas will come from people much younger than us.

The progress in the field has been amazing. What would you have been surprised to learn was possible 20 or 30 years ago?
LECUN: There’s so much I’ve been surprised by. I was surprised by how late the deep learning revolution was, but also by how fast it developed once it started. I would have expected things to happen more progressively, but people abandoned the whole idea of neural nets between the mid-1990s and mid-2000s. We had evidence that they were working before, but then, once the demonstrations became incontrovertible, the revolution happened really fast, first in speech recognition, then in image recognition, and now in natural language understanding.
HINTON: I would have been amazed, 20 years ago, if someone had said that you could take a sentence in one language, carve it up into little word fragments, feed it into a neural net that starts with


random connections, and train the neural net to produce a translation of the sentence into another language with no knowledge at all of syntax or semantics (just no linguistic knowledge whatsoever), and it would translate better than anything else. It’s not perfect, it’s not as good as a bilingual speaker, but it’s getting close.
LECUN: It’s also amazing how quickly these techniques became so useful for so many industries. If you take deep learning out of Google or Facebook today, both companies crumble; they are completely built around it. One thing that surprised me when I joined Facebook is that there was a small group using convolutional nets for face recognition. My first instinct about convolutional nets was to think they would be useful for, maybe, category-level recognition: car, dog, cat, airplane, table, not fine-grained things like faces. But it turned out to work very well, and it’s completely standard now. Another thing that surprised me came out of Yoshua’s lab on generative adversarial networks: that you can basically use neural nets as generative models to produce images and sound.
BENGIO: When I was doing my Ph.D., I was struggling to expand the idea that neural nets could do more than just pattern recognition (taking a fixed-size vector as input and producing categories). But it’s only recently with our translation work that we escaped this template. As Yann said, the ability to generate new things has really been revolutionary. So has the ability to manipulate any kind of data structure, not just pixels and vectors. Traditionally, neural nets were limited to tasks that humans can do very quickly and unconsciously, like recognizing objects and images. Modern neural nets are different in nature from what we were thinking about in the 1980s, and they can do things that are much closer to what we do when we reason, what we do when we program computers.

In spite of all the progress, Yoshua, you’ve talked about the urgency of making this technology more accessible to the developing world.
BENGIO: I think it’s very important. I used to not think much about politics, but machine learning and AI have come out of universities, and I think we have a responsibility to think about that and to participate in social and political discussions about how they should be used. One issue, among many, is where the know-how and wealth and technology are going to be concentrated. Are they going to be concentrated in the hands of a few countries, a few companies, and a small class of people, or can we find ways to make them more accessible, especially in countries where they could make a bigger difference for more people?
HINTON: Google has open-sourced its main software for developing neural nets, which is called TensorFlow, and you can also use the special Google hardware for neural nets on the cloud. So Google is trying to make this technology accessible to as wide a set of people as possible.
LECUN: I think that’s a very important point. The deep learning community has been very good at promoting the idea of open research, not just within academia, where conferences distribute papers, reviews, and commentaries in the open, but also in the corporate world, where companies like Google and FB are open-sourcing the vast majority of the software that they write and providing the tools for other people to build on top of it. So anyone can reproduce anyone else’s research, sometimes within days. No top research group is ahead of any other by more than a couple of months on any particular topic. The important question is how fast the field as a whole is progressing. Because for the things we really want to build (virtual assistants that can answer any question we ask them and can help us in our daily lives), we don’t just lack the technology, we lack the basic scientific principles for it. The faster we can foster the entire research community to work on this, the better it is for all of us.

Leah Hoffmann is a technology writer based in Piermont, NY, USA.

© 2019 ACM 0001-0782/19/6 $15.00

Watch the recipients discuss this work in the exclusive Communications video.
https://cacm.acm.org/videos/2018-acm-turing-award

last byte

DOI:10.1145/3324011 Leah Hoffmann

Q&A
Reaching New Heights
with Artificial Neural Networks
ACM A.M. Turing Award recipients Yoshua Bengio, Geoffrey Hinton, and Yann LeCun
on the promise of neural networks, the need for new paradigms, and the concept of making
technology accessible to all.

ONCE TREATED BY the field with skepticism (if not outright derision), the artificial neural networks that 2018 ACM A.M. Turing Award recipients Geoffrey Hinton, Yann LeCun, and Yoshua Bengio spent their careers developing are today an integral component of everything from search to content filtering. So what of the now-red-hot field of deep learning and artificial intelligence (AI)? Here, the three researchers share what they find exciting, and which challenges remain.

There’s so much more noise now about artificial intelligence than there was when you began your careers, some of it well-informed, some not. What do you wish people would stop asking you?
GEOFFREY HINTON: “Is this just a bubble?” In the old days, people in AI made grand claims, and they sometimes turned out to be just a bubble. But neural nets go way beyond promises. The technology actually works. Furthermore, it scales. It automatically gets better when you give it more data and a faster computer, without anybody having to write more lines of code.
YANN LECUN: That’s true. The basic idea of deep learning is not going away, but it’s still frustrating when people ask if all we need to do to make machines more intelligent is simply scale our current methods. We need new paradigms.
YOSHUA BENGIO: The current techniques have many years of industrial and scientific application ahead of them. That said, the three of us are researchers, and we are always impatient for more, because we are far from human-level AI, and the dream of understanding the principles of intelligence, natural or artificial.

What isn’t discussed enough?
HINTON: What does this tell us about how the brain works? People ask that, but not enough people are asking that.
BENGIO: It’s true. Unfortunately, although deep learning takes inspiration from the brain and from cognition, many engineers involved with it these days don’t care about those topics. It makes sense, because if you’re applying things in industry, it doesn’t matter. But in terms of research, I think it’s a big loss if we don’t [CONTINUED ON P. 94]

PHOTOS BY ALEXANDER BERG

Geoffrey Hinton Yoshua Bengio Yann LeCun



