COMMUNICATIONS OF THE ACM
CACM.ACM.ORG 06/2019 VOL.62 NO.06
Cerf’s Up
Back to the Future
By Vinton G. Cerf

BLOG@CACM
Is CS Really for All, and Defending Democracy in Cyberspace
Mark Guzdial mulls the difficulty of getting into a computer science class, while John Arquilla ponders political warfare in cyberspace.

Neural Net Worth
Yoshua Bengio, Geoffrey Hinton, and Yann LeCun this month will receive the 2018 ACM A.M. Turing Award for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.
By Neil Savage

Lifelong Learning in Artificial Neural Networks
New methods enable systems to rapidly, continuously adapt.
By Gary Anthes

And Then, There Were Three
How long can the silicon foundry sector continue to adapt, as physical limits make further shrinkage virtually impossible?
By Don Monroe

Ethics in Technology Jobs
Employees are increasingly challenging technology companies on their ethical choices.
By Keith Kirkpatrick

Global Computing
Global Data Justice
A new research challenge for computer science.
By Linnet Taylor

Inside Risks
Through Computer Architecture, Darkly
Total-system hardware and microarchitectural issues are becoming increasingly critical.
By A.T. Markettos, R.N.M. Watson, S.W. Moore, P. Sewell, and P.G. Neumann

Calendar

The Profession of IT
An Interview with David Brin on Resiliency
Many risks of catastrophic failures of critical infrastructures can be significantly reduced by relatively simple measures to increase resiliency.
By Peter J. Denning

Viewpoint
Personal Data and the Internet of Things
It is time to care about digital provenance.
By Thomas Pasquier, David Eyers, and Jean Bacon

Garbage Collection as a Joint Venture
A collaborative approach to reclaiming memory in heterogeneous software systems.
By Ulan Degenbaev, Michael Lippautz, and Hannes Payer

How to Create a Great Team Culture (and Why It Matters)
Build safety, share vulnerability, and establish purpose.
By Kate Matsudaira

Programmable Solid-State Storage in Future Cloud Datacenters
Programmable software-defined solid-state drives can move computing functions closer to storage.
By Jaeyoung Do, Sudipta Sengupta, and Steven Swanson

Engineering Trustworthy Systems: A Principled Approach to Cybersecurity
Cybersecurity design reduces

The Challenge of Crafting Intelligible Intelligence
To trust the behavior of complex AI algorithms, especially in mission-critical settings, they must be made intelligible.
By Daniel S. Weld and Gagan Bansal
Watch the authors discuss this work in the exclusive Communications video.
https://cacm.acm.org/videos/the-challenge-of-crafting-intelligible-intelligence

Careers

Last Byte
Q&A
Reaching New Heights with Artificial Neural Networks
ACM A.M. Turing Award recipients Yoshua Bengio, Geoffrey Hinton, and Yann LeCun on the promise of neural networks, the need for new paradigms, and the concept of making technology accessible to all.
By Leah Hoffmann
Watch the recipients discuss this work in the exclusive Communications video.
https://cacm.acm.org/videos/2018-acm-turing-award

Images by: (L) Macro Photo; (R) Andrij Borys Associates, using Shutterstock

Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields.
Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology,
and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications,
public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM
enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts,
sciences, and applications of information technology.
ACM, the world’s largest educational and scientific computing society, delivers resources that advance computing as a science and profession. ACM provides the computing field’s premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

Executive Director and CEO
Vicki L. Hanson
Deputy Executive Director and COO
Patricia Ryan
Director, Office of Information Systems
Wayne Graves
Director, Office of Financial Services
Darren Ramdin
Director, Office of SIG Services
Donna Cappo
Director, Office of Publications
Scott E. Delman

ACM COUNCIL
President
Cherri M. Pancake
Vice-President
Elizabeth Churchill
Secretary/Treasurer
Yannis Ioannidis
Past President
Alexander L. Wolf
Chair, SGB Board
Jeff Jortner
Co-Chairs, Publications Board
Jack Davidson and Joseph Konstan
Members-at-Large
Gabriele Anderst-Kotis; Susan Dumais; Renée McCauley; Claudia Bauzer Mederios; Elizabeth D. Mynatt; Pamela Samuelson; Theo Schlossnagle; Eugene H. Spafford
SGB Council Representatives
Sarita Adve and Jeanna Neefe Matthews

BOARD CHAIRS
Education Board
Mehran Sahami and Jane Chu Prey
Practitioners Board
Terry Coatta

REGIONAL COUNCIL CHAIRS
ACM Europe Council
Chris Hankin
ACM India Council
Abhiram Ranade
ACM China Council
Wenguang Chen

PUBLICATIONS BOARD
Co-Chairs
Jack Davidson and Joseph Konstan
Board Members
Phoebe Ayers; Edward A. Fox; Chris Hankin; Xiang-Yang Li; Nenad Medvidovic; Tulika Mitra; Sue Moon; Michael L. Nelson; Sharon Oviatt; Eugene H. Spafford; Stephen N. Spencer; Divesh Srivastava; Robert Walker; Julie R. Williamson

ACM U.S. Technology Policy Office
Adam Eisgrau, Director of Global Policy and Public Affairs
1701 Pennsylvania Ave NW, Suite 200, Washington, DC 20006 USA
T (202) 580-6555; acmpo@acm.org

Computer Science Teachers Association
Executive Director

STAFF
Director of Publications
Scott E. Delman
cacm-publisher@cacm.acm.org
Executive Editor
Diane Crawford
Managing Editor
Thomas E. Lambert
Senior Editor
Andrew Rosenbloom
Senior Editor/News
Lawrence M. Fisher
Web Editor
David Roman
Editorial Assistant
Danbi Yu
Art Director
Andrij Borys
Associate Art Director
Margaret Gray
Assistant Art Director
Mia Angelica Balaquiot
Production Manager
Bernadette Shade
Intellectual Property Rights Coordinator
Barbara Ryan
Advertising Sales Account Manager
Ilia Rodriguez
Columnists
David Anderson; Michael Cusumano; Peter J. Denning; Mark Guzdial; Thomas Haigh; Leah Hoffmann; Mari Sako; Pamela Samuelson; Marshall Van Alstyne

CONTACT POINTS
Copyright permission
permissions@hq.acm.org
Calendar items
calendar@cacm.acm.org
Change of address
acmhelp@acm.org
Letters to the Editor
letters@cacm.acm.org

WEBSITE
http://cacm.acm.org

WEB BOARD
Chair
James Landay
Board Members
Marti Hearst; Jason I. Hong; Jeff Johnson; Wendy E. MacKay

AUTHOR GUIDELINES
http://cacm.acm.org/about-communications/author-center

ACM ADVERTISING DEPARTMENT
2 Penn Plaza, Suite 701, New York, NY 10121-0701
T (212) 626-0686
F (212) 869-0481
Advertising Sales Account Manager
Ilia Rodriguez
ilia.rodriguez@hq.acm.org
Media Kit acmmediasales@acm.org

Association for Computing Machinery (ACM)
2 Penn Plaza, Suite 701
T (212) 869-7440; F (212) 869-0481

EDITORIAL BOARD
EDITOR-IN-CHIEF
Andrew A. Chien
eic@cacm.acm.org
Deputy to the Editor-in-Chief
Lihan Chen
cacm.deputy.to.eic@gmail.com
SENIOR EDITOR
Moshe Y. Vardi

NEWS
Co-Chairs
Marc Snir and Alain Chesnais
Board Members
Monica Divitini; Mei Kobayashi; Rajeev Rastogi; François Sillion

VIEWPOINTS
Co-Chairs
Tim Finin; Susanne E. Hambrusch; John Leslie King; Paul Rosenbloom
Board Members
Michael L. Best; Judith Bishop; James Grimmelmann; Mark Guzdial; Haym B. Hirsch; Richard Ladner; Carl Landwehr; Beng Chin Ooi; Francesca Rossi; Len Shustek; Loren Terveen; Marshall Van Alstyne; Jeannette Wing; Susan J. Winter

PRACTICE
Co-Chairs
Stephen Bourne and Theo Schlossnagle
Board Members
Eric Allman; Samy Bahra; Peter Bailis; Betsy Beyer; Terry Coatta; Stuart Feldman; Nicole Forsgren; Camille Fournier; Jessie Frazelle; Benjamin Fried; Tom Killalea; Tom Limoncelli; Kate Matsudaira; Marshall Kirk McKusick; Erik Meijer; George Neville-Neil; Jim Waldo; Meredith Whittaker

CONTRIBUTED ARTICLES
Co-Chairs
James Larus and Gail Murphy
Board Members
William Aiello; Robert Austin; Kim Bruce; Alan Bundy; Peter Buneman; Jeff Chase; Andrew W. Cross; Yannis Ioannidis; Gal A. Kaminka; Ben C. Lee; Igor Markov; Lionel M. Ni; Adrian Perrig; Doina Precup; Marie-Christine Rousset; Krishan Sabnani; m.c. schraefel; Ron Shamir; Alex Smola; Sebastian Uchitel; Hannes Werthner; Reinhard Wilhelm

RESEARCH HIGHLIGHTS
Co-Chairs
Azer Bestavros and Shriram Krishnamurthi
Board Members
Martin Abadi; Amr El Abbadi; Animashree Anandkumar; Sanjeev Arora; Michael Backes; Maria-Florina Balcan; David Brooks; Stuart K. Card; Jon Crowcroft; Alexei Efros; Bryan Ford; Alon Halevy; Gernot Heiser; Takeo Igarashi; Sven Koenig; Greg Morrisett; Tim Roughgarden; Guy Steele, Jr.; Robert Williamson; Margaret H. Wright; Nicholai Zeldovich; Andreas Zeller

SPECIAL SECTIONS
Co-Chairs
Sriram Rajamani, Jakob Rehof, and Haibo Chen
Board Members
Tao Xie; Kenjiro Taura; David Padua

ACM Copyright Notice
Copyright © 2019 by Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@hq.acm.org or fax (212) 869-0481.

For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center; www.copyright.com.

Subscriptions
An annual subscription cost is included in ACM member dues of $99 ($40 of which is allocated to a subscription to Communications); for students, cost is included in $42 dues ($20 of which is allocated to a Communications subscription). A nonmember annual subscription is $269.

ACM Media Advertising Policy
Communications of the ACM and other ACM Media publications accept advertising in both print and electronic formats. All advertising in ACM Media publications is at the discretion of ACM and is intended to provide financial support for the various activities and services for ACM members. Current advertising rates can be found by visiting http://www.acm-media.org or by contacting ACM Media Sales at (212) 626-0686.

Single Copies
Single copies of Communications of the ACM are available for purchase. Please contact acmhelp@acm.org.

COMMUNICATIONS OF THE ACM (ISSN 0001-0782) is published monthly by ACM Media, 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Periodicals postage paid at New York, NY 10001, and other mailing offices.

POSTMASTER
Please send address changes to
Communications of the ACM
2 Penn Plaza, Suite 701
New York, NY 10121-0701 USA

Printed in the USA.

IN THIS ISSUE of Communications, as evidenced by the cover and lead article, we celebrate the latest recipients of the ACM A.M. Turing Award. Yoshua Bengio, Yann LeCun, and Geoffrey Hinton carried out pioneering work in deep learning that has touched all our lives. As Turing Laureates, they now join the eminent group of technology visionaries recognized with the world’s highest distinction in computing.

The Turing Award is one of a suite of professional honors ACM bestows annually to recognize technical achievements that have made significant contributions to our field. This month, I will have the pleasure of joining the awardees, ACM Fellows, and other luminaries in San Francisco for the ACM Awards Banquet. The annual event pays tribute to computing excellence and to those whose contributions and innovations have had a lasting impact on our field.

Among the new honorees is Shwetak Patel, winner of the ACM Prize in Computing. This award recognizes individuals who have made significant contributions during the early years of their careers. Patel is being honored for his innovative work in applying sensor systems to problems of sustainability and health care. Also on hand will be Mendel Rosenblum, being honored as the first winner of the ACM Charles P. “Chuck” Thacker Breakthrough in Computing Award. This new biennial award recognizes individuals whose work exemplifies “out-of-the-box” thinking. Rosenblum’s work echoes Thacker’s trademark can-do approach: he reinvented the virtual machine concept, thereby revolutionizing datacenters and making today’s cloud computing possible.

Pavel Pevzner receives the ACM Paris Kanellakis Theory and Practice Award, recognizing theoretical advances that have had a demonstrable effect on computing practice. Pevzner pioneered algorithms for rapidly sequencing DNA; his algorithms underlie almost all sequence assemblers used today and were used to reconstruct the vast majority of genomic sequences available in databases. The ACM Grace Murray Hopper Award honors a computing professional who has made a major technical or service contribution by the age of 35. This year, two individuals are being recognized: Michael J. Freedman for the design and deployment of self-organizing peer-to-peer systems; and Constantinos Daskalakis for his contributions to complexity and game theory.

Gerald C. Combs is being recognized with the ACM Software System Award, given to an institution or individual(s) for developing a software system of lasting influence. He created the Wireshark network protocol analyzer, used by practitioners and researchers worldwide to analyze and troubleshoot a wide range of network protocols. The 2019–2020 ACM Athena Lecturer Award, a biennial honor celebrating fundamental CS contributions by women researchers, goes to Elisa Bertino in recognition of her groundbreaking work in data security and privacy. Chelsea Finn from UC Berkeley receives the ACM Doctoral Dissertation Award for her work on “Learning to Learn with Gradients.”

The ACM Distinguished Service Award, which celebrates service contributions to the computing community, goes to Paramvir (Victor) Bahl, for his work founding conferences, publications, and a SIG for researchers and practitioners in the mobile and wireless networking community, as well as contributions to technology policy. Robert Sedgewick is being honored with the ACM Karl V. Karlstrom Outstanding Educator Award for the outstanding textbooks and online materials he created, which are used worldwide for courses in introductory computer science. Chris Stephenson is receiving the Outstanding Contribution to ACM Award for her landmark work in bringing K–12 teachers worldwide the tools and resources needed to introduce computer science to future generations.

The recipient of the ACM Eugene L. Lawler Award for Humanitarian Contributions within Computer Science and Informatics is Meenakshi Balakrishnan for developing cost-effective solutions to address the special mobility and education challenges of the visually impaired in developing countries. The ACM-AAAI Allen Newell Award, presented to an individual for career contributions that have breadth within CS or that bridge CS and other disciplines, has been awarded to Henry Kautz for his work at the intersection of AI, computational social science, and public health.

Last but not least, the Awards Banquet will celebrate 56 incoming ACM Fellows. A complete list of names and their key achievements can be found at https://awards.acm.org/fellows.

The prestige of ACM’s awards brings global attention to outstanding technical and professional achievements throughout the computing community. We all benefit when fine work and lasting accomplishments in computer science are celebrated. I hope you will participate this coming year, by making sure the key achievers in your own area are nominated. Our award committees, led by Awards Co-Chairs John White and Vinton Cerf, do an outstanding job, but they rely on people like you to identify and put forward strong candidates. Learn more at https://awards.acm.org/award-nominations.

Cherri M. Pancake is President of ACM, professor emeritus of electrical engineering and computer science, and director of a research center at Oregon State University, Corvallis, OR, USA.

Copyright held by author/owner.
communication.
Commun. ACM 62, 3 (Mar. 2019), 78–87.
to the next hop.a There is a wonderful
a This was sometimes called “torn tape” telecommunication because you would tear the tape off the receiving teletype.

Vinton G. Cerf is vice president and Chief Internet Evangelist at Google. He served as ACM president from 2012–2014.

Copyright held by author.
DOI:10.1145/3323684 http://cacm.acm.org/blogs/blog-cacm
and Defending
computer science. Eric writes, at https://
stanford.io/2ODJ4OK:
The imposition of GPA thresholds and
age of those students decide they want to take post-secondary computer science classes? What if we get past 50%?

I don’t have a prediction for what happens next. I don’t know if we’ve ever had this kind of tension in American education. On the one hand, we have a well-funded, industry-supported effort to get CS into every primary and secondary school in the U.S. (https://code.org/about/donors). Some of those kids are going to want more CS in college or university. On the other hand, we see post-secondary schools putting the brakes on rising enrollment. Community colleges and non-traditional post-secondary education may take up some of the demand, but they probably can’t grow exponentially either. Like the 1980s, CS departments have no more resources to manage growing enrollment, but there is even more pressure than in the 1980s to increase capacity.

The greatest loss in the growing demand for CS classes is not that there will be a narrower path for K–12 students to become professional software developers. As the Generation CS report (http://bit.ly/2Udzecn) showed, a big chunk of the demand for seats in CS courses is coming from CS minors and from non-CS majors. More and more people are discovering that computer science is useful, in whatever career they pursue. Those are the people who are losing out on seats. Maybe they first saw programming in K–12 and now want some more. That’s the biggest cost of the capacity crisis. In the long run, increasing computational literacy and sophistication across society could have even bigger impact than producing more professional programmers.

Inability to meet the demand for seats in CS classes may limit the growth in our computing labor force. It may also limit the growth of computational scientists, engineers, journalists, and teachers: in short, a computationally literate society.

Comments
It strikes me that nontraditional learning may be able to take up some of the slack. That won’t address the desire for conventional credentialing. I am not certain how that serves folks preparing themselves for non-CS disciplines in which some computation grounding/experience is sought.

Just the same, I wonder if the current control of the spigot by traditional post-secondary arrangements is part of the problem now, and also later if the “demand” decreases for whatever reasons. Having excess capacity on hand, and some way to redirect it, is not the kind of resiliency we afford educational institutions.
Dennis Hamilton

John Arquilla
In (Virtual) Defense of Democracy
March 19, 2019
http://bit.ly/2U9mtj6

In February, The New York Times reported that disruptive cyber operations were launched against the Russia-based Internet Research Agency during the 2018 elections in the U.S. These operations took two forms: direct action causing brief shutdowns, and messages to suspected malefactors that sought to deter. The intended goal of these actions was to “protect American democracy.”

Neither form of action will prove effective over time. Election propaganda-by-troll can come from myriad sources and surrogates, easily outflanking clumsy efforts to establish some sort of “information blockade.” As to deterrence, this is an old chestnut of the age of nation-states. Hacker networks will almost surely not be intimidated, whether they are working on their own or at the behest of a malign third party. Indeed, in the future, election hackers are far more likely to ramp up efforts to shape electoral discourses and outcomes, in democracies everywhere.

How, then, can this threat be appropriately countered? There are two ways, neither of which has been chosen to date. The first has to do with seeking, via the United Nations, an “international code of conduct” (ICC) in cyberspace that would impose behavior-based constraints on both infrastructure attacks and “political warfare.” Ironically, it is the Russians who have been proposing an ICC for more than 20 years now, while the American position has been in firm opposition, beginning shortly after the first meeting between U.S. and Russian cyber teams. I co-chaired that meeting, and thought the Russians had proposed a reasonable idea: creating a voluntary arms control regime, like the chemical and biological weapons conventions. It is well past time to return to this important idea.

The other way for democracies to take the sting out of political warfare waged from cyberspace is to clean up their own practices, which in too many countries have descended into outrageous spirals of distortion and lying. What foreign actors are doing pales next to what is being done by the very political parties and citizens of democratic nations now crying “foul” because some other is in the game. The world should look to America’s Ronald Reagan, who back in the 1980s waged some of the cleanest political campaigns in memory. It will not be easy to stop individuals from becoming bad political actors in cyberspace, but the major political parties should set an example (and an implied moral norm) by rising to the challenge of focusing on fact- and issue-based election campaigns.

One last thought: the U.S. has to be careful about condemning others for engaging in interventions into its political processes. As Dov Levin pointed out in a study conducted while he was a postdoctoral fellow at Carnegie Mellon, from 1946–2000 the U.S. intervened in 81 foreign elections. The number for Russia over the same period was 36. Some have defended American actions by saying that it is okay to intervene when your goal is to shore up liberal forces against authoritarians. But this kind of reasoning can be used by those who attempted to influence the 2016 presidential election in the U.S.; they can say that by “outing” the Democratic Party’s backroom efforts to undermine Senator Bernie Sanders’ campaign, they were serving the true foundation of democracy: free and fair processes.

Political discourse in cyberspace is a fact of life now, and it will remain so for the foreseeable future in democratic nations. There are two ways to proceed, if the trolls are to be tamed. One involves multilateral action via the United Nations; the other demands an inward-looking devotion (among the political class and at the individual level) to cultivating the better angels of our cyber natures. Both are worth pursuing.

Mark Guzdial is a professor in the Computer Science & Engineering Division of the University of Michigan. John Arquilla is Distinguished Professor of Defense Analysis at the United States Naval Postgraduate School; the views expressed are his alone.

© 2019 ACM 0001-0782/19/6 $15.00

W
HEN GEOFFREY HINTON the right answer was just asking too
started doing gradu- much. “People were very suspicious of
ate student work on the idea you could just learn from the
artificial intelligence data,” says Hinton, a professor emeri-
at the University of Ed- tus at the University of Toronto and
inburgh in 1972, the idea that it could now an engineering fellow at Google.
be achieved using neural networks that LeCun read Hinton’s work includ-
mimicked the human brain was in dis- ing, he says, a paper written in coded
repute. Computer scientists Marvin language to get around the taboo about
Minsky and Seymour Papert had pub- neural nets. “I learned about Geoff’s
lished a book in 1969 on Perceptrons, existence, and realized this was the
an early attempt at building a neural man I needed to meet,” he says. LeCun
net, and it left people in the field with did a postdoctoral fellowship in Hin-
the impression that such devices were ton’s lab, then moved to Bell Labs. He’s
nonsense. now a professor at New York University
“It didn’t actually say that, but that’s (NYU) and director of AI research at
how the community interpreted the Facebook.
book,” says Hinton who, along with Yo- Bengio also wound up at Bell Labs
shua Bengio and Yann LeCun, will re- in the early 1990s, where he and Le-
ceive the 2018 ACM A.M. Turing award cun worked together. “What really ap-
for their work that led deep neural net- pealed to me was the notion that by
works to become an important com- studying neural nets, I was studying
ponent of today’s computing. “People something that would be fairly general
thought I was just completely crazy to about intelligence, that would explain
be working on neural nets.” our intelligence and allow us to build
Even in the 1980s, when Bengio and intelligent machines,” Bengio recalls.
LeCun entered graduate school, neural Today, he is a professor at the Univer-
PHOTO BY A LEXA NDER BERG
nets were not seen as promising. Many sity of Montreal, scientific director of
people thought that building a net- Mila (the Montreal Institute for Learn-
work with random connections across ing Algorithms), and an advisor to Mi- From left, Yoshua Bengio,
Geoffrey Hinton, and Yann LeCun
multiple layers, giving it some data, crosoft. at the Vector Institute for Artificial
and letting it figure out how to reach Their work gained wide mainstream Intelligence in Toronto, Canada.
acceptance in 2012, after Hinton and examples. But to give machines a more
two students used deep neural nets to general intelligence that could solve
win the ImageNet challenge, identify- “Machines are still different types of problems or accom-
ing objects in a set of photos at a rate very, very stupid,” plish multiple tasks will require sci-
far better than that of any of their com- entists to come up with new concepts
petitors. Since then, the field has em- LeCun says. about how learning works, Bengio
braced the technology, which has also “The smartest AI says. “It might take a very long time be-
seen breakthroughs in speech recogni- fore we reach human-level AI,” he says.
tion and natural language processing, systems today have Meanwhile, society has to have
and could help make self-driving ve- less common sense more discussion about how to use ar-
hicles more reliable. tificial intelligence appropriately. Hin-
LeCun says theories about why than a house cat.” ton worries about how autonomous
neural nets would not work—that the intelligent weapons systems might be
training algorithms would get stuck misused, for instance. LeCun says that
in the extreme values of mathematical without adequate political and legal
functions known as local minima—fell protections, governments could use
to real-world experience. “In the end, the systems to track people and try to
what people were convinced by were Bengio came up with word embed- control their behavior, or corporations
not theorems; they were experimental dings, patterns of neuron activation might rely on AI to make decisions but
results,” he says. Even though there that represent word symbols, thereby ignore bias in their algorithms.
were local minima, those bad enough expanding exponentially the system’s To address some of these worries,
for an optimization algorithm to get ability to express meanings and mak- Bengio took part in a group that last De-
stuck were relatively rare. It turned ing it possible to process text and trans- cember issued the Montreal Declara-
out that if the neural nets were just big late it from one language to another. tion for a Responsible Development of
enough for the problem they were try- Hinton explains that the embeddings Artificial Intelligence, which outlines
ing to solve, they could get stuck, but make it easier for the system to reason principles that they say should be used
if they were larger, they became more by analogy, rather than by following a in pushing the technology forward.
efficient at optimization. “You make logical set of rules; he believes that is “We’re building stronger and stronger
those networks bigger and bigger and more like how the human brain works. technology based on the premises of
they work better and better,” LeCun The brain evolved to use patterns of science, but the organization of society
says. neural activity to perform perception and their collective wisdom isn’t keep-
Working both together and inde- and movement, and that makes it more ing up fast enough. The solution may
pendently, the three made important suited to reasoning by analogy rather not be in some new theorem or some
contributions to neural networks. than logic, he argues. new algorithm,” he says.
Among their several discoveries, Hin- In fact, artificial intelligence re- With such concerns in mind, Hin-
ton helped to develop backpropaga- mains limited compared to human in- ton says he will donate a portion of his
tion, an algorithm that calculates error at the output of the network and propagates the results backward toward the input, allowing the machine to improve its accuracy. LeCun developed convolutional neural networks, which replicate feature detectors across space and are more efficient for image and speech recognition.

Another development that helps the system learn more effectively involves randomly turning off some of the neurons about half of the time, introducing some noise into the network. Bengio says there is noise and randomness in the way living neurons spike, and something about that makes the system better at dealing with variations in input patterns, which is key to making the system useful. “You want to be good at doing the things you haven’t yet seen, things that might be somewhat different from the training data,” Hinton says.

telligence. “Machines are still very, very stupid,” LeCun says. “The smartest AI systems today have less common sense than a house cat.” Though they excel at recognizing patterns, neural networks have no knowledge of how the world works, and computer scientists have not yet figured out how to give it to them. Humans learn to generalize from a very small number of samples, while neural networks require vast sets of training data. In fact, Hinton says, it was the growth in available datasets, along with faster processors, that led to the “phase shift” from neural networks being a curiosity to a practical approach.

There are hundreds of useful tasks neural networks can accomplish just by using their current pattern recognition capabilities, Hinton says, from predicting earthquake aftershocks to offering better medical diagnoses on the basis of hundreds of thousands of

share of the $1-million Turing Award prize money to the humanities at the University of Toronto. “If we have science without the humanities to help guide the political process, then we’re all in trouble,” he says. LeCun says he will likely make a donation to NYU, and Bengio says he’s considering some environmental causes.

Based on their experiences as academic heretics who turned out to be right, they advise young computer scientists to stick to their convictions. “If someone tells you your intuitions are wrong, there are two possibilities,” Hinton says. “One is you have bad intuitions, in which case it doesn’t matter what you do, and the other is you have good intuitions, in which case you should follow them.”

Neil Savage is a science and technology writer based in Lowell, MA, USA.

© 2019 ACM 0001-0782/19/6 $15.00
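The technique of randomly turning off neurons during training, described in this article, is commonly known as dropout. The following is a minimal sketch in plain Python, assuming a drop probability of 0.5 to match “about half of the time.” The function name `dropout`, the rescaling of surviving activations (often called inverted dropout), and the sample values are illustrative assumptions, not details taken from the article.

```python
import random


def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Randomly zero each activation with probability drop_prob (training only)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    # At inference time every neuron stays on and values pass through unchanged.
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    # Survivors are scaled by 1/keep_prob so the expected value of each
    # activation is the same with and without dropout (an assumed convention).
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


# Hypothetical layer output; each call during training silences a random subset.
layer_output = [0.2, 1.5, -0.7, 0.9]
noisy_output = dropout(layer_output, drop_prob=0.5, rng=random.Random(0))
clean_output = dropout(layer_output, training=False)
```

Because a different random subset is silenced on every training step, no single neuron can be relied on, which is one way to read the article's point about coping with inputs that differ from the training data.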
Lifelong Learning
in Artificial Neural Networks
New methods enable systems to rapidly, continuously adapt.
Over the past decade, artificial intelligence (AI) based on machine learning has reached breakthrough levels of performance, often approaching and sometimes exceeding the abilities of human experts. Examples include image recognition, language translation, and performance in the game of Go.

These applications employ large artificial neural networks, in which nodes are linked by millions of weighted interconnections. They mimic the structure and workings of living brains, except in one key respect: they don’t learn over time, as animals do. Once designed, programmed, and trained by developers, they do not adapt to new data or new tasks without being retrained, often a very time-consuming task.

Real-time adaptability by AI systems has become a hot topic in research. For example, computer scientists at Uber Technologies last year published a paper that describes a method for introducing “plasticity” in neural networks. In several test applications, including image recognition and maze exploration, the researchers showed that previously trained neural networks could adapt to new situations quickly and effi-

[Infographic: Summary of General L2M Framework]
The DARPA Lifelong Learning Machines (L2M) Program seeks to develop learning systems that continuously improve with additional experience, and rapidly adapt to new conditions and dynamic environments.
INFOGRAPHIC COURTESY OF DARPA’S ELECTRONICS RESURGENCE INITIATIVE

For more than 60 years, neural networks have been built from interconnected nodes whose pair-wise strength of connection is determined by weights, generally fixed by training with labeled examples. This training is most often done via a method called backpropagation, in which the system calculates an error at the synaptic output and distributes it backward
JUNE 2019 | VOL. 62 | NO. 6 | COMMUNICATIONS OF THE ACM 13
news
ACM News
highly accurate results when classifying different kinds of automobiles, but when a new kind of car (a Tesla, say) is seen, the system stumbles. “You want it to recognize this new car very quickly, without retraining, which can take days or weeks. Also, how do you know that something new has happened?”

Artificial intelligence systems that learn on the fly are not new. In “neuroevolution,” networks update themselves by algorithms that employ a trial-and-error method to achieve a precisely defined objective, such as winning a game of chess. They require no labeled training examples, only definitions of success. “They go only by trial and error,” says Uber’s Miconi. “It’s a powerful, but a very slow, essentially random, process. It would be much better if, when you see a new thing, you get an error signal that tells you in which direction to alter your weights. That’s what backpropagation gets you.”

Military Apps

Miconi’s ideas represent just one of a number of new approaches to self-learning in AI. The U.S. Department of Defense is pursuing the idea of synaptic plasticity as part of a broad family of experimental approaches aimed at making defense systems more accurate, responsive, and safe. The U.S. Defense Advanced Research Projects Agency (DARPA) has established a Lifelong Learning Machines (L2M) program with two major thrusts, one focused on the development of complete systems and their components, and the second on exploring learning mechanisms in biological organisms and translating them into computational processes. The goals are to enable AI systems to “learn and improve during tasks, apply previous skills and knowledge to new situations, incorporate innate system limits, and enhance safety in automated assignments,” DARPA says on its website. “We are not looking for incremental improvements, but rather paradigm-changing approaches to machine learning.”

Uber’s work with Hebbian plasticity is a promising step toward lifelong

of DARPA’s L2M program and a computer science professor at the University of Massachusetts, Amherst. “We will never be safe in a self-driving car without it,” she says. But it is just one of many necessary steps toward that goal. “It’s definitely not the end of the story,” she says.

There are five “pillars” of lifelong learning as DARPA broadly defines it, and synaptic plasticity falls into the first of these. The pillars are: continuous updating of memory, without catastrophic forgetting; recombinant memory, rearranging and recombining previously learned information toward future behavior; context awareness and context-based modulation of system behavior; adoption of new behaviors through internal play, self-awareness, and self-simulations; and safety and security, recognizing whether something is dangerous and changing behavior accordingly, and ensuring security through a combination of strong constraints.

Siegelmann cites smart prostheses as an example of an application of these techniques. She says the control software in an artificial leg could be trained via conventional backpropagation by its maker, then trained to the unique habits and characteristics of its user, and finally enabled to very quickly adapt to a situation it has not seen before, such as an icy sidewalk.

A computational neuroscientist, Siegelmann says lifelong learning has been a goal of AI researchers for many years, but major advancements have only recently become feasible, enabled by advancements in computer power, new theoretical foundations and algorithms, and a better understanding of biology. “In a few years, much of what we call AI today won’t be considered AI without lifelong learning,” she predicts.

Miconi’s team is now working on making learning more dynamic and sophisticated than it is in his test systems so far. One way to do that is to make the plasticity coefficients, now fixed as a design choice, themselves variable over the life of a system. “The plasticity of each connection can be determined at every point by the network itself,” he says. Such “neuromodulation” likely occurs in animal brains, he says, and that may be a key step toward the most flexible decision-making by AI systems.

Further Reading

Chang, O. and Lipson, H.
Neural Network Quine, Data Science Institute, Columbia University, New York, NY 10027, May 2018, https://arxiv.org/abs/1803.05859v3

Chen, Z. and Liu, B.
Lifelong Machine Learning, Second Edition, Synthesis Lectures on Artificial Intelligence and Machine Learning, August 2018, https://www.morganclaypool.com/doi/10.2200/S00832ED1V01Y201802AIM037

Hebb, D.
The Organization of Behavior: A Neuropsychological Theory, New York: Wiley & Sons, 1949, http://s-f-walker.org.uk/pubsebooks/pdfs/The_Organization_of_Behavior-Donald_O._Hebb.pdf

Miconi, T., Clune, J., and Stanley, K.
Differentiable Plasticity: Training Plastic Neural Networks with Backpropagation, Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden, PMLR 80, 2018, https://arxiv.org/abs/1804.02464

Miconi, T.
Backpropagation of Hebbian Plasticity for Continual Learning, NIPS Workshop on Continual Learning, 2016, https://github.com/ThomasMiconi/LearningToLearnBOHP/blob/master/paper/abstract.pdf

Gary Anthes is a technology writer and editor based in Arlington, VA, USA.
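The idea of a connection with both a fixed weight and a plastic, Hebbian component can be sketched in a few lines. The sketch below is loosely modeled on the formulation in the differentiable-plasticity paper listed under Further Reading, as best understood here: each connection's effective weight is a fixed part plus a plasticity coefficient times a Hebbian trace, and the trace is a running average of the product of pre- and post-synaptic activity. The function name `plastic_step`, the `tanh` nonlinearity, the rate `eta`, and all numeric values are assumptions for illustration. Training the fixed weights and coefficients by backpropagation is omitted.

```python
import math


def plastic_step(x_prev, w, alpha, hebb, eta=0.1):
    """One forward step through a layer whose connections are plastic.

    x_prev : presynaptic activations, length n_in
    w      : fixed weights, w[i][j] connects input i to output j
    alpha  : plasticity coefficients, same shape as w
    hebb   : Hebbian traces, same shape as w (an updated copy is returned)
    """
    n_in, n_out = len(w), len(w[0])
    # Effective weight = fixed part + plastic part scaled by its coefficient.
    x_next = [
        math.tanh(sum((w[i][j] + alpha[i][j] * hebb[i][j]) * x_prev[i]
                      for i in range(n_in)))
        for j in range(n_out)
    ]
    # Hebbian trace: exponential running average of (pre * post) activity.
    new_hebb = [
        [(1.0 - eta) * hebb[i][j] + eta * x_prev[i] * x_next[j]
         for j in range(n_out)]
        for i in range(n_in)
    ]
    return x_next, new_hebb


# Hypothetical 2-input, 1-output layer seeing the same input three times.
w = [[0.5], [-0.3]]
alpha = [[0.2], [0.2]]
hebb = [[0.0], [0.0]]
for _ in range(3):
    y, hebb = plastic_step([1.0, 0.5], w, alpha, hebb)
```

In this sketch the coefficients in `alpha` are simply supplied as inputs. The “neuromodulation” direction Miconi describes would correspond to letting the network compute them at each step, which is not shown here.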
And Then,
There Were Three
How long can the silicon foundry sector continue to adapt,
as physical limits make further shrinkage virtually impossible?
Relentless year-over-year improvements in integrated circuits don’t come cheap. For years, these advances have been boosted in part by silicon foundries that invest in new technology by aggregating demand from design companies that don’t have factories of their own. As of last summer, however, only one such “pure-play” foundry continues to pursue the latest silicon generation, along with two companies that also make their own chips. The dwindling of suppliers revives the long-standing question of how the industry can adapt as physical limits eventually make further shrinkage impossible (or impossibly expensive).

Still, the story sounds familiar. “Every time people say Moore’s Law has finally hit the wall, people come up with new, innovative approaches to get around it,” said Willy Shih, Robert and Jane Cizik Professor of Management Practice at Harvard Business School.
IMAGE BY MACRO PHOTO

The silicon industry has tracked the 1965 observation by Gordon Moore, co-founder and later head of Intel, that transistor counts were doubling every year (later changed to every two years). This exponential growth became enshrined as a “law,” which became a collective self-fulfilling prophecy as companies feared losing business if they fell behind its aggressive schedule. Successive generations were labeled by an ever-shrinking distance, currently 7nm, although this designation long ago lost any clear relationship to the transistor’s gate length or other features. In the 1990s, Moore’s Law became formalized in the National (after 1998, International) Technology Roadmap for Semiconductors, which spelled out what manufacturers, equipment suppliers, and academic researchers would need to do to keep the industry on track.

Unfortunately, exponentially increasing transistor counts were accompanied by corresponding increases in the costs to build fabrication plants and develop more aggressive processes and novel device structures. These costs, and the need to keep the expensive equipment in constant use, have long made it almost impossible for a smaller company to manufacture a novel chip design itself. “The capital investment to supply a growing market and to push leading-edge research can only be supported by a company that has a large revenue,” probably $30 billion a year or more, said Paolo Gargini. “It’s just a game for the big boys,” said Gargini, formerly at Intel, who has headed the formal roadmap through its recent rebirth as the International Roadmap for Devices and Systems (IRDS).

Nonetheless, when GlobalFoundries announced in August 2018 that it was halting development of its 7nm process, “it was quite a shocker for a lot of people,” Shih said. The foundry company had originally projected risk production (early manufacture with relaxed quality guarantees) of 7nm products in spring 2018, and until recently seemed committed. Now, the only remaining pure-play foundry developing leading-edge technology is Taiwan Semiconductor Manufacturing Company (TSMC), whose 7nm process has been in production since early 2018. Besides TSMC, Samsung, which has an important foundry business in addition to manufacturing its own chips, announced in fall 2018 that it was ready for risk production of 7nm. Intel, whose current 10nm process is often regarded as similar to TSMC’s 7nm process, devotes most of its attention to its own chips.
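The two doubling periods in Moore's observation (every year, later revised to every two years) diverge quickly once compounded. The following is a small arithmetic sketch, assuming idealized exact doubling; the function name `growth_factor` and the 10-year horizon are illustrative choices, not figures from the article.

```python
def growth_factor(years, doubling_period_years):
    """Multiplicative growth after `years` if a quantity doubles every period."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (years / doubling_period_years)


for period in (1, 2):
    factor = growth_factor(10, period)
    print(f"doubling every {period} year(s): x{factor:,.0f} after 10 years")
# doubling every 1 year(s): x1,024 after 10 years
# doubling every 2 year(s): x32 after 10 years
```

Over a single decade, the one-year schedule implies 32 times more growth than the two-year schedule, which gives a sense of how demanding either pace is to sustain.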
To be sure, GlobalFoundries and others (including TSMC) can still build very powerful products using older technologies. Moreover, Shih notes, “Some people say that, once we went below 14nm, or perhaps even higher like 22nm, the unit cost per transistor stopped decreasing and started increasing again.” As a result, “more and more users say ‘That [leading-edge] process is so expensive, I actually don’t need it,’” he said, unless they are “making things for cellphones or FPGAs or the bleeding-edge stuff like Intel microprocessors, where you really need the ultimate in performance and power.”

Indeed, a manufacturer that specializes in digital logic may not need a broad range of processes. In contrast, foundries support a whole range of devices, such as image sensors, and devices for analog, radio-frequency, and ultra-low-power circuits. Reliably implementing such mix-and-match processes in a design environment that lets multiple customers use them is often more important to designers than having the latest-generation technology. For example, although TSMC boasts dozens of high-end customers for its 7nm process, it continues to support older-generation processes, even the 180nm technology it introduced 20 years ago, which is good enough for many customers.

If leading-edge development slows down, though, it might give other companies, including those in mainland China, more chance to compete. “The Chinese are having trouble at the leading edge, but they’re catching up on some of the trailing-edge technologies,” Shih said. “The thing that is driving TSMC is less competition from GlobalFoundries; it’s competition from Made in China 2025 [a Chinese program to improve domestic manufacturing competitiveness].”

A Bright Future?

In the end, though, no amount of innovation can extend exponential scaling forever. Logic designers “are waiting for EUV to save the game,” Gargini said, but even if advanced lithography buys a few years, “that solution comes to an end.” In perhaps 2020 or 2021, he conjectured, “Samsung, TSMC, or Intel, one of them will make a big announcement that their next product is 3D [three-dimensional],” which would offer more transistors through vertical stacking. Memory manufacturers (including Samsung) have already begun to introduce 3D structures, both by stacking processed layers and growing multiple layers of devices (see “Electronics are Leaving the Plane,” Communications, August 2018). Memory has special advantages for 3D structures, such as uniform and redundant layouts, and low power (because most transistors are idle).

In contrast, in logic applications, many more transistors are active, and removing the heat they produce is enormously challenging even in the easier-to-cool planar layout. So far, logic companies are testing the 3D waters with advanced packaging techniques for GPUs and other high-performance products. “We still can squeeze another two or three generations out of 2D,” Gargini said, but he sees full 3D as inevitable and adding another 15 years of performance growth. “3D is not really as much of a revolution” or as risky as the process innovations the industry has already implemented, he said. “The big guys can do it anytime they decide to do it.”

The semiconductor industry faces challenges that we may look back on as the end of Moore’s Law. Nonetheless, there are continued opportunities for better products, and so far there are still foundry companies ready and able to enable new designs. “There is a bright future,” Gargini insists. “I think it’s a very good balance.”

Further Reading

International Roadmap for Devices and Systems 2017 Edition, IEEE, https://irds.ieee.org/roadmap-2017

Shih, W. C., Chien, C.F., Shih, C., and Chang, J.
The TSMC Way: Meeting Customer Needs at Taiwan Semiconductor Manufacturing Co., Harvard Business School Case Collection 610-003, August 2009, https://www.hbs.edu/faculty/Pages/item.aspx?num=37868

Monroe, D.
Electronics are Leaving the Plane, Communications, August 2018, https://cacm.acm.org/magazines/2018/8/229776-electronics-are-leaving-the-plane/fulltext

Don Monroe is a science and technology writer based in Boston, MA, USA.
Milestones
Ethics in
Technology Jobs
Employees are increasingly challenging
technology companies on their ethical choices.
Organized protests against companies are hardly a new phenomenon, as people have boycotted or protested both corporate policies and actions for years. For example, a global protest of international agrochemical and agricultural biotechnology corporation Monsanto in 2013 saw coordinated marches across 52 countries and 436 cities. In 2010, thousands of people in the U.S. protested against oil giant BP for its role in the Deepwater Horizon oil spill. And in the late 1990s, U.S. gun owners protested against gun manufacturers Colt Manufacturing Company and Smith & Wesson for their perceived cooperation with then-President Bill Clinton’s gun control efforts.

Yet many of the corporate protests that have occurred against technology companies over the past year were marked by a distinct difference: they were often organized by, led by, or coordinated with workers at the very companies being protested. The impetus for these walkouts appears to be largely two issues: the presence of a culture of inequality at technology companies, and the use of technology for what workers consider to be unethical or harmful activities.

Although there is precedent for tech workers protesting against their employers, such as when defense workers in the 1980s pushed back against their employers’ participation in the development of the Strategic Defense Initiative, colloquially known as Star Wars, the difference is that tech workers feel more empowered to speak out today.

“[Workers] actually see that their words and action can have a real impact on a broader scale,” says Mehran Sahami, a professor of computer science at Stanford University. Sahami points to the success former Uber employee Susan Fowler had with blog posts she wrote that detailed a culture of sexual harassment at the ride-sharing giant, which ultimately led to changes at the company and the dismissal of its former CEO, Travis Kalanick. “Fowler’s actions showed that even individual tech workers, by speaking up, can actually have a large effect on the organization that they’re in or were formerly in,” Sahami says.

It is not just a culture of misogyny that is irritating workers and spurring them into action; a lack of transparency is also a key catalyst for workers to band together to make their feelings known. One example was Google’s handling of a $90-million exit payment to Andy Rubin, a key executive of the company and the creator of the Android mobile operating system. Upon Rubin’s departure from the company in 2014, Google failed to disclose it had received a complaint that Rubin had committed an act of sexual misconduct against another employee, and that an investigation had confirmed its veracity. In October 2018, a report in The New York Times made these details public.

Upon that disclosure, Google CEO Sundar Pichai sent a memo to staff noting that the company had taken “an increasingly hard line” on inappropriate conduct at work and had fired 48 people, including 13 senior managers, in the previous two years, without giving any of them exit packages. Just prior to a November 1 protest by employees known as “The Walkout for Real Change,” Pichai sent out a follow-up note apologizing “for the past actions and the pain they have caused employees” and indicating that employees would be supported if they protested.

Despite the apology, thousands of Google employees around the world walked out on November 1, and organizers issued a statement demanding more transparency from Google around its handling of sexual harassment, an end to pay and opportunity inequality, and more employee empowerment overall. In addition, the group requested that an employee representative be appointed to the company’s board and that Google end “forced arbitration” in cases of harassment and discrimination, a practice that prevents employees from taking cases to court.

“Silicon Valley companies lead the way in the fields of science and technology, but when it comes to issues of privacy, creating inclusive workplaces, and ethics, they seem to be devolving,” says Congresswoman Jackie Speier, who represents San Francisco and parts of Silicon Valley, and who publicly supported the walkouts.

Lack of diversity is a problem in the tech industry. For example, nearly 70% of Google employees are men and 53% are non-Hispanic whites, according to the Google Diversity Annual Report 2018. Among leadership roles, the numbers within Google are even less diverse, as 67% are white non-Hispanic and 75% are men.
“On the issue of diversity, I continue to hear from women and other workers in the tech industry who are harassed, bullied, assaulted, and ignored because they weren’t frat buddies with the CEO or turned down sexual overtures,” Speier says. “It’s a cultural crisis, and as I’ve made clear to the tech companies in and around my district, the industry will never reach its full potential until this crisis is addressed.”

Google is hardly the only company being subjected to protests from its own employees; others also have protested how technology being developed by the companies they work for is being used by government entities. Representatives from Amazon, Salesforce, and Microsoft signed petitions and held demonstrations objecting to how their work is being used for surveillance, or to separate families at the U.S. border. According to Leigh Hafrey, a Senior Lecturer at the Massachusetts Institute of Technology Sloan School of Management and author of the book The Story of Success: Five Steps to Mastering Ethics in Business, these protest actions are occurring because workers are more aware of questions of social justice and what constitutes appropriate and inappropriate behavior.

“We’ve had a lot of social movement over the past several decades that raised awareness and made people conscious of what can potentially happen within organizations,” Hafrey says.

Indeed, thousands of workers at Amazon, Google, Microsoft, and Salesforce have signed petitions asking their respective management teams to cancel or withdraw from contracts with U.S. government agencies, including Immigration and Customs Enforcement, Customs and Border Protection, and the Department of Defense. The public nature of these protests and petitions may be having an effect; in June 2018, Google employees succeeded in getting the company to agree not to renew its deal to help the Pentagon build artificial intelligence tools for drone warfare.

Other protests have been less than successful. Salesforce.com employees gathered twice in 2018 in front of the company’s headquarters in San Francisco to protest the firm’s multimillion-dollar contract with the U.S. Customs and Border Protection agency. While CEO Marc Benioff condemned the agency’s separation of families at the border, he refused to cancel the contract, and the company still supplies software to the agency, despite continuing pressure from workers.

Ultimately, workers may be able to make their voices heard, but management at many large companies is likely to be more focused on how their decisions impact the company’s bottom line, and so may not always bow to the wishes of employees.

Ceren Cubukcu, an employment consultant and author of Make Your American Dream A Reality: How to Find a Job as an International Student in the U.S., says employees may simply decide to work for another company if they have a problem with a technology company’s actions, rather than protesting to get their employer to change course.

“In some projects, especially for IT/high tech projects, you don’t even know what the whole project will be at the end because you work in teams, and only the top management knows about the whole project,” Cubukcu says. “If you don’t feel comfortable in your job or don’t like your work, you can always try to switch to another job, and the company can always replace you with some other employee.”

That said, the bargaining position for many tech workers is perhaps stronger than it ever has been in history, given that programmers, software engineers, and data scientists who are talented, hardworking, and reliable are relatively hard to find and keep.

“Finding good technical people is difficult,” Sahami says, “so companies pay more attention to their workers because they realize that these are highly skilled people who are difficult to find. If those tech workers leave, it’s going to have a serious impact on the productivity of the company.”

Even young people who have yet to establish themselves in their careers are trying to flex their muscles, shunning companies they don’t agree with during the interview and hiring process. A BuzzFeed article published in August 2018 included several accounts of tech workers who declined lucrative positions at major technology companies because they disagreed with the company’s practices or ethical positions, relating to either the products or services the company builds, the customers to which the companies sell, or how the companies treat their own employees.

“Questions have always been raised about what companies do and why they do it,” Hafrey says. “We’re just seeing it in a way that I think maybe we were not previously considering because we were enamored of the bright future that our recent technologies promised us, and we are now realizing the downside or potential downsides of some of those technologies.”

Sahami adds that there may be a generational reason for the increasing level of activism in the technology field. “There’s lots of data that shows, for example, that many in the younger generation look for work that they believe that has value and that’s more important to them than just the paycheck; it’s believing that they’re having some sort of social impact,” Sahami says. “There’s been a lot of bad behavior, and not just in the tech industry, but more broadly around issues of sexual harassment that has been in some sense tolerated for a long time. And it shouldn’t have been tolerated, but over time, culture changes and people are willing to speak up more about that being unacceptable and so, generationally, we begin to call out more and more of these bad behaviors that’s been happening and try to rectify it.”

Further Reading

Fowler, S.
Reflecting on one very, very strange year at Uber, Feb. 19, 2017, https://www.susanjfowler.com/blog/2017/2/19/reflecting-on-one-very-strange-year-at-uber

Keller, M., and Larsen, K.
‘Enough is enough’: Google workers in San Francisco, Mountain View, Sunnyvale walk out in protest of treatment of women, November 1, 2018, ABC 7 News San Francisco, https://abc7news.com/business/enough-is-enough-bay-area-google-workers-walk-out-in-protest/4596806/

Brown, D.
“Google Diversity Annual Report 2018.” Diversity.Google. https://static.googleusercontent.com/media/diversity.google/en//static/pdf/Google_Diversity_annual_report_2018.pdf

Keith Kirkpatrick is principal of 4K Research & Consulting, LLC, based in Lynbrook, NY, USA.
Global Computing
Global Data Justice
A new research challenge for computer science.
When the world’s largest biometric population database, India’s Aadhaar system, was challenged by activists, the country’s supreme court issued a historic judgment. It is not acceptable, the court said, to allow commercial firms to request details from population records gathered by government from citizens for purposes of providing representation and care. The court’s logic was important because this database had, for a long time, been becoming a point of contact between firms that wanted to conduct ID and credit checks, and government records of who was poor, who was vulnerable, and who was on which type of welfare program. The court also, however, said that this problem of public-private function creep was not sufficiently bad to outweigh the potential good a national population database could do for the poor. Many people, they said, were being cheated out of welfare entitlements because they had no official registration, and this was more unfair than the monetization of their official records.

This judgment epitomizes the problem of global data justice. The databases and analytics that allow previously invisible populations to be seen and represented by authorities, and which make poverty and disadvantage harder to ignore, are a powerful tool for the marginalized and vulnerable to claim their rights and entitlements, and to demand fair representation.² This is the claim the United Nations is making⁵ in relation to new sources of data such as cellphone location records and social media content: if the right authorities can use them in the right way, they can shine a light on need and deprivation, and can help evaluate progress toward achieving the Sustainable Development Goals. If data technologies are used in a good cause, they confer unprecedented power to make the world a fairer place.

That ‘if’, though, deserves some attention. The new data sources’ value to the United Nations, to humanitarian actors, and to development and rights organizations is matched only by their market value. If it is possible to monitor who is poor and vulnerable, it is also possible to manipulate and surveil. Surveillance scholar David Lyon³ has said that all surveillance operates along a spectrum between care and control: a database like Aadhaar can be used to channel welfare to the needy, but it could also be used to target consumers for marketing, voters for political campaigns, transgender people or HIV sufferers for exclusion; the list is endless. The possibilities for monetizing the data of millions of poor and vulnerable people are endless, and may be irresistible if hard boundaries are not set. But how to set boundaries for powerful international actors is a question yet to be solved in any field.

[Photo] A woman has her eyes scanned while others wait during the Aadhaar registration process in India circa October 2018. Aadhaar identification numbers are issued to individuals by the Unique Identification Authority of India on behalf of the Government of India for the purpose of establishing the identity of every single person.
PHOTO BY DAVID TALUKDAR/NURPHOTO VIA GETTY IMAGES

Data technologies have very different effects in different social, economic, and political environments. WhatsApp, for example, allows parents’ groups to message each other about carpooling. It also facilitates ethnic violence in India and Myanmarᵃ and facilitates extremist politicsᵇ in Brazil. Technology almost always has unintended consequences, and given the global reach of apps and services, the consequences of our global data economy are becoming less and less predictable.¹

Global data justice researchers are aiming to frame new governance solutions that can help with this global level of unpredictability. In this emerging research field, we are exploring how the tools we have are globalizing: regulation, research ethics, professional standards and guidelines are all having to be translated

derstood differently in different places. Nigeria, the U.S., and India, for example, will each have a different idea of what is ‘good’ or ‘necessary’ to do with data technologies, and how to regulate their development and use. Our research asks how to reconcile those different viewpoints, given that each of those international actors, plus myriad others, will have the power to develop and sell data technologies that will affect people all around the world.

Currently much of the international discussion revolves around harmonizing data protection amongst countries, and getting technology developers to agree on ethical principles and guidelines. Neither of these is a bad idea, but each can go in a radically different direction depending

ones, and even if they work from similar templates, will apply them differently. Democracies will set boundaries for data collection and use that are different from those of authoritarian states, yet we all have to work together on this problem. Like climate change, any unregulated data market affects us all.

So neither harmonized data protection nor ethical principles are the answer, or at least not on their own. Ethics, at the moment at least, is too frequently just a cover for self-regulation.⁶ We need to ask global questions about global problems, but we are often stuck looking at our own environment and our own set of tools, without understanding what kind of toolkit can address the international-level consequences of our growing
into new environments, and get un- on local views on what is good and data economy.
desirable. Strongly neoliberal, pro- If we ask this global question, in-
a See https://bit.ly/2zWDIKO market countries will develop differ- stead: How to draw on approaches
b See https://nyti.ms/2EzEP5h ent principles from more socialist that are working in different places,
JUNE 2019 | VOL. 62 | NO. 6 | COMMUNICATIONS OF THE ACM 23
viewpoints
DOI:10.1145/3325284 A.T. Markettos, R.N.M. Watson, S.W. Moore, P. Sewell, and P.G. Neumann
Inside Risks
Through Computer
Architecture, Darkly
Total-system hardware and microarchitectural
issues are becoming increasingly critical.
SPECTRE,11 MELTDOWN,13 FORESHADOW,18,20 Rowhammer,9 Spoiler,9—suddenly it seems
processor hardware (typically subject to extensive verification) has long been assumed to provide a solid foundation for software, but increasingly suffers from its own vulnerabilities. Second, increasing complexity and the way systems are composed of many hardware/software pieces, from many vendors, means one cannot think just in terms of a single-processor architecture. We need to take a holistic view that acknowledges the complexities of this landscape. Third, and most seriously, these new attacks involved phenomena that cut across the traditional architectural abstractions, which have intentionally only described the envelopes of allowed functional behavior of hardware implementations, to allow implementation variation in performance. That flexibility has been essential to hardware performance increases—but the attacks involve subtle information flows via performance properties. They expose the hidden consequences of some of the microarchitectural innovations that have given us ever-faster sequential computation in the last decades, as caching and prediction lead to side-channels.

Designers need to understand more of what takes place in layers above or below their field of expertise.

Hardware Vulnerabilities
Ideally, security must be built from the ground up. How can we solve the problem by building the foundations of secure hardware?

For years, hardware security to many people has meant focusing on the physical layers. Power/electromagnetic side-channels and fault injection are common techniques for extracting cryptographic secrets by manipulating the physical implementation of a chip. These are not without effectiveness, but it is notable that the new spate of attacks represents entirely different, and more potent, attack vectors.

One lesson from the physical-layer security community is that implementation is critical. Hardware description languages (HDLs) are compiled down to connections between library logic cells. The logic cells are then placed and routed and the chip layer designs produced. One tiny slip—at any level from architecture to HDL source and compiler, to cell transistor definitions, routing, power, thermals, electromagnetics, dopant concentrations and crystal lattices—can cause a potentially exploitable malfunction. Unlike the binary code of malware, there is no way to observe many of these physical properties. As a result, systems are more vulnerable to both design mistakes and supply-chain attacks.

As the recent attacks demonstrate, side-channels are becoming more powerful than expected. Traditional physical-layer side-channels are a signals-from-noise problem. If you record enough traces of the power usage, with powerful enough signal processing, you can extract secrets. Architectural side-channels have more bandwidth and better signal-to-noise ratios, leaking much more data more reliably.

If we take a systems-oriented view, what can we say about the problem? First of all, the whole is often worse than the sum of its parts. Systems are composed of disparate components, often sourced from different vendors, and often granting much greater access to resources than needed to fulfill their purpose; this can be a boon for attackers. For example, in Google Project Zero’s attack on the Broadcom Wi-Fi chip inside iPhones,4 the attackers jumped from bad Wi-Fi packets to installing malicious code on the Wi-Fi chip, and then to compromising iOS on the application processor. Their ability to use the Wi-Fi chip as a springboard multiplied their efficacy. It is surprisingly difficult to reason about the behavior of such compositions of components.5 Attackers may create new side-channels through unexpected connections—for example, a memory DIMM that can send network packets via a shared I2C bus with an Ethernet controller.17

Hardware engineers often talk about ‘parasitic’ resistance or capacitance—components that were not put there by the designer but were created by the physical implementation, often unhelpfully sucking away signals or power. Today we have parasitic computers. Many components have unintended computational power, which can be perverted—from the x86 page-fault handler2 to DMA controllers.16 This presents a challenge to understanding where all the computation is happening, such as what is software rather than hardware.

Toward Robustly Engineered Trustworthy Systems
Total-system approaches to security defenses are important (see, for example, Bellovin3). A further lesson from physical-layer attacks is why such attacks are not more of a threat today—due to further layers of protection. It is not enough to extract the cryptographic key from a banking card using laser fault injection; the attacker must also use it to steal money. At this point the bank’s system-level defenses apply, such as transaction limits and fraud detection. If the key relates only to one account, the payoff involves only money held by that customer, not all other customers. Application-level compartmentalization limits the reward, and thus makes the attack economically nonviable.

Another approach is to ensure that richer contextual information is available that allows the hardware to understand and enforce security properties. The authors are on a team designing, developing, and formally analyzing the CHERI hardware instruction-set architecture,20 as well as CHERI operating system and application security. The CHERI ISA can enable hardware to enforce pointer provenance, arbitrarily fine-grained access controls to virtual memory and to abstract system objects, as well as both coarse- and fine-grained compartmentalization. Together, these can provide enforceable separation and controlled sharing, allowing trustworthy and untrustworthy software (including unmodified legacy code) to coexist securely. Since the hardware has awareness of software constructs such as pointers and compartments, it can protect them, and we can reason about the protection guarantees—for example, formally proving the architectural abstraction enforces
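
The idea of hardware that tracks pointer provenance and enforces fine-grained bounds can be illustrated with a toy software model. The sketch below is only an illustration in Python and is not the CHERI ISA: the `Capability` class, its fields, the `MEMORY` array, and the `derive`, `load`, `store`, and `try_access` names are all assumptions made for this example. It shows two properties in miniature: a derived capability can only narrow the bounds and permissions of its parent, and every access is checked against them.

```python
from dataclasses import dataclass

MEMORY = bytearray(64)  # hypothetical flat memory, for illustration only


@dataclass(frozen=True)
class Capability:
    """Toy bounded pointer: the range [base, base + length) plus permissions."""
    base: int
    length: int
    perms: frozenset

    def derive(self, offset, length, perms):
        """Return a narrower capability; any attempt to widen is refused."""
        perms = frozenset(perms)
        if offset < 0 or length < 0 or offset + length > self.length:
            raise PermissionError("derived bounds exceed parent bounds")
        if not perms <= self.perms:
            raise PermissionError("derived permissions exceed parent permissions")
        return Capability(self.base + offset, length, perms)

    def _check(self, index, perm):
        if perm not in self.perms:
            raise PermissionError(f"missing permission: {perm}")
        if not 0 <= index < self.length:
            raise IndexError("access outside capability bounds")

    def load(self, index):
        self._check(index, "load")
        return MEMORY[self.base + index]

    def store(self, index, value):
        self._check(index, "store")
        MEMORY[self.base + index] = value


def try_access(action):
    """Run an access and report whether the model allowed it."""
    try:
        action()
        return "allowed"
    except (PermissionError, IndexError) as exc:
        return f"denied: {exc}"


root = Capability(0, len(MEMORY), frozenset({"load", "store"}))
buf = root.derive(16, 8, {"load", "store"})  # a compartment's private buffer
ro_view = buf.derive(0, 4, {"load"})         # read-only view for untrusted code

buf.store(0, 42)
print(ro_view.load(0))                          # 42
print(try_access(lambda: ro_view.store(0, 1)))  # denied: missing permission: store
print(try_access(lambda: ro_view.load(4)))      # denied: access outside capability bounds
print(try_access(lambda: ro_view.derive(0, 8, {"load"})))
```

In this model the untrusted holder of `ro_view` cannot write, cannot read past four bytes, and cannot rebuild a wider capability from the one it was given, which is the kind of enforceable separation and controlled sharing the paragraph above describes.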
The Profession of IT
An Interview with
David Brin on Resiliency
Many risks of catastrophic failures of critical infrastructures can be
significantly reduced by relatively simple measures to increase resiliency.
MANY PEOPLE TODAY are
concerned about critical
infrastructures such as
the electrical network, wa-
ter supplies, telephones,
transportation, and the Internet. These
nerve and bloodlines for society depend
on reliable computing, communications,
and electrical supply. What would happen
if a massive cyber attack or an electromag-
netic pulse, or other failure mode took
down the electric grid in a way that re-
quires many months or even years for re-
pair? What about a natural disaster such
as a hurricane, wildfire, or earthquake that
disabled all cellphone communications
for an extended period?
David Brin, physicist and author,
has been worrying about these issues
for a long time and consults regularly
with companies and federal agen-
cies. He says there are many relatively
straightforward measures that might
greatly increase our resiliency—our
ability to bounce back from disaster. I
spoke with him about this.
gathering, and big data, all in search of dangers to anticipate and counter in advance. Citizens know little about how many bad things these protectors have averted. But this specialization in

us to roll with any blow and come up fighting, keeping a surprise from being lethal. It’s what worked on 9/11, when all anticipatory protective measures failed.

er, we cannot use our computers or access our files stored in the Internet. Even our best disaster planning cannot fix the disruption if infrastructure damage is severe. Yet, communication is essential
for recovery. What can we do to preserve our ability to communicate?

On 9/11, passengers aboard flight UA93 demonstrated remarkable resilience when they self-organized to stop the terrorist plot to use that plane as a weapon against their country. If we want that kind of resilience to work on a large scale, we need resilient communications. Alas, our comm systems are fragile to failure in any natural or unnatural calamity. One step toward resilience would be a backup peer-to-peer (P2P) text-passing capability for when phones can’t link to a cellular tower. Texts would get passed from phone to phone via well-understood methods of packet switching until they encounter a working node and get dropped into the network. Qualcomm already has this capability built into their chips! But cellular providers refuse to turn it on. That’s shortsighted, since it would be good business too, expanding text coverage zones and opening new revenue streams. Even in the worst national disaster, we’d have a 1940s-level telegraphy system all across the nation, and pretty much around the world.

All it would take to fix this is a small change of regulation. Five sentences requiring the cell-cos to turn this on whenever a phone doesn’t sense a tower. (And charge a small fee for P2P texts.) Doing so might let us restore communications within an hour rather than months.

Many efforts have been made to empower folks with ad hoc mesh networks, via Bluetooth, Wi-Fi webs, and so on. None of these enticed more than a tiny user base—nothing like what’s needed for national resilience.

Q: It appears that solar power for homes and offices is at a tipping point as more people find it cheaper than the power grid. Localized solar power should also bring new benefits such as ability to maintain minimum electrical function at home during a blackout. Is independence from the electrical grid good for resilience?

It would be. One can envision a million solar-roofed homes and businesses serving as islands of light for their neighborhoods, in any emergency. But there’s a catch. Under current regulations, almost all U.S. solar roofs have a switch that shuts down the home or business solar system when the electrical utility has blacked out. The purpose is to prevent spurious home-generated voltages from endangering repair linemen. This is a lame excuse for an insane situation. Simply replace that cutoff switch with one that would still block backflow into the grid, but that feeds from the solar inverter to just two or three outlets inside the home, running the fridge, some rechargers, and possibly satellite coms. Just changing over to that switch would generate archipelagos of autonomous, resilient civilization spread across every neighborhood in America, even in the very worst case. A new rule requiring such switches, and fostering retrofitting, would fit on less than a page.

Across the next decade, more solar systems will come with battery storage. But this reform would help us bridge the next 10 years.

Q: What about protection against electromagnetic pulse disruption?

Much has been written about danger from EMP—either attacks by hostile powers or else the sort of natural disaster we might experience if the Sun ever struck us head-on with a coronal mass ejection, commonly called a solar flare. These CMEs happen often, peaking every 11 years. We’ve been lucky as the worst ones have missed Earth. But some space probes have been taken out by direct hits and a bulls-eye is inevitable.

The EMP threat was recognized over 30 years ago. We could have incentivized gradual development of shielded and breakered chipsets, including those in civilian electronics. Adoption could have been stimulated with a tax of a penny per non-compliant device, with foreseen ramp-up. By now we’d be EMP resilient, instead of fragile hostages either to enemies or to fate.

Q: What about solar on the southward walls of buildings to power the buildings? Some cities are already doing this.

Sure, south-facing walls are another place for photovoltaics. But there’s competition for that valuable real estate—urban agriculture. Technologies are cresting toward where future cities may require new buildings to recycle their organic waste through vertical farms that purify water while generating either industrial algae or else much of the food needed by a metropolis. With so much of the world’s population going urban, no technology could make a bigger difference.

The pieces are coming together. What’s lacking is a sense of urgency. Pilot programs and tax incentives should encourage new tall buildings to utilize their southward faces, nurturing this stabilizing trend during the coming decade.

Q: You’ve also spoken about apps systems that turn your smartphone into an intelligent sensor. Can you say how this supports resiliency?

Cellphones already have powerful cameras, many with infrared capability. Soon will come spectrum-analysis apps, letting citizens do local spot checks on chemical spills or environmental problems, and feeding the results to governments or NGOs for modeling in real time. The Tricorder X Prize showed how just a few add-on devices can turn a phone into a medical appraisal device, like Dr. McCoy had in “Star Trek.” Almost anyone could use such apparatus in the field with little training. Take a few measurements, and a distant system advises you on corrective actions.

Infrared sensors, accelerometers, and chemical sensors could provide a full array of environmental awareness systems by turning citizen cellphones into nodes of an instant awareness network. (I describe this in my novel Existence.)

Such a mesh is already of interest to national authorities. But the emphasis has been hierarchical—authorities send public reports down to citizens after gathering and interpreting data flowing upward. The hierarchical mind-set comes naturally when you are an authority with protective duties. But this can blind even sincere public servants to one of our great strengths—
the ability of average citizens to self-organize laterally.

Use your imagination. The greatest long-term advantage of our kind of society is that lateral citizen networks, while occasionally inconvenient to public servants, aren’t any kind of macro-threat, but will make civilization perform better. This is in contrast to despotic regimes, for whom such citizen empowerment would be lethal.

Q: Some of your proposals are less familiar. You have spoken of “all sky awareness.” What is that and how does it improve resiliency?

Defense and intelligence folks know we need better 24/7 omni-awareness of land, sea, and air. Major efforts involve protective services and space assets. When the Large Synoptic Survey Telescope comes online in Chile, we’ll find 100 times as many asteroids that could threaten our planet, or like the one that broke 10,000 windows in Chelyabinsk. Closer to home, dangerous space debris should be tracked round the globe.

Similar technology could improve air safety and impede smugglers by tracking both legal and illicit air traffic. For example, the cell networks I mentioned earlier could detect and triangulate aircraft engine sounds for comparison to an ongoing database, especially at low altitudes where drug smugglers and human traffickers operate, or where terrorists might attempt an attack, or detecting the path of airliners that stray, like Malaysian Air flight 370. Imagine those in peripheries like Canada, Alaska, or nearby waters automatically reporting sonic booms. Among myriad more mundane uses, these might perhaps localize incoming hypersonic weapons, of the kind announced recently by Russian President Vladimir Putin.

Sound implausible? In December 2018, a loose network of amateur ‘plane-spotters’ managed to track Air Force One visually, during President Trump’s top-secret Christmas dash to a U.S. air base in Iraq. A U.K. photographer used these clues to snap the unmistakable, blue-and-white 747 jetting far overhead.

Another method: revive the SETI League’s Project Argus, aiming to establish radio and optical detectors in 5,000 amateurs’ backyards, spread around the world. As Earth rotates, these backyard stations would sweep the sky in overlapping swathes, sifting for anomalous signals, but also detecting almost anything interesting that happens up there. Argus failed earlier because of the complexity and expense of racks of equipment. Today—with a small up-front investment by some mere-millionaire—we could offer a small box for a couple of hundred bucks that could be latched to an old TV dish-antenna, then Wi-Fi linked via the owner’s home. The dish—plus a small optical detector—could report detections in real time and any pair or trio that correlate would then trigger a look by higher-level, aimable devices.

Sure, most of the participants would think of their backyard SETI stations as helping sift the sky for aliens. So? As a side benefit, we’d become hundreds of times better at detecting almost any transient phenomenon overhead, improving both anticipation and resilience.

I can go on with a much longer list of unconventional and generally very inexpensive ways that very simple regulatory or incentive actions might transform national resilience, making society more robust to withstand shocks across the decades ahead.

Q: What about civil unrest or lawlessness if the disaster takes out or overwhelms local law enforcement? Easy to see gangs roaming affluent neighborhoods in SUVs stealing stuff and especially food, with no police to stop them.

I well-understand this worry! I’ve written collapse-of-civilization tales. (One of them, The Postman, was filmed by Kevin Costner.) Hollywood presents so many apocalyptic scenarios, we tend to assume we live on a fragile edge of collapse. But Rebecca Solnit’s book, A Paradise Built In Hell, shows decisively that average citizens—whether liberal or conservative—are actually pretty tough and dynamic. They quickly self-organize to help their neighbors. A quarter or more of citizens will almost always run toward whatever the problem is. Take citizen response on 9/11, or when disasters hit their neighborhoods.

If “affluent neighborhoods” want to be safe, there’s one method that works over the long run … don’t alienate the poor and middle class and ensure that the vast majority identify as members of the same overall tribe. As neighbors, we’ll come to your defense.

Q: Anything to mitigate cyber attacks, including phishing and massive identity theft?

Sincere people across the spectrum are right to worry about companies and governments collecting massive amounts of personal data on citizens: from the ways they use their smartphones, to always-on mics at home and office (for example, Alexa). Phishing is another example where crooks use already open knowledge about you to lure you into fatal online mistakes. We all fret about disparities of power that may lead to the “telescreen” in George Orwell’s Nineteen Eighty-Four. From facial recognition to video fakery to brainwave interpretation and lie detectors, if these techs are monopolized by one elite or another, we may get Big Brother forever.

There are forces in the world who are eager for this. China’s “social credit” system aims to use the masses to enforce conformity on one another.

In the West, most people are right to find this prospect terrifying. The reflex in response is to say: “let’s ban or restrict this new kind of light.” And that is the worst possible prescription. The elites we fear will only gain great power if they can operate in secret, enhancing that disparity, because we won’t be able to look back.
Viewpoint
Personal Data and
the Internet of Things
It is time to care about digital provenance.
WE HAVE ALL read market
predictions describing
billions of devices and
the hundreds of billions of
dollars in profit that the
Internet of Things (IoT) promises.a Secu-
rity and the challenges it represents27 are
often highlighted as major issues for IoT,
alongside scalability and standardiza-
tion. In 2017, FBI Director James Comey
warned, during a Senate hearing, of the
threat represented by a botnet taking
control of devices owned by unsuspect-
ing users. Such a botnet can seize con-
trol of devices ranging from connected
dishwashers,b to smart home cameras
and connected toys, not only using
them as a platform to launch cyber-attacks, but also potentially harvesting the data such devices collect.

In addition to concerns about cybersecurity, corporate usage of personal data has seen increased public scrutiny. A recent focus of concern has been connected home hubs (such as Amazon Alexa and Google Home).c Articles on the topic discussed whether conversations were being constantly recorded and if so, where those records went. Similarly, the University of Rennes faced a public backlash after revealing its plan to deploy smart-beds in its accommodation to detect “abnormal” usage patterns.d A clear question emerges from IoT-related fears: “How and why is my data being used?”

As concerns grow, legislators across the world are taking action in order to protect the public. For example, the recent EU General Data Protection Regulation (GDPR) that took effect in May 2018,e and the forthcoming ePrivacy Regulationf place strong responsibility on data controllers to protect personal data, and to notify users of security breaches. The EU commission defines a Data Controller as the party that determines the purposes for which, and the means by which, personal data is processed (why and how the data is processed). EU regulations further impose constraints on EU citizens’ data processing based on location and data type (that is, “special category” data falls under more stringent constraints). The data controller must provide means for end users to determine whether their data is properly handled and means to effect their rights. Overall, there must be mechanisms to determine what data is processed, how, why, and where.

Such concerns have drawn researchers to look at means to develop more accountable and transparent systems.10,24 The problem has also been clearly highlighted by the EU Data Protection Working Party: “As a result of the need to provide pervasive services in an unobtrusive manner, users might in practice find themselves under third-party monitoring. This may result in situations where the user can lose all control on the dissemination of his/her data, depending on whether or not the collection and processing of this data will be made in a transparent manner or not.”

IMAGE BY KOSTENKO MAXIM

a See https://bit.ly/2JNx0LZ
b See https://bit.ly/2JIOidc
c See https://bit.ly/2gY9qKG
d See https://lemde.fr/2HLvEQb
e See https://bit.ly/2lSJQfO
f See https://bit.ly/2j4AwzT
Modern computing systems contain many components that operate as black boxes; they accept inputs and generate outputs but do not disclose their internal working. Beyond privacy concerns, this also limits the ability to detect cyber-attacks, or more generally to understand cyber-behavior. Because of these concerns DARPA, in the U.S., launched the Transparent Computing projectg to explore means to build more transparent systems through the use of digital provenance with the particular aim of identifying advanced persistent threats. While DARPA’s work is a good start, we believe there is an urgent need to reach much further. In the remainder of this Viewpoint, we explore how provenance can be an answer to some IoT concerns and the challenges faced to deploy provenance techniques.

g See https://bit.ly/2Uf5bQY

Digital Provenance
There is a growing clamor for more transparency, but straightforward, widespread technical solutions have yet to emerge. Typical software log records often prove insufficient to audit complex distributed systems as they fail to capture the complex causality relationships between events. Digital provenance8 is an alternative means to record system events. Digital provenance is the record of information flow within a computer system in order to assess the origin of data (for example, its quality or its validity).

The concept first emerged in the database research community as a means to explain the response to a given query.16 Provenance research later expanded to address issues of scientific reproducibility, notably by providing mechanisms to reconstitute computational environments from formal records of scientific computations.23 More recently, provenance has been explored within the cybersecurity community25 as a means to explain intrusions18 or more recently to detect them.14

Provenance records are represented as a directed acyclic graph that shows causality relationships between the states of the objects that compose a complex system. As a consequence, it is compatible with automated mathematical reasoning. In such a graph, the vertices represent the state of transient and persistent data items, transformations applied to those states, and persons (legal or natural) responsible for data and transformations (generally referred to as entities, activities, and agents respectively). The edges represent dependencies between these entities. The analysis of such a graph allows us to understand where, when, how, by whom, and why data has been used.7,9

An outcome of research on provenance in the cybersecurity space is the understanding that the capture mechanism must provide guarantees of completeness (all events in the system can be seen), accuracy (the record is faithful to events) and a well-defined, trusted computing base (the threat model is clearly expressed).22 Otherwise, attacks on the system may be undetected, dissimulated by the attacker, or misattributed. We argue that in a highly ad hoc and interoperable environment with mutually untrusted parties, the provenance used to empower end users with control and understanding over data usage requires similar properties.

Building transparent and auditable systems may be one of the greatest software engineering challenges of the coming decade.

Who to Trust?
In the IoT environment the number of involved stakeholders has the potential to explode exponentially. Traditionally, a company managed its own server infrastructure, maybe with the help of a subcontractor. The cloud computing paradigm further increased complexity with the involvement of cloud service providers (sometimes stacked, for example, Heroku PaaS on top of the Amazon IaaS cloud service), third-party service providers (for example, CloudMQTT) and other tenants sharing the infrastructure. The IoT further increases this complexity, with potentially ad hoc and unforeseen interactions between devices and services on top of the complex cloud and edge computing infrastructure most IoT services rely on.

One answer to this problem is to build applications in “silos” where the involved parties are known in advance, but as a side-effect locking-in devices and services to a single company (for example, the competing smart-home offerings by leading technology companies). This is far from the IoT vision of a connected environment, but most existing products fall into this category. There are obviously major business considerations behind this model, and it should be noted that the EU GDPR mandates some form of interoperability (although it is yet unclear how it should be interpreted12).

An alternative to such “lock-in” would be to make devices’ consumption of data transparent and accountable. If data is exchanged across devices, the concerned user should be able to audit its usage. However, in an environment where arbitrary devices could interact (although it must be remembered that EU GDPR requires explicit and informed user consent), how can trust be established in the audit record? This requires an in-depth rethinking of how IoT platforms are designed, potentially exploring the security-by-design approach based on hardware roots of trust13 to provide trusted digital enclaves in which behavior can be audited. Some form of “accountability-by-design” principle should also be encouraged, where transparency and the implementation of a trustworthy audit mechanism is a core concern in product design.

Such solutions have been explored in the provenance space, for example, by leveraging SGX properties to provide a strong guarantee of the integrity of the provenance record.4 Similarly, remote attestation techniques leveraging TPM hardware have been proposed6 to guarantee the integrity of the capture mechanism. However, how to provide such guarantees in an IoT environment, where such hardware features may not be available, is a relatively unexplored topic.

Where Does the Audit Live?
The fully realized IoT vision is of vast distributed and decentralized systems.
If we assume trustworthy provenance capture is achievable, the issue of guaranteeing that the provenance record can be audited remains. If you are to audit the processing of personal data, guarantees about the integrity and availability of the provenance record must exist. If you agreed to share your daily activity for research, the activities of insurance companies scraping your data for possible health risks must not be able to masquerade as benign research use, nor should data collection for political purposes be able to pass as harmless entertainment, as in the Cambridge Analytica scandal.h Similarly, the availability (durability) of the audit record must be guaranteed. There is no point to an audit record if it can simply be deleted.

Further, Moyer et al. evaluated the storage requirements of provenance when used for security purposes in relatively modest distributed systems.21 In such a context, several thousands of graph elements can be generated per second and per machine, resulting in a graph containing billions of nodes to represent system execution over several months. It is unclear how some past research outcomes, for example, detection of suspicious behavior,2 privacy-aware provenance11 or provenance integrity,15 scale to very large graphs, as such concerns were not evaluated. Similarly, while blockchain is heralded19 as an integrity-preserving means to store provenance, it is unclear how well it could expand to such scale. Several options have been explored to reduce graph size, such as identifying and tracking only sensitive data objects5 or performing property-preserving graph compression;17 however, none has yet adequately addressed the scalability challenge.

How to Communicate Information?
Means must be developed to communicate about data usage, but also about the risks of inference from the data. Not only must the nature of the data be considered, but also other properties such as the frequency of capture.3 For example, a 100Hz smart-meter reading can in some cases indicate what television channel is currently being watched; even a daily average reading could inform about occupancy. Here, it is important to be able to explore and represent the outcome of complex computational workflows.1

Provenance visualization has been an active research topic for over a decade, yet no fully satisfactory solution has been proposed. The simplest possible visualization is to render the graph; however, beyond trivially simple graphs such a representation is too complex and dense to be easily understood, even by experts. We go further and suggest that how interpretable such information is for end users also depends on educational background, socioeconomic environment, and culture.

In order to make the accountability and transparency of IoT platforms effective, a better communication medium must be provided. An approach often taken is to analyze motifs in the graph to extract high-level abstractions (for example, Missier et al.20) that are meaningful to the average end user. In recent work, it was proposed to represent such a high-level abstraction as a comic strip.26

We Need to Care About Digital Provenance
Building transparent and auditable systems may be one of the greatest software engineering challenges of the coming decade. As a consequence, digital provenance and its application to cybersecurity and the management of personal data have become a hot research topic. We have highlighted key active areas of research and their associated challenges. It is fundamental for industry practitioners to understand the threat posed by the black-box nature of the IoT, the potential solutions, and the challenges to a practical deployment of those solutions. Accountability-by-design must become a core objective of IoT platforms.

h See https://nyti.ms/2HH74vA

References
1. Acar, U. et al. A graph model of data and workflow provenance. In Proceedings of the TAPP’10 Second Conference on Theory and Practice of Provenance, USENIX, 2010.
2. Allen, M.D. et al. Provenance for collaboration: Detecting suspicious behaviors and assessing trust in information. In Proceedings of the 7th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom). IEEE, 2011, 342–351.
3. Amar, Y. et al. An information theoretic approach to time-series data privacy. In Proceedings of the 1st Workshop on Privacy by Design in Distributed Systems. ACM, (2018), 3.
4. Balakrishnan, N. et al. Non-repudiable disk I/O in untrusted kernels. In Proceedings of the 8th Asia-Pacific Workshop on Systems. ACM, 2017, 24.
5. Bates, A. et al. Take only what you need: Leveraging mandatory access control policy to reduce provenance storage costs. In Proceedings of the Conference on Theory and Practice of Provenance (2015), USENIX, 7–7.
6. Bates, A.M. et al. Trustworthy whole-system provenance for the Linux kernel. In Proceedings of the USENIX Security Symposium (2015) 319–334.
7. Buneman, P. et al. Why and where: A characterization of data provenance. In Proceedings of the International Conference on Database Theory. Springer, 2001, 316–330.
8. Carata, L. et al. A primer on provenance. Commun. ACM 57, 5 (May 2014), 52–60.
9. Cheney, J. et al. Provenance in databases: Why, how, and where. Foundations and Trends in Databases 1, 4 (2009), 379–474.
10. Crabtree, A. et al. Building accountability into the Internet of Things: The IoT databox model. Journal of Reliable Intelligent Environments (2018).
11. Davidson, S. et al. Provenance views for module privacy. In Proceedings of the Thirtieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, 2011, 175–186.
12. De Hert, P. et al. The right to data portability in the GDPR: Towards user-centric interoperability of digital services. Computer Law & Security Review. Elsevier, (2017).
13. Eldefrawy, K. et al. SMART: Secure and minimal architecture for (establishing dynamic) root of trust. In Network and Distributed System Security Symposium 12 (2012), 1–15.
14. Han, X. et al. FRAPpuccino: Fault-detection through Runtime Analysis of Provenance. In Proceedings of the Workshop on Hot Topics in Cloud Computing (HotCloud’17). USENIX (2017).
15. Hasan, R. et al. The case of the fake Picasso: Preventing history forgery with secure provenance. In Proceedings of the Conference on File and Storage Technologies (FAST’09), (2009), 1–14.
16. Herschel, M. et al. A survey on provenance: What for? What form? What from? The VLDB Journal (The International Journal on Very Large Data Bases) 26, 6 (2017), 881–906.
17. Hossain, M.N. et al. Dependence-preserving data compaction for scalable forensic analysis. In Proceedings of the USENIX Security Symposium.
18. King, S.T. and Chen, P.M. Backtracking intrusions. ACM SIGOPS Operating Systems Review 37, 5 (May 2003).
19. Liang, X. et al. Provchain: A blockchain-based data provenance architecture in cloud environment with enhanced privacy and availability. In International Symposium on Cluster, Cloud and Grid Computing. IEEE/ACM, (2017), 468–477.
20. Missier, P. et al. ProvAbs: Model, policy, and tooling for abstracting PROV graphs. In Proceedings of the International Provenance and Annotation Workshop. Springer, 2017, 3–15.
21. Moyer, T. and Gadepally, V. High-throughput ingest of data provenance records into Accumulo. In Proceedings of the High Performance Extreme Computing Conference (HPEC), IEEE, 2016, 1–6.
22. Pasquier, T. et al. Runtime analysis of whole system provenance. In Proceedings of the Conference on Computer and Communications Security (CCS’18). ACM, 2018.
23. Pasquier, T. et al. If these data could talk. Scientific Data 4 (2017), http://www.nature.com/sdata2017114.
24. Pasquier, T. et al. Data provenance to audit compliance with privacy policy in the Internet of Things. Personal and Ubiquitous Computing (2018), 333–344.
25. Pohly, D.J. et al. Hi-Fi: Collecting high-fidelity whole-system provenance. In Proceedings of the 28th Annual Computer Security Applications Conference. ACM, 2012, 259–268.
26. Schreiber, A. and Struminski, R. Tracing personal data using comics. In Proceedings of the International Conference on Universal Access in Human-Computer Interaction. Springer, 2017, 444–455.
27. Singh, J. et al. Twenty security considerations for cloud-supported Internet of Things. IEEE Internet of Things Journal 3, 3 (Mar. 2016), 269–284.

Thomas Pasquier (http://tfjmp.org) is a Lecturer (Assistant Professor) at the University of Bristol’s Cyber Security Group, and a visiting scholar at the University of Cambridge, U.K.

David Eyers (https://www.cs.otago.ac.nz/staff/David_Eyers) is an Associate Professor in the Department of Computer Science at the University of Otago, New Zealand.

Jean Bacon (http://www.cl.cam.ac.uk/~jmb25/) is Professor Emerita of Distributed Systems at the University of Cambridge, U.K.

Copyright held by authors.
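To put the storage concern raised under "Where Does the Audit Live?" in numbers, the following back-of-the-envelope sketch multiplies a capture rate by a deployment period. The rate of 2,000 graph elements per second and the 90-day window are illustrative assumptions chosen to match "several thousands" and "several months"; they are not figures reported by Moyer et al., and the function name is hypothetical.

```javascript
// Rough estimate of provenance graph size; all inputs are assumed values.
const SECONDS_PER_DAY = 24 * 60 * 60;

function estimateGraphElements(elementsPerSecond, machines, days) {
  return elementsPerSecond * machines * days * SECONDS_PER_DAY;
}

// Hypothetical deployment: one machine, 2,000 elements per second, 90 days.
const singleMachine = estimateGraphElements(2000, 1, 90);
console.log(singleMachine); // 15552000000
```

Under these assumptions a single machine already exceeds ten billion graph elements, which is consistent with the "billions of nodes" order of magnitude discussed in the article.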
Article development led by queue.acm.org

Garbage Collection as a Joint Venture

Many popular programming languages are executed on top of virtual machines (VMs) that provide critical infrastructure such as automated memory management using garbage collection. Examples include dynamically typed programming languages such as JavaScript and Python, as well as statically typed ones such as Java and C#. For such languages the garbage collector periodically traces through objects on the application heap to determine which objects are live and should be kept, and which are dead and can be reclaimed. The garbage collector is said to manage the application memory, which means the programming language is managed. The main advantage of managed languages is that developers do not have to reason about object lifetimes and free objects manually. Forgetting to free objects leaks memory, and premature freeing results in dangling pointers.

Virtual machines for managed languages may be embedded into larger software systems that are implemented in a different, sometimes unmanaged, programming language, where programmers are responsible for releasing memory that is no longer needed. An example of such a heterogeneous software system is Google’s Chrome Web browser, where the high-performance V8 JavaScript VM (https://v8.dev/) is embedded in the Blink rendering engine that is in charge of rendering a website. Blink renders these pages by interpreting the document object model (DOM; https://www.w3.org/TR/WD-DOM/introduction.html) of a website, which is a cross-platform language-independent representation of the tree structure defined through HTML.
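The tracing step described above can be illustrated with a toy mark-and-sweep pass. This is a minimal sketch under stated assumptions, not how V8 or any production collector is implemented: the heap is modeled as a `Map` from hypothetical object ids to the ids they reference, and `collect` is an illustrative name.

```javascript
// Toy heap: each entry maps an object id to the ids it references.
// Objects reachable from the roots are live; everything else is dead.
function collect(heap, rootIds) {
  // Mark phase: trace through the object graph starting at the roots.
  const marked = new Set();
  const worklist = [...rootIds];
  while (worklist.length > 0) {
    const id = worklist.pop();
    if (marked.has(id)) continue;
    marked.add(id);
    worklist.push(...heap.get(id));
  }
  // Sweep phase: reclaim every object that was never marked.
  const reclaimed = [];
  for (const id of [...heap.keys()]) {
    if (!marked.has(id)) {
      heap.delete(id);
      reclaimed.push(id);
    }
  }
  return reclaimed;
}

const heap = new Map([
  ["root", ["a"]],
  ["a", ["b"]],
  ["b", ["a"]], // reachable cycle: kept
  ["x", ["y"]],
  ["y", ["x"]], // unreachable cycle: reclaimed
]);
const reclaimed = collect(heap, ["root"]);
console.log(reclaimed); // the two unreachable objects, x and y
```

Because liveness is decided by reachability rather than by counting references, the unreachable cycle between `x` and `y` is reclaimed without any special handling.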
V8 and Blink use mark-sweep-compact garbage collectors where a single garbage-collection cycle consists of three phases: marking, where live ob- [...]

[...] roots in V8 and vice versa. This creates the problem of reference cycles across components, which is analogous to regular reference cycles1 within a sin- [...]

While the cycle problem can be avoided by unifying the memory-management systems of two components, it may still be desirable to manage the [...]
[...] the hood, it is crucial that the APIs for these abstractions are properly encapsulated for Web developers who use HTML and JavaScript, including preventing memory leaks when properly used. To investigate memory leaks in Web pages, developers need tools that allow them to reason seamlessly about the connectivity of objects spanning both V8 and Blink heaps.

Cross-Component Tracing
We propose CCT as a way to tackle the general problem of reference cycles across component boundaries. For CCT, the garbage collectors of all involved components are extended to allow tracing into a different component, managing objects of potentially different programming languages. CCT uses the garbage collector of one component as the master tracer to compute the full transitive closure of live objects to break cycles.

Other components assist by providing a remote tracer that can traverse the objects of the component when requested by the master tracer. The system can then be treated as one managed heap. As a consequence, the simple algorithm of CCT can be extended to allow moving collectors and incremental or concurrent marking as needed by just following existing garbage collection principles.8 The pseudocode of the master and remote tracer algorithms is available in our full research article.3

For Chrome we developed a version of cross-component tracing where the master tracer for JavaScript objects and the remote tracer for C++ objects are provided by V8 and Blink, respectively. This way V8 can trace through the C++ DOM upon doing a garbage collection, effectively breaking cycles on the V8 and Blink boundary. In this system, Blink garbage collections deal with only the C++ objects and treat the incoming cross-component references from V8 as roots. This way, subsequent [...]

Figure 1. JavaScript example interacting with the DOM.

    <!DOCTYPE html>
    <html>
    <body><script>
    function fetchContent(callback) {
      // Emulate network request and content creation.
      setTimeout(callback, 1000);
    }
    function run() {
      const loadingBar = document.createElement("div");
      document.body.appendChild(loadingBar);
      fetchContent(() => {
        const content = document.createElement("div");
        document.body.replaceChild(content, loadingBar);
        content.parent = document.body;
      });
    }
    document.addEventListener("DOMContentLoaded", run);
    </script></body>
    </html>

Figure 2. Object graph spanning JavaScript and the DOM. [Diagram labels: V8 side: root, document, body, div content; Blink side: HTMLDocument, HTMLBodyElement, HTMLDivElement.]

[...] be manually annotated with a method that describes the body of the class, including any references to other managed objects. Since Blink was already garbage-collected before introducing CCT, only minor adjustments to this method were required across the rendering codebase.

Chrome strives to provide smooth user experiences, updating the screen at 60fps (frames per second), leaving V8 and Blink around 16.6 milliseconds to render a frame. Since marking large heaps may take hundreds of milliseconds, both V8 and Blink employ a technique called incremental marking, which means that marking is divided into steps during which objects are marked for only a small amount of time (for example, 1ms).

The application is free to change object references between the steps. This means that the application may hide a reference to an unmarked object in an already-marked object, which would result in premature collection of a live object. Incremental marking requires a garbage collector to keep the marking state consistent by preserving the strong tri-color-marking invariant.8 This invariant states that fully marked objects are allowed to point only to
objects that are also fully marked or stashed somewhere for processing. V8 and Blink preserve the marking invariant using a conservative Dijkstra-style write barrier6 that ensures that writing a value into an object also marks the value. In fact, V8 even provides concurrent marking on a background thread this way while relying on incremental tracing in Blink.5

To make this concrete, Figure 3 illustrates CCT where V8 traces and marks objects in JavaScript, as well as C++. Objects transitively reachable by V8’s root object are marked black. Subsequently, any unreachable objects (loadingBar, in this example) are reclaimed by the garbage collector. Note that from V8’s point of view, there is no difference between the div elements content and loadingBar, and only CCT makes it clear which object can be reclaimed by V8’s garbage collector. Once the unreachable V8 object is gone, any subsequent garbage collections in Blink will not see a root for the corresponding HTMLDivElement and reclaim the other half of the wrapper-wrappable pair.

Figure 3. Cross-component garbage collection. [Diagram labels: V8 side: root, document, body, div content, div loadingBar; Blink side: HTMLDocument, HTMLBodyElement, HTMLDivElement (twice); edge label: parent; annotation: reclaimed on V8 garbage collection.]

In Chrome, CCT replaced its predecessor, called object grouping, in version 57. Object grouping was based on over-approximating liveness across component boundaries by keeping all wrappers and wrappables alive in a given DOM tree as long as a single wrapper was held alive through JavaScript. This assumption was reasonable at the time it was implemented, when modification of the DOM from wrappers occurred infrequently. However, the over-approximation had two major shortcomings: It kept more memory alive than needed, which in times of ever-growing Web applications increased already strong memory pressure in the browser; and, the original algorithm was not designed for incremental processing, which, compared with CCT, resulted in longer garbage-collection pause times.

Incremental CCT as implemented today in Chrome eliminates those problems by providing a much better approximation by computing liveness of objects through reachability and by enabling incremental processing. The detailed performance analysis can be found in the main research paper.3 We are currently working on concurrent marking of the Blink C++ heap and on integrating CCT into such a scheme.

Figure 4. Leaking the callback.

    function fetchContent(callback) {
      // Emulate network request and content creation.
      setTimeout(callback, 1000);
      fetchContent.internalState = callback;
    }

Figure 5. Retaining path of the leaking DIV element. [Image not reproduced.]

Debugging
Memory-leak bugs are a widespread problem haunting Web applications today.7 Powerful language constructs such as closures make it easy for a Web developer to accidentally extend the lifetimes of JavaScript and DOM objects, resulting in higher memory usage than necessary. As a concrete example,
let’s assume that the fetchContent function from Figure 1 keeps, perhaps because of a bug, an internal reference to the provided callback, as shown in Figure 4.

Without knowing the implementation of the fetchContent function, a Web developer observes that the loadingBar element from the previous example is not reclaimed by the garbage collector. Can debugging tools help track down why the element is leaking?

The tracing infrastructure needed for cross-component garbage collection can be applied to improve memory debugging. Chrome DevTools uses the infrastructure to capture and visualize the object graph spanning JavaScript and DOM objects. The tool allows Web developers to query why a particular object is not reclaimed by the garbage collector. It presents the answer in the form of a retaining path, which runs from the object to the garbage-collection root. Figure 5 shows the retaining path for the leaking loadingBar element. The path shows that the leaking DOM element is captured by the loadingBar variable in the environment (called context in V8) of an anonymous closure, which is retained by the internalState field of the fetchContent function. By inspecting each node of the path, the Web developer can pinpoint the source of the leak. Thanks to the cross-component tracing, the path seamlessly crosses the DOM and JavaScript boundary.4

Reclaiming Memory in Other Heterogeneous Systems
Web browsers are particularly interesting systems, as all major browser engines separate DOM and JavaScript objects in a similar way (that is, by providing different heaps for those objects). Similar to Blink and V8, all those browsers encode their DOM in C++ and must rely on a custom object model for JavaScript. All Blink-derived systems (for example, Chrome, Opera, and Electron) rely on CCT to handle cross-component references. The Gecko rendering engine that powers Firefox uses reference counting to manage DOM objects. An additional incremental cycle collector1 that wakes up periodically ensures that such cycles are eventually collected. WebKit, the engine running inside Safari, uses reference counting for the C++ DOM with an additional system that computes liveness across the wrapper/wrappable boundary in the final pause of a garbage-collection cycle. Unsurprisingly, all major browsers have mechanisms to deal with these kinds of cycles, as memory leaks in longer-running websites would otherwise be inevitable and would observably impact browser performance.

More interestingly, though, we are not aware of other sophisticated systems integrating VMs that provide cross-component memory management. While VMs often provide bridges for integration in other systems, such as Java Native Interface (JNI) and NativeScript, cross-component references require manual management in all of them. Developers using those systems must manually create and destroy links that can form cycles. This is error prone and can lead to the aforementioned problems.

Conclusion
Cross-component tracing is a way to solve the problem of reference cycles across component boundaries. This problem appears as soon as components can form arbitrary object graphs with nontrivial ownership across API boundaries. An incremental version of CCT is implemented in V8 and Blink, enabling effective and efficient reclamation of memory in a safe manner, without introducing dangling pointers that could lead to program crashes or security vulnerabilities in Chrome or Chromium-derived browsers. The same tracing system is reused by Chrome DevTools to visualize retaining paths of objects independent of whether they are managed in C++ or JavaScript.

Note, however, that CCT comes with significant implementation overhead, as it requires implementations of tracers in each component. Ultimately, implementers need to weigh the effort of either avoiding cycles by enforcing restrictions on their systems or implementing a mechanism to reclaim cycles, such as CCT. Chrome was already equipped with garbage collectors in V8 and Blink, and thus we chose to implement a generic solution such as CCT that allows the systems on top to stay as flexible as needed.

CCT is implemented not only in Chrome, but also in other software systems that use V8 and Chrome, such as the popular Opera Web browser and Electron. Cobalt, a high-performance, small-footprint platform providing a subset of HTML5, CSS, and JavaScript used for embedded devices such as TVs, implemented cross-component tracing inspired by our system to manage its memory.

Related articles on queue.acm.org

Idle-Time Garbage-Collection Scheduling
Ulan Degenbaev et al.
https://queue.acm.org/detail.cfm?id=2977741

Real-time Garbage Collection
David F. Bacon
https://queue.acm.org/detail.cfm?id=1217268

Leaking Space
Neil Mitchell
https://queue.acm.org/detail.cfm?id=2538488

References
1. Bacon, D.F. and Rajan, V.T. Concurrent cycle collection in reference counted systems. In Proceedings of the 15th European Conf. Object-Oriented Programming. Springer-Verlag, London, U.K., 2001, 207–235; https://doi.org/10.1007/3-540-45337-7_12.
2. Chambers, C., Ungar, D. and Lee, E. An efficient implementation of SELF, a dynamically-typed object-oriented language based on prototypes. In Proceedings of the Conf. Object-Oriented Programming Systems, Languages and Applications. ACM SIGPLAN, 1989, 49–70; https://dl.acm.org/citation.cfm?doid=74877.74884.
3. Degenbaev, U. et al. Cross-component garbage collection. In Proceedings of the ACM on Programming Languages 2, OOPSLA Article 151, 2018; https://dl.acm.org/citation.cfm?doid=3288538.3276521.
4. Degenbaev, U., Filippov, A., Lippautz, M. and Payer, H. Tracing from JS to the DOM and back again. V8, 2018; https://v8.dev/blog/tracing-js-dom.
5. Degenbaev, U., Lippautz, M. and Payer, H. Concurrent marking in V8. V8, 2018; https://v8.dev/blog/concurrent-marking.
6. Dijkstra, E.W., Lamport, L., Martin, A.J., Scholten, C.S. and Steffens, E.F.M. On-the-fly garbage collection: An exercise in cooperation. Commun. ACM 21, 11 (Nov. 1978), 966–975; https://dl.acm.org/citation.cfm?doid=359642.359655.
7. Hablich, M. and Payer, H. Lessons learned from the memory roadshow; https://bit.ly/2018-memory-roadshow.
8. Jones, R., Hosking, A. and Moss, E. The Garbage Collection Handbook: The Art of Automatic Memory Management. Chapman & Hall, 2012.

Ulan Degenbaev is a software engineer at Google, working on the garbage collector of the V8 JavaScript engine.

Michael Lippautz is a software engineer at Google, where he works on garbage collection for the V8 JavaScript virtual machine and the Blink rendering engine. Previously, he worked on Google’s Dart virtual machine.

Hannes Payer is a software engineer at Google, where he works on the V8 JavaScript virtual machine. Previously, he worked on Google’s Dart virtual machine and various Java virtual machines.

Copyright held by authors/owners.
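The master/remote tracer split described in the Cross-Component Tracing section can be sketched as follows. This is a toy model under stated assumptions, not the pseudocode from the research article and not an API of V8 or Blink: `ref`, `RemoteTracer`, `crossComponentMark`, and the object ids are hypothetical names, and each heap is modeled as a `Map` from an object id to the references it holds.

```javascript
// A reference names the component that owns its target ("v8" or "blink").
const ref = (component, id) => ({ component, id });

// Stand-in for the remote tracer: it only knows how to traverse Blink objects.
class RemoteTracer {
  constructor(blinkHeap) {
    this.blinkHeap = blinkHeap;
  }
  trace(id) {
    return this.blinkHeap.get(id);
  }
}

// Master tracer: computes the transitive closure of live objects across both
// components, asking the remote tracer whenever it reaches a Blink object.
function crossComponentMark(v8Heap, v8Roots, remoteTracer) {
  const marked = new Set();
  const worklist = v8Roots.map((id) => ref("v8", id));
  while (worklist.length > 0) {
    const { component, id } = worklist.pop();
    const key = `${component}:${id}`;
    if (marked.has(key)) continue;
    marked.add(key);
    const outgoing = component === "v8" ? v8Heap.get(id) : remoteTracer.trace(id);
    worklist.push(...outgoing);
  }
  return marked;
}

// Object graph loosely following Figure 3. Every wrapper and its wrappable
// reference each other, which forms a cycle across the component boundary.
const v8Heap = new Map([
  ["document", [ref("blink", "HTMLDocument")]],
  ["body", [ref("blink", "HTMLBodyElement")]],
  ["content", [ref("blink", "contentDiv"), ref("v8", "body")]],
  ["loadingBar", [ref("blink", "loadingBarDiv")]],
]);
const blinkHeap = new Map([
  ["HTMLDocument", [ref("v8", "document"), ref("blink", "HTMLBodyElement")]],
  ["HTMLBodyElement", [ref("v8", "body"), ref("blink", "contentDiv")]],
  ["contentDiv", [ref("v8", "content")]],
  ["loadingBarDiv", [ref("v8", "loadingBar")]],
]);

const live = crossComponentMark(v8Heap, ["document"], new RemoteTracer(blinkHeap));
const dead = [...v8Heap.keys()].filter((id) => !live.has(`v8:${id}`));
console.log(dead); // only the loadingBar wrapper is unreachable
```

If each side instead treated every incoming cross-component reference as a root, the `loadingBar` wrapper and its wrappable would keep each other alive indefinitely; tracing through both heaps from V8's root is what breaks the cycle in this sketch.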
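The interaction between incremental marking and a Dijkstra-style write barrier, as described in the article, can be sketched with a small simulation. The class and method names below (`IncrementalMarker`, `step`, `shade`, `writeRef`) are hypothetical and the heap model is a simplification; the sketch only illustrates why writing a reference into an already-marked object must also mark the written value.

```javascript
class IncrementalMarker {
  constructor(heap, roots) {
    this.heap = heap;          // Map: object id -> array of referenced ids
    this.marked = new Set();   // objects that are marked (gray or black)
    this.worklist = [];        // marked objects whose fields are not yet visited
    for (const id of roots) this.shade(id);
  }
  shade(id) {
    if (!this.marked.has(id)) {
      this.marked.add(id);
      this.worklist.push(id);
    }
  }
  // Visit at most `budget` objects, then return control to the application.
  // Returns true once marking is finished.
  step(budget) {
    while (budget-- > 0 && this.worklist.length > 0) {
      const id = this.worklist.pop();
      for (const target of this.heap.get(id)) this.shade(target);
    }
    return this.worklist.length === 0;
  }
  // Application write; with the barrier enabled, the written value is marked.
  writeRef(sourceId, targetId, useBarrier) {
    this.heap.get(sourceId).push(targetId);
    if (useBarrier) this.shade(targetId);
  }
}

function survivesMarking(useBarrier) {
  const heap = new Map([["a", ["b"]], ["b", ["c"]], ["c", []]]);
  const marker = new IncrementalMarker(heap, ["a"]);
  marker.step(1);                        // "a" is now fully marked
  marker.writeRef("a", "c", useBarrier); // hide "c" in the fully marked "a"
  heap.set("b", []);                     // and drop the only other reference
  while (!marker.step(1)) { /* application would run between steps */ }
  return marker.marked.has("c");
}

console.log(survivesMarking(true));  // true: the barrier marked "c"
console.log(survivesMarking(false)); // false: "c" would be collected while live
```

The barrier is conservative: it may keep an object alive for one extra cycle, but it never lets a fully marked object point to an unmarked one.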
DOI:10.1145/3316778
Article development led by queue.acm.org

How to Create a Great Team Culture (and Why It Matters)
Build safety, share vulnerability, and establish purpose.

BY KATE MATSUDAIRA

In my career leading teams, I have worked with large organizations (more than 1,000 people) and super-small teams (a startup with just two people). I have seen that the best teams have one thing in common: a strong team culture.

We all know what it is like to be a part of a great team, when you enjoy coming together and the energy is electric. There is something special that happens when the team becomes greater than the sum of the individuals.

I was really inspired by this topic recently when I read Daniel Coyle’s book The Culture Code.1 The author shares a lot of research (and I do love data) about what makes a great team. He boils it down to a few key elements:

• Build safety. Create an environment where people feel safe and secure.
• Share vulnerability. When people are willing to take risks, it can drive cooperation and build trust.
• Establish purpose. The team should align around common goals and values, with a clear path forward.

The book is filled with many examples and ideas, but in my experience, I have seen that what works for one team will not work for another. That is one of the reasons leadership is complex and difficult.

You are always working with different variables: different teams, different companies, different goals. And yet team culture is one part of the job that great leaders never ignore. So, how do the best leaders create team culture wherever they go?

See the Role You Play in Team Culture
As a leader, it is your responsibility to set the culture for the team. I am sure you have heard the phrase “lead by example,” and that is because when people aren’t sure what is acceptable, they look to their leaders for guidance.

You have surely been in the situation where you have seen your manager staying late at the office, and as a result, you might have stayed just a little longer. On the other hand, if you frequently saw your boss taking two-hour lunches, you might not be in such a hurry to get back to the office when your friend stops by to go to lunch.

Every day, people are looking for signals in their environment about what is the norm. As a leader, it is part of your job to set the example for those around you.

You want to create a culture where people are engaged, cooperative, and excited. To do this, you need to be deliberate in your actions. For example, if you want to create a culture of psychological safety, where people can speak up and take risks, it is important that you do not accept or participate in negativity. Research has shown that one bad apple or toxic employee can bring
[...] to me to complain about another team member, instead of sharing that feedback, I would coach the complainer to tell the other person directly. If the complaint was about how a peer was [...]

• Build trust. Create opportunities (for example, summits or meetings) for people to build trust with one another and get to know each other as people, not just as coworkers.

[...] team exist? Think beyond the function that the team serves for the organization as a whole (for example, coding or design). Why are we a team? The answer to this question is not always clear.
Other questions to consider: What Team Structure Becomes How can you make people feel like
are the advantages of being a team? Team Culture they are valued and important parts of
What can we do because we are work- Team culture is not just wearing the the team?
ing together? Maybe it is so team same t-shirts at the company picnic.
members can learn from each other The way you do things every day is what Let the Culture Expand
and do better work; there is more builds your culture. From the Top Down
learning when there is collaboration. So, while streamlining the way your As leader of the team, you have sig-
Maybe it is decision making; we make team formats their reports might not nificant influence over your team’s
smarter decisions when we have mul- feel like it has much to do with team culture. You can institute policies and
tiple viewpoints. Maybe it is about culture, it does. It is a way of steering procedures that help make your team
resources; we can do more with com- your team to work together, by prioritiz- happy and productive, monitor team
bined skills and time. ing the values that your team supports. successes, and continually improve
Culture is in the everyday. It is the the team.
How to Create Cultural small actions that you and everyone on Another important part of team cul-
Touchpoints Around Your Values your team takes on a daily basis—the ture, however, is helping people feel
Once you know the value of your team, then you can start building in elements that support its values.
Let’s say you decide that one of the values of your team is peer mentoring; in other words, one of the reasons your team exists is so that its members can do better work by learning from one another. Now, how do you make that happen on your team?
You cannot just say “We learn from each other” during a meeting and make it happen. You have to institute processes that make this a simple part of daily life on your team. Think about which kinds of forums you can set up to help people learn from one another. Do you want to encourage questions in team chat? Set up a code-review process? Establish a cadence of brown bags to share lessons learned? Read whitepapers and discuss them as a group? What makes sense for your team?
Now let’s say that another value of your team is decision-making; your team exists because everyone’s input helps to make smarter decisions about what to work on and how to work on it. We all know that the more people are involved in a decision, the more complicated it becomes to get a clear answer. So, how do you benefit from getting everyone’s input without becoming a team that can get nothing done because no one can agree?
The solution might be to have all decision-making done in the same way. For example, you might institute a format for all reports. That way, the input from various individuals or departments will all come to you in the exact same way, so you can quickly parse the information and make a decision.
way they speak to each other, the way decisions get made, the way they run meetings, that make up your team culture.
I have seen many amazing examples of culture-building throughout my career. Here are just a few more ideas that might inspire you as you build your team culture:
• Weekly demo meetings. Have someone from the team share a recent accomplishment. This could be a big thing, or even something as small as changing a button color on the website. This creates a culture of sharing work so that people feel more collaborative even outside the meeting setting, since they know what other people are working on.
• Teaching slots for every team member. At every team meeting, have people sign up to share something they’ve learned recently or teach something to the team. This is a great way for people who have been to conferences or read interesting books to share that knowledge, and it helps give everyone on the team a voice (even those who don’t normally speak up during meetings).
• Cupcakes for launches. Mark every team win by bringing people together (literally together, around a plate of cupcakes) instead of just via email. This creates a culture of celebration, where people’s successes are noticed and rewarded, and where the whole team celebrates together.
Some of the decisions you make will feel big and some will feel small. But whether it’s as huge as developing an online help tool for your team or as small as sharing cupcakes after a big win, the effect is the same. Culture comes from shared experiences. The what doesn’t matter nearly as much as the why.
they are a part of creating it. How can you expand the job of creating a culture to other team members?
Look for opportunities to delegate whenever you can. If a holiday is coming up, maybe you could ask a team member to help organize a team dinner. Look for people with unique perspectives (who maybe aren’t heard from as often as others) and give them a platform to share.
This is where truly great leadership comes from. You establish a culture that enables your team to be the best it can be, and then you allow the team to take that culture and run with it.
How amazing could your team be with just a few adjustments?
Related articles on queue.acm.org
High-Performance Team
Philip Beevers
https://queue.acm.org/detail.cfm?id=1117402
Culture Surprises in Remote Software Development Teams
Judith S. Olson and Gary M. Olson
https://queue.acm.org/detail.cfm?id=966804
Stand and Deliver: Why I Hate Stand-Up Meetings
Phillip A. Laplante
https://queue.acm.org/detail.cfm?id=957730
References
1. Coyle, D. The Culture Code. Bantam, 2018; http://danielcoyle.com/the-culture-code/.
2. Felps, W., Mitchell, T., and Byington, E. How, when, and why bad apples spoil the barrel: Negative group members and dysfunctional groups. Research in Organizational Behavior 27 (2006), 175–222.
Kate Matsudaira (katemats.com) is an experienced technology leader. She has worked at Microsoft and Amazon and successful startups before starting her own company, Popforms, which was acquired by Safari Books.
Copyright held by author/owner. Publication rights licensed to ACM.
Article development led by queue.acm.org
Research for Practice: Troubling Trends in Machine-Learning Scholarship
Collectively, machine learning (ML) researchers are engaged in the creation and dissemination of knowledge about data-driven algorithms. In a given paper, researchers might aspire to any subset of the following goals, among others: to theoretically characterize what is learnable; to obtain understanding through empirically rigorous experiments; or to build a working system that has high predictive accuracy. While determining which knowledge warrants inquiry may be subjective, once the topic is fixed, papers are most valuable to the community when they act in service of the reader, creating foundational knowledge and communicating as clearly as possible.
What sorts of papers best serve their readers? Ideally, papers should accomplish the following: provide intuition to aid the reader’s understanding but clearly distinguish it from stronger conclusions supported by evidence; describe empirical investigations that consider and rule out alternative hypotheses; make clear the relationship between theoretical analysis and intuitive or empirical claims; and use language to empower the reader, choosing terminology to avoid misleading or unproven connotations, collisions with other definitions, or conflation with other related but distinct concepts.
Recent progress in machine learning comes despite frequent departures from these ideals. This installment of Research for Practice focuses
JUNE 2019 | VOL. 62 | NO. 6 | COMMUNICATIONS OF THE ACM 45
practice
on the following four patterns that appear to be trending in ML scholarship:
• Failure to distinguish between explanation and speculation.
• Failure to identify the sources of empirical gains (for example, emphasizing unnecessary modifications to neural architectures when gains actually stem from hyperparameter tuning).
• “Mathiness”: the use of mathematics that obfuscates or impresses rather than clarifies (for example, by confusing technical and nontechnical concepts).
• Misuse of language (for example, by choosing terms of art with colloquial connotations or by overloading established technical terms).
While the causes of these patterns are uncertain, possibilities include the rapid expansion of the community, the consequent thinness of the reviewer pool, and the often-misaligned incentives between scholarship and short-term measures of success (for example, bibliometrics, attention, and entrepreneurial opportunity). While each pattern offers a corresponding remedy (don’t do it), this article also makes suggestions on how the community might combat these troubling trends.
As the impact of machine learning widens, and the audience for research papers increasingly includes students, journalists, and policy-makers, these considerations apply to this wider audience as well. By communicating more precise information with greater clarity, better ML scholarship could accelerate the pace of research, reduce the on-boarding time for new researchers, and play a more constructive role in public discourse.
Flawed scholarship threatens to mislead the public and stymie future research by compromising ML’s intellectual foundations. Indeed, many of these problems have recurred cyclically throughout the history of AI (artificial intelligence) and, more broadly, in scientific research. In 1976, Drew McDermott26 chastised the AI community for abandoning self-discipline, warning prophetically “if we can’t criticize ourselves, someone else will save us the trouble.” Similar discussions recurred throughout the 1980s, 1990s, and 2000s. In other fields, such as psychology, poor experimental standards have eroded trust in the discipline’s authority.33 The current strength of machine learning owes to a large body of rigorous research to date, both theoretical and empirical. By promoting clear scientific thinking and communication, our community can sustain the trust and investment it currently enjoys.
Disclaimers. This article aims to instigate discussion, answering a call for papers from the International Conference on Machine Learning (ICML) Machine Learning Debates workshop. While we stand by the points represented here, we do not purport to offer a full or balanced viewpoint or to discuss the overall quality of science in ML. In many aspects, such as reproducibility, the community has advanced standards far beyond what sufficed a decade ago.
Note that these arguments are made by us, against us (insiders offering a critical introspective look), not as sniping outsiders. The ills identified here are not specific to any individual or institution. We have fallen into these patterns ourselves, and likely will again in the future. Exhibiting one of these patterns doesn’t make a paper bad, nor does it indict the paper’s authors; however, all papers could be made stronger by avoiding these patterns.
While we provide concrete examples, our guiding principles are to implicate ourselves; and to select preferentially from the work of better-established researchers and institutions that we admire, to avoid singling out junior students for whom inclusion in this discussion might have consequences and who lack the opportunity to reply symmetrically. We are grateful to belong to a community that provides sufficient intellectual freedom to allow the expression of critical perspectives.
Troubling Trends
Each subsection that follows describes a trend; provides several examples (as well as positive examples that resist the trend); and explains the consequences. Pointing to weaknesses in individual papers can be a sensitive topic. To minimize this, the examples are short and specific.
Explanation vs. speculation. Research into new areas often involves exploration predicated on intuitions that have yet to coalesce into crisp formal representations. Speculation is a way for authors to impart intuitions that may not yet withstand the full weight of scientific scrutiny. Papers often offer speculation in the guise of explanations, however, which are then interpreted as authoritative because of the trappings of a scientific paper and the presumed expertise of the authors.
For instance, in a 2015 paper, Ioffe and Szegedy18 form an intuitive theory around a concept called internal covariate shift. The exposition on internal covariate shift, starting from the abstract, appears to state technical facts. Key terms are not made crisp enough, however, to assume a truth value conclusively. For example, the paper states that batch normalization offers improvements by reducing changes in the distribution of hidden activations over the course of training. By which divergence measure is this change quantified? The paper never clarifies, and some work suggests that this explanation of batch normalization may be off the mark.37 Nevertheless, the speculative explanation given by Ioffe and Szegedy has been repeated as fact, for example, in a 2015 paper by Noh, Hong, and Han,31 which states, “It is well known that a deep neural network is very hard to optimize due to the internal-covariate-shift problem.”
We have been equally guilty of speculation disguised as explanation. In a 2017 paper with Koh and Liang,42 I (Jacob Steinhardt) wrote that “the high dimensionality and abundance of irrelevant features … give the attacker more room to construct attacks,” without conducting any experiments to measure the effect of dimensionality on attackability. In another paper with Liang from 2015,41 I (Steinhardt) introduced the intuitive notion of coverage without defining it, and used it as a form of explanation (for example, “Recall that one symptom of a lack of coverage is poor estimates of uncertainty and the inability to generate high-precision predictions.”). Looking back, we desired to communicate insufficiently fleshed-out intuitions that were material to the work described in the paper and
were reticent to label a core part of the argument as speculative.
In contrast to these examples, Srivastava et al.39 separate speculation from fact. While this 2014 paper, which introduced dropout regularization, speculates at length on connections between dropout and sexual reproduction, a designated “Motivation” section clearly quarantines this discussion. This practice avoids confusing readers while allowing authors to express informal ideas.
In another positive example, Yoshua Bengio2 presents practical guidelines for training neural networks. Here, the author carefully conveys uncertainty. Instead of presenting the guidelines as authoritative, the paper states: “Although such recommendations come … from years of experimentation and to some extent mathematical justification, they should be challenged. They constitute a good starting point … but very often have not been formally validated, leaving open many questions that can be answered either by theoretical analysis or by solid comparative experimental work.”
Failure to identify the sources of empirical gains. The ML peer-review process places a premium on technical novelty. Perhaps to satisfy reviewers, many papers emphasize both complex models (addressed here) and fancy mathematics (to be discussed in the “Mathiness” section). While complex models are sometimes justified, empirical advances often come about in other ways: through clever problem formulations, scientific experiments, optimization heuristics, data-preprocessing techniques, extensive hyperparameter tuning, or applying existing methods to interesting new tasks. Sometimes a number of proposed techniques together achieve a significant empirical result. In these cases, it serves the reader to elucidate which techniques are necessary to realize the reported gains.
Too frequently, authors propose many tweaks absent proper ablation studies, obscuring the source of empirical gains. Sometimes, just one of the changes is actually responsible for the improved results. This can give the false impression that the authors did more work (by proposing several improvements), when in fact they did not do enough (by not performing proper ablations). Moreover, this practice misleads readers to believe that all of the proposed changes are necessary.
In 2018, Melis, Dyer, and Blunsom27 demonstrated that a series of published improvements in language modeling, originally attributed to complex innovations in network architectures, were actually the result of better hyperparameter tuning. On equal footing, vanilla long short-term memory (LSTM) networks, hardly modified since 1997, topped the leaderboard. The community might have benefited more by learning the details of the hyperparameter tuning without the distractions. Similar evaluation issues have been observed for deep reinforcement learning17 and generative adversarial networks.24 See Sculley et al.38 for more discussion of lapses in empirical rigor and resulting consequences.
In contrast, many papers perform good ablation analyses, and even retrospective attempts to isolate the source of gains can lead to new discoveries. Furthermore, ablation is neither necessary nor sufficient for understanding a method, and can even be impractical given computational constraints. Understanding can also come from robustness checks (as in Cotterell et al.,9 which discovers that existing language models handle inflectional morphology poorly), as well as qualitative error analysis.
Empirical study aimed at understanding can be illuminating even absent a new algorithm. For example, probing the behavior of neural networks led to identifying their susceptibility to adversarial perturbations.44 Careful study also often reveals limitations of challenge datasets while yielding stronger baselines. A 2016 paper by Chen, Bolton, and Manning6 studied a task designed for reading comprehension of news passages and found that 73% of the questions can be answered by looking at a single sentence, while only 2% required looking at multiple sentences (the remaining 25% of examples were either ambiguous or contained coreference errors). In addition, simpler neural networks and linear classifiers outperformed complicated neural architectures that
had previously been evaluated on this task. In the same spirit, Zellers et al.45 analyzed and constructed a strong baseline for the Visual Genome Scene Graphs dataset in their 2018 paper.
Mathiness. When writing a paper early in my Ph.D. program, I (Zachary Lipton) received feedback from an experienced post-doc that the paper needed more equations. The post-doc wasn’t endorsing the system but rather communicating a sober view of how reviewing works. More equations, even when difficult to decipher, tend to convince reviewers of a paper’s technical depth.
Mathematics is an essential tool for scientific communication, imparting precision and clarity when used correctly. Not all ideas and claims are amenable to precise mathematical description, however, and natural language is an equally indispensable tool for communicating, especially about intuitive or empirical claims.
When mathematical and natural-language statements are mixed without a clear accounting of their relationship, both the prose and the theory can suffer: problems in the theory can be concealed by vague definitions, while weak arguments in the prose can be bolstered by the appearance of technical depth. We refer to this tangling of formal and informal claims as mathiness, following economist Paul Romer, who described the pattern like this: “Like mathematical theory, mathiness uses a mixture of words and symbols, but instead of making tight links, it leaves ample room for slippage between statements in natural language versus formal language.”36
Mathiness manifests in several ways. First, some papers abuse mathematics to convey technical depth, to bulldoze rather than to clarify. Spurious theorems are common culprits, inserted into papers to lend authoritativeness to empirical results, even when the theorem’s conclusions do not actually support the main claims of the paper. I (Steinhardt) was guilty of this in a 2015 paper with Percy Liang,40 where a discussion of “staged strong Doeblin chains” had limited relevance to the proposed learning algorithm but might confer a sense of theoretical depth to readers.
The ubiquity of this issue is evidenced by the paper introducing the Adam optimizer.19 In the course of introducing an optimizer with strong empirical performance, it also offers a theorem regarding convergence in the convex case, which is perhaps unnecessary in an applied paper focusing on non-convex optimization. The proof was later shown to be incorrect.35
A second mathiness issue is putting forth claims that are neither clearly formal nor clearly informal. For example, Dauphin et al.11 argued that the difficulty in optimizing neural networks stems not from local minima but from saddle points. As one piece of evidence, the work cites a statistical physics paper by Bray and Dean5 on Gaussian random fields and states that in high dimensions “all local minima [of Gaussian random fields] are likely to have an error very close to that of the global minimum.” (A similar statement appears in the related work of Choromanska et al.7) This appears to be a formal claim, but absent a specific theorem it is difficult to verify the claimed result or to determine its precise content. Our understanding is that it is partially a numerical claim that the gap is small for typical settings of the problem parameters, as opposed to a claim that the gap vanishes in high dimensions. A formal statement would help clarify this. Note that the broader interesting point in Dauphin et al. that minima tend to have lower loss than saddle points is more clearly stated and empirically tested.
Finally, some papers invoke theory in overly broad ways or make passing references to theorems with dubious pertinence. For example, the no-free-lunch theorem is commonly invoked as a justification for using heuristic methods without guarantees, even though the theorem does not formally preclude guaranteed learning procedures.
While the best remedy for mathiness is to avoid it, some papers go further with exemplary exposition. A 2013 paper by Bottou et al.4 on counterfactual reasoning covered a large amount of mathematical ground in a down-to-earth manner, with numerous clear connections to applied empirical problems. This tutorial, written in clear service to the reader, has helped to spur work in the burgeoning
community studying counterfactual reasoning for ML.
Misuse of language. There are three common avenues of language misuse in machine learning: suggestive definitions, overloaded terminology, and suitcase words.
Suggestive definitions. In the first avenue, a new technical term is coined that has a suggestive colloquial meaning, thus sneaking in connotations without the need to argue for them. This often manifests in anthropomorphic characterizations of tasks (reading comprehension and music composition) and techniques (curiosity and fear; I (Zachary) am responsible for the latter). A number of papers name components of proposed models in a manner suggestive of human cognition (for example, thought vectors and the consciousness prior). Our goal is not to rid the academic literature of all such language; when properly qualified, these connections might communicate a fruitful source of inspiration. When a suggestive term is assigned technical meaning, however, each subsequent paper has no choice but to confuse its readers, either by embracing the term or by replacing it.
Describing empirical results with loose claims of “human-level” performance can also portray a false sense of current capabilities. Take, for example, the “dermatologist-level classification of skin cancer” reported in a 2017 paper by Esteva et al.12 The comparison with dermatologists concealed the fact that classifiers and dermatologists perform fundamentally different tasks. Real dermatologists encounter a wide variety of circumstances and must perform their jobs despite unpredictable changes. The machine classifier, however, achieved low error only on independent, identically distributed (IID) test data.
In contrast, claims of human-level performance in work by He et al.16 are better qualified to refer to the ImageNet classification task (rather than object recognition more broadly). Even in this case, one careful paper (among many less careful) was insufficient to put the public discourse back on track. Popular articles continue to characterize modern image classifiers as “surpassing human abilities and effectively proving that bigger data leads to better decisions,” as explained by Dave Gershgorn,13 despite demonstrations that these networks rely on spurious correlations (for example, misclassifying “Asians dressed in red” as ping-pong balls, reported by Stock and Cisse43).
Deep-learning papers are not the sole offenders; misuse of language plagues many subfields of ML. Lipton, Chouldechova, and McAuley23 discuss how the recent literature on fairness in ML often overloads terminology borrowed from complex legal doctrine, such as disparate impact, to name simple equations expressing particular notions of statistical parity. This has resulted in a literature where “fairness,” “opportunity,” and “discrimination” denote simple statistics of predictive models, confusing researchers who become oblivious to the difference and policymakers who become misinformed about the ease of incorporating ethical desiderata into ML.
Overloading technical terminology. A second avenue of language misuse consists of taking a term that holds precise technical meaning and using it in an imprecise or contradictory way. Consider the case of deconvolution, which formally describes the process of reversing a convolution, but is now used in the deep-learning literature to refer to transpose convolutions (also called upconvolutions) as commonly found in auto-encoders and generative adversarial networks. This term first took root in deep learning in a paper that does address deconvolution but was later overgeneralized to refer to any neural architecture using upconvolutions. Such overloading of terminology can create lasting confusion. New ML papers referring to deconvolution might be invoking its original meaning, describing upconvolution, or attempting to resolve the confusion, as in a paper by Hazirbas, Leal-Taixé, and Cremers,15 which awkwardly refers to “upconvolution (deconvolution).”
As another example, generative models are traditionally models of either the input distribution p(x) or the joint distribution p(x,y). In contrast, discriminative models address the conditional distribution p(y|x) of the label given the inputs. In recent works, however, generative model imprecisely refers to any model that produces realistic-looking structured data. On the surface, this may seem consistent with the p(x) definition, but it obscures several shortcomings, for example, the inability of GANs (generative adversarial networks) or VAEs (variational autoencoders) to perform conditional inference (for example, sampling from p(x2|x1) where x1 and x2 are two distinct input features). Bending the term further, some discriminative models are now referred to as generative models on account of producing structured outputs, a mistake that I (Lipton), too, have made. Seeking to resolve the confusion and provide historical context, Mohamed and Lakshminarayanan30 distinguish between prescribed and implicit generative models.
Revisiting batch normalization, Ioffe and Szegedy18 described covariate shift as a change in the distribution of model inputs. In fact, covariate shift refers to a specific type of shift, where although the input distribution p(x) might change, the labeling function p(y|x) does not. Moreover, as a result of the influence of Ioffe and Szegedy, Google Scholar lists batch normalization as the first reference on searches for “covariate shift.”
Among the consequences of misusing language is the possibility (as with generative models) of concealing lack of progress by redefining an unsolved task to refer to something easier. This often combines with suggestive definitions via anthropomorphic naming. Language understanding and reading comprehension, once grand challenges of AI, now refer to making accurate predictions on specific datasets.
Suitcase words. Finally, ML papers tend to overuse suitcase words. Coined by Marvin Minsky in the 2007 book The Emotion Machine,29 suitcase words pack together a variety of meanings. Minsky described mental processes such as consciousness, thinking, attention, emotion, and feeling that may not share “a single cause or origin.” Many terms in ML fall into this category. For example, I (Lipton) noted in a 2016 paper that interpretability holds no universally agreed-upon meaning and often references disjoint methods and desiderata.22 As
a consequence, even papers that appear to be in dialogue with each other may have different concepts in mind.
As another example, generalization has both a specific technical meaning (generalizing from training to testing) and a more colloquial meaning that is closer to the notion of transfer (generalizing from one population to another) or of external validity (generalizing from an experimental setting to the real world). Conflating these notions leads to overestimating the capabilities of current systems.
Suggestive definitions and overloaded terminology can contribute to the creation of new suitcase words. In the fairness literature, where legal, philosophical, and statistical language are often overloaded, terms such as bias become suitcase words that must be subsequently unpacked.
In common speech and as aspirational terms, suitcase words can serve a useful purpose. Sometimes a suitcase word might reflect an overarching aspiration that unites the various meanings. For example, artificial intelligence might be well suited as an aspirational name to organize an academic department. On the other hand, using suitcase words in technical arguments can lead to confusion. For example, in his 2017 book, Superintelligence,3 Nick Bostrom wrote an equation (Box 4) involving the terms intelligence and optimization power, implicitly assuming these suitcase words can be quantified with a one-dimensional scalar.
Speculation on Causes Behind the Trends
Do the patterns mentioned here represent a trend, and if so, what are the underlying causes? We speculate that these patterns are on the rise and suspect several possible causal factors: complacency in the face of progress, the rapid expansion of the community, the consequent thinness of the reviewer pool, and misaligned incentives of scholarship vs. short-term measures of success.
Complacency in the face of progress. The apparent rapid progress in ML has at times engendered an attitude that strong results excuse weak arguments. Authors with strong results may feel licensed to insert arbitrary unsupported stories (see “Explanation vs. Speculation”) regarding the factors driving the results; to omit experiments aimed at disentangling those factors (see “Failure to Identify the Sources of Empirical Gains”); to adopt exaggerated terminology (see “Misuse of Language”); or to take less care to avoid mathiness (see “Mathiness”).
At the same time, the single-round nature of the reviewing process may cause reviewers to feel they have no choice but to accept papers with strong quantitative findings. Indeed, even if the paper is rejected, there is no guarantee the flaws will be fixed or even noticed in the next cycle, so reviewers may conclude that accepting a flawed paper is the best option.
Growing pains. Since around 2012, the ML community has expanded rapidly because of increased popularity stemming from the success of deep-learning methods. While the rapid expansion of the community can be seen as a positive development, it can also have side effects.
To protect junior authors, we have preferentially referenced our own papers and those of established researchers. And certainly, experienced researchers exhibit these patterns. Newer researchers, however, may be even more susceptible. For example, authors unaware of previous terminology are more likely to misuse or redefine language (as discussed earlier).
Rapid growth can also thin the reviewer pool in two ways: by increasing the ratio of submitted papers to reviewers and by decreasing the fraction of experienced reviewers. Less-experienced reviewers may be more likely to demand architectural novelty, be fooled by spurious theorems, and let pass serious but subtle issues such as misuse of language, thus either incentivizing or enabling several of the trends described here. At the same time, experienced but overburdened reviewers may revert to a “checklist” mentality, rewarding more formulaic papers at the expense of more creative or intellectually ambitious work that might not fit a preconceived template. Moreover, overworked reviewers may not have enough time to fix, or even to notice, all of the issues in a submitted paper.
Misaligned incentives. Reviewers are not alone in providing poor incentives for authors. As ML research garners increased media attention and ML startups become commonplace, to some degree incentives are provided by the press (“What will they write about?”) and by investors (“What will they invest in?”). The media provides incentives for some of these trends.
Anthropomorphic descriptions of ML algorithms provide fodder for popular coverage. Take, for example, a 2014 article by Cade Metz in Wired28 that characterized an autoencoder as a “simulated brain.” Hints of human-level performance tend to be sensationalized in newspaper coverage; for example, an article in the New York Times by John Markoff described a deep-learning image-captioning system as “mimicking human levels of understanding.”25
Investors, too, have shown a strong appetite for AI research, funding startups sometimes on the basis of a single paper. In my (Lipton) experience working with investors, they are sometimes attracted to startups whose research has received media coverage, a dynamic that attaches financial incentives to media attention. Note that recent interest in chatbot startups co-occurred with anthropomorphic descriptions of dialogue systems and reinforcement learners both in papers and in the media, although it may be difficult to determine whether the lapses in scholarship caused the interest of investors or vice versa.
Suggestions. If we are to intervene to counter these trends, how should we go about it? Besides merely suggesting that each author abstain from these patterns, what can we do as a community to raise the level of experimental practice, exposition, and theory? And how can we more readily distill the knowledge of the community and disabuse researchers and the wider public of misconceptions? What follows are a number of preliminary suggestions based on personal experiences and impressions.
For Authors, Publishers, and Reviewers
We encourage authors to ask “What worked?” and “Why?” rather than just “How well?” Except in extraordinary
cases, raw headline numbers provide limited value for scientific progress absent insight into what drives them. Insight does not necessarily mean theory. Three practices that are common in the strongest empirical papers are error analysis, ablation studies, and robustness checks (for example, choice of hyperparameters, as well as ideally the choice of dataset). Everyone can adopt these practices, and we advocate their widespread use. For some exemplar papers, consider the preceding discussion in “Failure to Identify the Sources of Empirical Gains.” Langley and Kibler21 also provide a more detailed survey of empirical best practices.
Sound empirical inquiry need not be confined to tracing the sources of a particular algorithm’s empirical gains; it can yield new insights even when no new algorithm is proposed. Notable examples of this include a demonstration that neural networks trained by stochastic gradient descent can fit randomly assigned labels.46 This paper questions the ability of learning-theoretic notions of model complexity to explain why neural networks can generalize to unseen data. In another example, Goodfellow, Vinyals, and Saxe14 explored the loss surfaces of deep networks, revealing that straight-line paths in parameter space between initialized and learned parameters typically have monotonically decreasing loss.
When researchers are writing their papers, we recommend they ask the following question: Would I rely on this explanation for making predictions or for getting a system to work? This can be a good test of whether a theorem is being included to please reviewers or to convey actual insight. It also helps check whether concepts and explanations match the researcher’s own internal mental model. On mathematical writing, we point the reader to Knuth, Larrabee, and Roberts’s excellent guidebook, Mathematical Writing.20
Finally, being clear about which problems are open and which are solved not only presents a clearer picture to readers, but also encourages follow-up work and guards against researchers neglecting questions presumed (falsely) to be resolved.
Reviewers can set better incentives by asking: “Might I have accepted this paper if the authors had done a worse job?” For example, a paper describing a simple idea that leads to improved performance, together with two negative results, should be judged more favorably than a paper that combines three ideas together (without ablation studies) yielding the same improvement.
Current literature moves fast at the expense of accepting flawed works for conference publication. One remedy could be to emphasize authoritative retrospective surveys that strip out exaggerated claims and extraneous material, change anthropomorphic names to sober alternatives, standardize notation, and so on. While venues such as Foundations and Trends in Machine Learning, a journal from Now Publishers in Hanover, MA, already provide a track for such work, there are still not enough strong papers in this genre.
Additionally, we believe (noting our conflict of interest) that critical writing ought to have a voice at ML conferences. Typical ML conference papers choose an established problem (or propose a new one), demonstrate an algorithm and/or analysis, and report experimental results. While many questions can be approached in this way, when addressing the validity of the problems or the methods of inquiry themselves, neither algorithms nor experiments are sufficient (or appropriate). We would not be alone in embracing greater critical discourse: in natural language processing (NLP), this year’s Conference on Computational Linguistics (COLING) included a call for position papers “to challenge conventional thinking.”
There are many lines of further discussion worth pursuing regarding peer review. Are the problems described here mitigated or exacerbated by open review? How do reviewer point systems align with the values that we advocate? These topics warrant their own papers and have indeed been discussed at length elsewhere.
Discussion. Folk wisdom might suggest not to intervene just as the field is heating up: you can’t argue with success! We counter these objections with the following arguments: First, many aspects of the current culture are consequences of ML’s recent success, not its causes. In fact, many of
the papers leading to the current success of deep learning were careful empirical investigations characterizing principles for training deep networks. This includes the advantage of random over sequential hyperparameter search, the behavior of different activation functions, and an understanding of unsupervised pretraining.
Second, flawed scholarship already negatively impacts the research community and broader public discourse. The “Troubling Trends” section of this article gives examples of unsupported claims being cited thousands of times, lineages of purported improvements being overturned by simple baselines, datasets that appear to test high-level semantic reasoning but actually test low-level syntactic fluency, and terminology confusion that muddles the academic dialogue. This final issue also affects public discourse. For example, the European Parliament passed a report considering regulations to apply if “robots become or are made self-aware.”10 While ML researchers are not responsible for all misrepresentations of our work, it seems likely that anthropomorphic language in authoritative peer-reviewed papers is at least partly to blame.
Greater rigor in exposition, science, and theory is essential for both scientific progress and fostering productive discourse with the broader public. Moreover, as practitioners apply ML in critical domains such as health, law, and autonomous driving, a calibrated awareness of the abilities and limits of ML systems will help us to deploy ML responsibly.

Countervailing Considerations
There are a number of countervailing considerations to the suggestions set forth in this article. Several readers of earlier drafts of this paper noted that stochastic gradient descent tends to converge faster than gradient descent; in other words, perhaps a faster, noisier process that ignores our guidelines for producing “cleaner” papers results in a faster pace of research. For example, the breakthrough paper on ImageNet classification proposes multiple techniques without ablation studies, several of which were subsequently determined to be unnecessary. At the time, however, the results were so significant and the experiments so computationally expensive to run that waiting for ablations to complete might not have been worth the cost to the community.
A related concern is that high standards might impede the publication of original ideas, which are more likely to be unusual and speculative. In other fields, such as economics, high standards result in a publishing process that can take years for a single paper, with lengthy revision cycles consuming resources that could be deployed toward new work.
Finally, perhaps there is value in specialization: The researchers generating new conceptual ideas or building new systems need not be the same ones who carefully collate and distill knowledge.
These are valid considerations, and the standards we are putting forth here are at times exacting. In many cases, however, they are straightforward to implement, requiring only a few extra days of experiments and more careful writing. Moreover, they are being presented as strong heuristics rather than unbreakable rules: if an idea cannot be shared without violating these heuristics, the idea should be shared and the heuristics set aside.
We have almost always found attempts to adhere to these standards to be well worth the effort. In short, the research community has not achieved a Pareto optimal state on the growth-quality frontier.

Historical Antecedents
The issues discussed here are unique neither to machine learning nor to this moment in time; they instead reflect issues that recur cyclically throughout academia. As far back as 1964, the physicist John R. Platt34 discussed related concerns in his paper on strong inference, where he identified adherence to specific empirical standards as responsible for the rapid progress of molecular biology and high-energy physics relative to other areas of science.
There have been similar discussions in AI. As noted in the introduction to this article, McDermott26 criticized a (mostly pre-ML) AI community in 1976 on a number of issues, including suggestive definitions and a failure to separate out speculation from technical claims. In 1988, Cohen and
Howe8 addressed an AI community that at that point “rarely publish[ed] performance evaluations” of their proposed algorithms and instead only described the systems. They suggested establishing sensible metrics for quantifying progress, and analyzing the following: “Why does it work?” “Under what circumstances won’t it work?” and “Have the design decisions been justified?” These questions continue to resonate today.
Finally, in 2009 Armstrong et al.1 discussed the empirical rigor of information-retrieval research, noting a tendency of papers to compare against the same weak baselines, producing a long series of improvements that did not accumulate to meaningful gains.
In other fields, an unchecked decline in scholarship has led to crisis. A landmark study in 2015 suggested a significant portion of findings in the psychology literature may not be reproducible.33 In a few historical cases, enthusiasm paired with undisciplined scholarship led entire communities down blind alleys. For example, following the discovery of X-rays, a related discipline on N-rays emerged before it was eventually debunked.32

Concluding Remarks
The reader might rightly suggest these problems are self-correcting. We agree. However, the community self-corrects precisely through recurring debate about what constitutes reasonable standards for scholarship. We hope that this paper contributes constructively to the discussion.

Acknowledgments
We thank Asya Bergal, Kyunghyun Cho, Moustapha Cisse, Daniel Dewey, Danny Hernandez, Charles Elkan, Ian Goodfellow, Moritz Hardt, Tatsunori Hashimoto, Sergey Ioffe, Sham Kakade, David Kale, Holden Karnofsky, Pang Wei Koh, Lisha Li, Percy Liang, Julian McAuley, Robert Nishihara, Noah Smith, Balakrishnan “Murali” Narayanaswamy, Ali Rahimi, Christopher Ré, and Byron Wallace. We also thank the ICML Debates organizers.

References
1. Armstrong, T.G., Moffat, A., Webber, W. and Zobel, J. Improvements that don’t add up: ad-hoc retrieval results since 1998. In Proceedings of the 18th ACM Conf. Information and Knowledge Management, 2009, 601–610.
2. Bengio, Y. Practical recommendations for gradient-based training of deep architectures. Neural Networks: Tricks of the Trade. G. Montavon, G.B. Orr, KR Müller, eds. LNCS 7700 (2012). Springer, Berlin, Heidelberg, 437–78.
3. Bostrom, N. Superintelligence. Dunod, Paris, France, 2017.
4. Bottou, L. et al. Counterfactual reasoning and learning systems: The example of computational advertising. J. Machine Learning Research 14, 1 (2013), 3207–3260.
5. Bray, A.J. and Dean, D.S. Statistics of critical points of Gaussian fields on large-dimensional spaces. Physical Review Letters 98, 15 (2007), 150201; https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.98.150201.
6. Chen, D., Bolton, J. and Manning, C.D. A thorough examination of the CNN/Daily Mail reading comprehension task. In Proceedings of the 54th Annual Meeting of Assoc. Computational Linguistics, 2016, 2358–2367.
7. Choromanska, A., Henaff, M., Mathieu, M., Arous, G.B., LeCun, Y. The loss surfaces of multilayer networks. In Proceedings of the 18th Intern. Conf. Artificial Intelligence and Statistics, 2015.
8. Cohen, P.R., Howe, A.E. How evaluation guides AI research: the message still counts more than the medium. AI Magazine 9, 4 (1988), 35.
9. Cotterell, R., Mielke, S.J., Eisner, J. and Roark, B. Are all languages equally hard to language-model? In Proceedings of Conf. North American Chapt. Assoc. Computational Linguistics: Human Language Technologies, Vol. 2, 2018.
10. Council of the European Union. Motion for a European Parliament Resolution with Recommendations to the Commission on Civil Law Rules on Robotics, 2016; https://bit.ly/285CBjM.
11. Dauphin, Y.N. et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. Advances in Neural Information Processing Systems, 2014, 2933–2941.
12. Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 7639 (2017), 115–118.
13. Gershgorn, D. The data that transformed AI research—and possibly the world. Quartz, 2017; https://bit.ly/2uwyb8R.
14. Goodfellow, I.J., Vinyals, O. and Saxe, A.M. Qualitatively characterizing neural network optimization problems. In Proceedings of the Intern. Conf. Learning Representations, 2015.
15. Hazirbas, C., Leal-Taixé, L. and Cremers, D. Deep depth from focus. arXiv Preprint, 2017; arXiv:1704.01085.
16. He, K., Zhang, X., Ren, S. and Sun, J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE Intern. Conf. Computer Vision, 2015, 1026–1034.
17. Henderson, P. et al. Deep reinforcement learning that matters. In Proceedings of the 32nd Assoc. Advancement of Artificial Intelligence Conf., 2018.
18. Ioffe, S. and Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd Intern. Conf. Machine Learning 37, 2015; http://proceedings.mlr.press/v37/ioffe15.pdf.
19. Kingma, D.P. and Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd Intern. Conf. Learning Representations, 2015.
20. Knuth, D.E., Larrabee, T. and Roberts, P.M. Mathematical writing, 1987; https://bit.ly/2TmxyNq.
21. Langley, P. and Kibler, D. The experimental study of machine learning, 1991; http://www.isle.org/~langley/papers/mlexp.ps.
22. Lipton, Z.C. The mythos of model interpretability. Intern. Conf. Machine Learning Workshop on Human Interpretability, 2016.
23. Lipton, Z.C., Chouldechova, A. and McAuley, J. Does mitigating ML’s impact disparity require treatment disparity? Advances in Neural Inform. Process. Syst. 2017, 8136–8146. arXiv Preprint arXiv:1711.07076.
24. Lucic, M., Kurach, K., Michalski, M., Gelly, S., Bousquet, O. Are GANs created equal? A large-scale study. In Proceedings of the 32nd Conf. Neural Information Processing Syst. arXiv Preprint 2017; arXiv:1711.10337.
25. Markoff, J. Researchers announce advance in image-recognition software. NYT (Nov. 17, 2014); https://nyti.ms/2HfcmSe.
26. McDermott, D. Artificial intelligence meets natural stupidity. ACM SIGART Bulletin 57 (1976), 4–9.
27. Melis, G., Dyer, C. and Blunsom, P. On the state of the art of evaluation in neural language models. In Proceedings of the Intern. Conf. Learning Representations, 2018.
28. Metz, C. You don’t have to be Google to build an artificial brain. Wired (Sept. 26, 2014); https://www.wired.com/2014/09/google-artificial-brain/.
29. Minsky, M. The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind. Simon & Schuster, New York, NY, 2006.
30. Mohamed, S., Lakshminarayanan, B. Learning in implicit generative models. arXiv Preprint, 2016; arXiv:1610.03483.
31. Noh, H., Hong, S. and Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the Intern. Conf. Computer Vision, 2015, 1520–1528.
32. Nye, M.J. N-rays: An episode in the history and psychology of science. Historical Studies in the Physical Sciences 11, 1 (1980), 125–56.
33. Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349, 6251 (2015), aac4716.
34. Platt, J.R. Strong inference. Science 146, 3642 (1964), 347–353.
35. Reddi, S.J., Kale, S. and Kumar, S. On the convergence of Adam and beyond. In Proceedings of the Intern. Conf. Learning Representations, 2018.
36. Romer, P.M. Mathiness in the theory of economic growth. Amer. Econ. Rev. 105, 5 (2015), 89–93.
37. Santurkar, S., Tsipras, D., Ilyas, A. and Madry, A. How does batch normalization help optimization? (No, it is not about internal covariate shift). In Proceedings of the 32nd Conf. Neural Information Processing Systems; 2018; https://papers.nips.cc/paper/7515-how-does-batch-normalization-help-optimization.pdf.
38. Sculley, D., Snoek, J., Wiltschko, A. and Rahimi, A. Winner’s curse? On pace, progress, and empirical rigor. In Proceedings of the 6th Intern. Conf. Learning Representations, Workshop Track, 2018.
39. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Machine Learning Research 15, 1 (2014), 1929–1958; https://dl.acm.org/citation.cfm?id=2670313.
40. Steinhardt, J. and Liang, P. Learning fast-mixing models for structured prediction. In Proceedings of the 32nd Intern. Conf. Machine Learning 37 (2015), 1063–1072; http://proceedings.mlr.press/v37/steinhardtb15.html.
41. Steinhardt, J. and Liang, P. Reified context models. In Proceedings of the 32nd Intern. Conf. Machine Learning 37 (2015), 1043–1052; https://dl.acm.org/citation.cfm?id=3045230.
42. Steinhardt, J., Koh, P.W. and Liang, P.S. Certified defenses for data poisoning attacks. In Proceedings of the 31st Conf. Neural Information Processing Systems, 2017; https://papers.nips.cc/paper/6943-certified-defenses-for-data-poisoning-attacks.pdf.
43. Stock, P. and Cisse, M. ConvNets and ImageNet beyond accuracy: Explanations, bias detection, adversarial examples and model criticism. arXiv Preprint, 2017, arXiv:1711.11443.
44. Szegedy, C. et al. Intriguing properties of neural networks. Intern. Conf. Learning Representations. arXiv Preprint, 2013, arXiv:1312.6199.
45. Zellers, R., Yatskar, M., Thomson, S. and Choi, Y. Neural motifs: Scene graph parsing with global context. In Proceedings of the IEEE Conf. Computer Vision and Pattern Recognition, 2018, 5831–5840.
46. Zhang, C., Bengio, S., Hardt, M., Recht, B. and Vinyals, O. Understanding deep learning requires rethinking generalization. In Proceedings of the Intern. Conf. Learning Representations, 2017.

Zachary C. Lipton is an assistant professor at Carnegie Mellon University in the Tepper School of Business with appointments in the Machine Learning Department and the Heinz School of Public Policy. He also collaborates with Amazon, where he helped to grow AWS’ Amazon AI team and contributed to the Apache MXNet deep learning framework. Find him at zacklipton.com, Twitter @zacharylipton, or GitHub @zackchase.

Jacob Steinhardt will be joining UC Berkeley as an assistant professor of statistics. He is a technical advisor for the Open Philanthropy Project and has collaborated with policy researchers to understand and avoid potential misuses of machine learning.

Copyright held by owners/authors. Publication rights licensed to ACM.
contributed articles
Programmable Solid-State Storage in Future Cloud Datacenters

Programmable software-defined solid-state drives can move computing functions closer to storage.

BY JAEYOUNG DO, SUDIPTA SENGUPTA, AND STEVEN SWANSON

DOI:10.1145/3286588

key insights
- A fully programmable storage substrate in cloud datacenters opens up new opportunities to innovate the storage infrastructure at cloud speed.
- In-storage programming is becoming increasingly easier with powerful processing capabilities and highly flexible development environments.
- New value propositions with the programmable storage substrate can be realized, such as customizing the storage interface, moving compute close to data, and performing secure computations.

There is a major disconnect today in cloud datacenters concerning the speed of innovation between application/operating system (OS) and storage infrastructures. Application/OS software is patched with new/improved functionality every few weeks at “cloud speed,” while storage devices are off-limits for such sustained innovation during their hardware life cycle of three to five years in datacenters. Since the software inside the storage device is written by storage vendors as proprietary firmware not open for general application developers to modify, the developers are stuck with a device whose functionality and capabilities are frozen in time, even as many of them are modifiable in software. A period of five years is almost eternal in the cloud computing industry where new features, platforms, and application program interfaces (APIs) are evolving every couple of months and application-demanded requirements from the storage system grow quickly over time. This notable lag in the adaptability and velocity of movement of the storage infrastructure may ultimately affect the ability to innovate throughout the cloud world.
In this article, we advocate creating a software-defined storage substrate of solid-state drives (SSDs) that are as programmable, agile, and flexible as the applications/OS accessing from servers in cloud datacenters. A fully programmable storage substrate promises opportunities to better bridge the gap between application/OS needs and storage capabilities/limitations, while allowing application developers to innovate in-house at cloud speed.
The move toward software-defined control for IO devices and co-processors has played out before in the datacenter. Both GPUs and network interface cards (NICs) started as black-box devices that provide acceleration for CPU-intensive operations (such as graphics and packet processing). Internally, they implemented acceleration features with a combination of specialized hardware and proprietary firmware. As customers demanded greater flexibility, vendors slowly exposed programmability to the rest of the system, unleashing the vast processing power available from GPUs and a new level of agility in how systems can manage networks for enhanced functionality like more granular traffic management, security, and
easily accessible interfaces will let storage systems in the cloud datacenters adapt to rapidly changing requirements on the fly.

Storage Trends
The amount of data being generated daily is growing exponentially, placing more and more processing demand on datacenters. According to a 2017 marketing-trend report from IBM,[a] 90% of the data in the world in 2016 had been created in the preceding 12 months, that is, during 2015.

growing interest in the aggressive use of SSDs that, compared with traditional spinning hard disk drives (HDDs), provide orders-of-magnitude lower latency and higher throughput. In addition to these performance benefits, the advent of new technologies (such as 3D NAND enabling much denser chips and quad-level-cell, or QLC, for bulk storage) allows SSDs to continue to significantly scale in capacity and to yield a huge reduction in price.
There are two key components in SSDs,4 as shown in Figure 1: an SSD controller and flash storage media.

updates are allowed, and memory can endure only a limited number of writes before it can no longer be read. Therefore, the controller must be able to perform some background management tasks (such as garbage collection) to reclaim flash blocks containing invalid data to create available space and wear leveling to evenly distribute writes across all flash blocks with the purpose of extending the SSD life. These tasks are, in general, implemented by proprietary firmware running on one or more embedded processor cores in

[a] https://ibm.co/2XNvHPk
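The wear-leveling task mentioned in this section can be illustrated with a toy model. The sketch below is only an illustration under simplifying assumptions: the names `FlashBlock`, `WearLeveler`, and `simulate` are hypothetical, and the policy (always erase and rewrite the least-worn block) is one simple choice, not the scheme used by any particular SSD firmware.

```python
from dataclasses import dataclass, field


@dataclass
class FlashBlock:
    """Hypothetical model of one flash erase block."""
    block_id: int
    erase_count: int = 0


@dataclass
class WearLeveler:
    """Toy allocator that spreads erases evenly across blocks."""
    blocks: list = field(default_factory=list)

    def pick_block(self) -> FlashBlock:
        # Choose the block erased the fewest times; ties go to the lowest ID.
        return min(self.blocks, key=lambda b: (b.erase_count, b.block_id))

    def erase_and_write(self) -> int:
        # Each rewrite costs one erase cycle on the chosen block.
        block = self.pick_block()
        block.erase_count += 1
        return block.block_id


def simulate(num_blocks: int = 4, num_writes: int = 10) -> list:
    """Run num_writes block rewrites and return per-block erase counts."""
    leveler = WearLeveler([FlashBlock(i) for i in range(num_blocks)])
    for _ in range(num_writes):
        leveler.erase_and_write()
    return [b.erase_count for b in leveler.blocks]


erase_counts = simulate()  # 10 rewrites over 4 blocks
```

With this policy the erase counts of any two blocks never differ by more than one, which is the property wear leveling aims for: no single block reaches its write limit long before the others.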
In-Storage Programming
Modern SSDs combine processing (an embedded processor) and storage components (SRAM, DRAM, and flash memory) to carry out routine functions required for managing the SSD. These computing resources present interesting opportunities to run general user-defined programs. In 2013, Do et al.6,17 explored such opportunities for the first time in the context of running selected database operations inside a Samsung SAS flash SSD. They wrote [...] operators that were compiled into the SSD firmware and extended the execution framework of Microsoft SQL Server 2012 to develop a working prototype in which simple selection and aggregation queries could be run end-to-end. That work demonstrated several times improvement in performance

Figure 2. Example conventional storage server architecture with multiple NVMe SSDs. Components shown: CPU, DRAM, root complex, PCIe switches, and flash SSDs (panels a to d). Throughput gap of 8x: 64 SSDs × ~2 GB/s = ~128 GB/s versus 16 lanes of PCIe = ~16 GB/s.
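The throughput gap annotated in Figure 2 follows from simple arithmetic. The sketch below reproduces it; the function name `throughput_gap` is hypothetical, and the per-lane figure of about 1 GB/s is not stated directly in the figure but is implied by its "16 lanes of PCIe = ~16 GB/s" annotation.

```python
def throughput_gap(num_ssds, per_ssd_gb_s, pcie_lanes, per_lane_gb_s):
    """Compare aggregate SSD bandwidth with the host PCIe link bandwidth."""
    aggregate_ssd = num_ssds * per_ssd_gb_s   # what the drives could deliver
    host_link = pcie_lanes * per_lane_gb_s    # what the host link can carry
    return aggregate_ssd, host_link, aggregate_ssd / host_link


# Figures taken from Figure 2: 64 SSDs at ~2 GB/s each, 16 PCIe lanes.
# The ~1 GB/s per lane is an assumption derived from the ~16 GB/s total.
aggregate, host_link, gap = throughput_gap(64, 2.0, 16, 1.0)
print(f"SSDs: ~{aggregate:.0f} GB/s, PCIe: ~{host_link:.0f} GB/s, gap: {gap:.0f}x")
```

Under these numbers the drives can collectively supply roughly eight times more data than the host link can move, which is the motivation for doing some of the processing inside the SSD.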
ular debugging tools (such as Microsoft Visual Studio) available to general application developers. Worse, the device-side processing code (selection and aggregation) had to be compiled into the SSD firmware in the prototype, meaning application developers would need to worry about not only the target application itself but also complex internal structures and algorithms in the SSD firmware.
On top of this, the consequences of an error can be quite severe, which could result in corrupted data or an unusable drive. Workaday application programmers are unlikely to accept the additional complexity, and cloud providers are unlikely to let untrusted code run in such a fragile environment.
Application developers need a flexible and general programming model that allows easily running user code written in a high-level programming language (such as C/C++) inside an SSD. The programming model must also support the concurrent execution of multiple in-SSD applications while ensuring that malicious applications do not adversely affect the overall SSD operation or violate protection guarantees provided by the operating and file system.
In 2014, Seshadri et al.20 proposed Willow, an SSD that made programmability a central feature of the SSD interface, allowing ordinary developers to safely augment and extend the SSD semantics with application-specific functions without compromising file system protections. In their model, host and in-SSD applications communicate via PCIe using a simple, generic (not storage-centric) remote procedure call

Figure 3. Disruptive trends in the flash storage industry toward abundant resources and increased ease of programmability inside the SSD. Labels: Abundant resources (CPU #cores/clock speed, hardware offload, DRAM, # flash channels and capacity); Programmable.

[d] Several vendors manufacture each type of component in flash SSDs. For example: flash controller manufactured by Marvell, PMC (acquired by Microsemi), Sandforce (acquired by Seagate), Indilinx (acquired by OCZ), and flash memory manufactured by Samsung, Toshiba, and Micron.
[e] Many large-scale datacenter operators (such as Google19 and Baidu16) build their own SSDs that are fully optimized for their own application requirements.
[f] The Linux Open-Channel SSD subsystem was introduced in the Linux kernel version 4.4.
[g] https://bit.ly/2GCuIum

available inside the SSD are increasingly powerful, with abundant compute and bandwidth resources. Emerging SSDs include software-programmable controllers with multi-core processors, built-in hardware accelerators to offload compute-intensive tasks from the processors, multiple GBs of DRAM, and tens of independent channels to the underlying storage media, allowing several GB/s of internal data throughput. Even more interesting and useful, programming SSDs is becoming easier, with the trend away from proprietary architectures and software runtimes and toward commodity operating systems (such as Linux) running on top of general-purpose processors (such as ARM and RISC-V). This trend enables general application developers to fully leverage existing tools, libraries, and expertise, allowing them to focus on their own core competencies rather than spending many hours getting used to the low-level, embedded development process. This also allows application developers to easily port large applications already running on host operating systems to the device with minimal code changes.
All in all, the programmability evolution in SSDs presents a unique opportunity to embrace the SSDs as a first-class programmable platform in the cloud datacenters, enabling software-hardware innovation inside the SSD. Moreover, going beyond the packaged SSD, because the two major components inside the SSD are each manufactured by multiple vendors,[d] it is conceivable that SSDs could be custom designed and provided in partnership with component vendors[e] (just like how today’s datacenter servers are built and deployed), and even contribute back some of the designs to the community (via forums like the Open Compute project, https://www.opencompute.org). For example, the industry is already moving in this direction with introduction of the Open-Channel SSD technology2,8,[f] that moves much of the SSD firmware functionalities out of the black box and into the operating system or userspace, giving applications better control over the device. In an open source project called Denali[g] in 2018, Microsoft proposed a scheme
that splits the monolithic components of an SSD into two different modules: one standardized part dealing with storage media and a software interface to handle application-specific tasks (such as garbage collection and wear leveling). In this way, SSD suppliers can build simpler products for datacenters and deliver them to market more quickly while per-application tuning is possible by datacenter operators.
The component-based ecosystem also opens up entirely new opportunities for integrating powerful heterogeneous programming elements (such as field-programmable gate array (FPGA)13,21 and GPU,h with storage media) and flash and other emerging new non-volatile memories (such as 3D XPoint, ReRAM, STT-RAM, and PCM) that provide persistent storage at DRAM latencies to deliver high-performance gains. This approach would present the greatest flexibility to take advantage of advances in the underlying storage device to optimize performance for multiple cloud applications. In the near future, the software-hardware innovation inside the SSD can proceed much like the PC, networking hardware, and GPU ecosystems have in the past. This is an opportunity to rethink datacenter architecture with efficient use of heterogeneous, energy-efficient hardware, which is the way forward for higher performance at lower power.
h https://bit.ly/2L8LfM4
Value Propositions
Here, we summarize three value propositions that demonstrate future directions in programmable storage (see Figure 4):
Agile, flexible storage interface (see Figure 4a). Full programmability will allow the storage interface and feature set to evolve at cloud speed, without having to persuade standardization bodies to
bless them or persuade device manufacturers to implement them in the next-generation hardware roadmap, both usually involving years of delay. A richer, customizable storage interface will allow application developers to stay focused on their application, without having to work around storage constraints, quirks, or peculiarities, thus improving developer productivity.
[Figure 4. Programmable SSD value proposition. Each panel shows a host server attached to a programmable SSD (DRAM, processor + HW offload). (a) Agile, flexible storage interface leveraging programmability within SSD. (b) Moving compute inside SSD to leverage low latency, high bandwidth, and access to data.]
As an example of the need for such an interface, consider how stream writes are handled in the SSD today. Because the SSD cannot differentiate between incoming data from multiple streams, it could pack data from different streams onto the same flash erase block, the smallest unit that can be erased from flash at once. When a portion of the stream data is deleted, it leaves blocks with holes of invalid data. To reclaim these blocks, the garbage-collection activity inside the SSD must copy around the valid data, slowing the device and increasing write amplification, thus reducing device lifetime.
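The garbage-collection cost described above can be made concrete with a small simulation. The following Python sketch is illustrative only: the four-page erase block, the two interleaved streams, and all function names are assumptions, not part of any SSD firmware or of the NVMe specification. It counts how many valid pages must be copied to reclaim space after one stream is deleted, first when pages are packed in arrival order and then, for contrast, when each stream is kept in its own blocks.

```python
from collections import defaultdict

PAGES_PER_BLOCK = 4  # assumed tiny erase block, for illustration only


def pack_mixed(writes):
    """Fill erase blocks in arrival order, ignoring which stream a page belongs to."""
    return [writes[i:i + PAGES_PER_BLOCK]
            for i in range(0, len(writes), PAGES_PER_BLOCK)]


def pack_by_stream(writes):
    """Keep each stream in its own blocks, using the stream ID attached to each write."""
    per_stream = defaultdict(list)
    for stream_id, page in writes:
        per_stream[stream_id].append((stream_id, page))
    blocks = []
    for pages in per_stream.values():
        blocks.extend(pack_mixed(pages))
    return blocks


def gc_copies(blocks, deleted_stream):
    """Count valid pages that must be copied elsewhere to reclaim every
    block holding at least one page of the deleted stream."""
    copies = 0
    for block in blocks:
        invalid = sum(1 for stream_id, _ in block if stream_id == deleted_stream)
        if invalid:
            copies += len(block) - invalid
    return copies


# Two hypothetical streams, A and B, whose page writes arrive interleaved.
writes = [("A" if i % 2 == 0 else "B", i) for i in range(16)]

mixed_copies = gc_copies(pack_mixed(writes), deleted_stream="A")
aware_copies = gc_copies(pack_by_stream(writes), deleted_stream="A")
print("pages copied, arrival-order packing:", mixed_copies)
print("pages copied, per-stream packing:  ", aware_copies)
```

In this toy setup every mixed block holds pages from both streams, so deleting one stream forces the surviving pages of the other to be copied; with per-stream packing the deleted stream's blocks contain no valid data and can be erased directly.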
If application developers had control over the software inside the SSD, they could handle streams much more efficiently. For instance, incoming writes could be tagged with stream IDs and the device could use this information to fill a block with data from the same stream. When data from that stream is deleted, the entire data block could be reclaimed without copying around data. Such stream awareness has been shown to double device lifetime, significantly increasing read performance.14 At Microsoft, the need to support multiple streams in the SSD was identified in 2014, but NVMe incorporated the feature only in late 2017.i Moreover, large-scale deployment in Microsoft datacenters might take at least another year and be very expensive, since new SSDs must be purchased to essentially get a new version of the firmware. Waiting five years for a change to a system software component is completely out of step with how quickly computer systems are evolving today. A programmable storage platform would reduce this delay to months and allow rapid iteration and refinement of the feature, not to mention the ability to “tweak” the implementation to match specific use cases.
Moving compute close to data (see Figure 4b). The need to analyze and glean intelligence from big data imposes a shift from the traditional compute-centric model to a data-centric model. In many big data scenarios, application performance and responsiveness (demanded by interactive usage) are dominated not by the execution of arithmetic and logic instructions but instead by the requirement to handle huge volumes of data and the cost of moving this data to the location(s) where compute is performed. When this is the case, moving the compute closer to the data can reap huge benefits in terms of increased throughput, lower latency, and reduced energy usage.
Big data analytics running inside an SSD can have access to the stored data with tens of GB/sec bandwidth (rivaling DRAM bandwidth), and with latency comparable to accessing raw non-volatile memory. In addition, large energy savings can be achieved because processors inside the SSD are more energy efficient compared to the host-server CPU (such as Intel Xeon), and data does not need to be hauled over large distances from storage all the way up to the host via network, which is more energy-expensive than processing it.
Processors inside the SSD are clearly not as powerful as host processors, but together with in-storage hardware offload engines, a broad range of data processing tasks can be competitively performed inside the SSD. As an example, consider how data analytic queries are processed in general: When an analytic query is given, compressed data required to answer the query is first loaded to the host, decompressed, and then processed using host resources. Such a fundamental data analytics primitive can be processed inside the SSD by accessing data with high internal bandwidth and by offloading decompression to the dedicated engine. Subsequent stages of the query-processing pipeline (such as filtering out unnecessary data and performing the aggregation) can execute inside the SSD, resulting in greatly reduced network traffic and saved host CPU/memory resources for other important jobs. Further, performance and bandwidth together can be scaled by adding more SSDs to the system if the application requires higher data rates.
[Figure 4c. Trusted domain for secure computation (cleartext not allowed to egress the SSD boundary).]
Secure computation in the cloud (see Figure 4c). Recent security breach events related to personal, private information (financial and otherwise) have exposed the vulnerability of data infrastructures to hackers and attackers. Also, a new type of malicious software called “encryption ransomware” attacks machines by stealthily encrypting data files and demanding a ransom to provide access to these files.10 Security is often among the topmost concerns enterprise chief information officers have when they move to the cloud, as cloud providers are unwilling to take on full liability for the impact of such breaches. Development of a secure cloud is not just a feature requirement but also an absolute foundational capability necessary for the future of the cloud computing model and its business success as an industry.
To realize the vision of a trusted cloud, data must be encrypted while stored at rest, which, however, limits the kind of computation that can be performed on encrypted data without decryption. To facilitate arbitrary (legitimate) computation on stored data, it needs to be decrypted before computing on it. This requires decrypted cleartext data to be present (at least temporarily) in various portions of the datacenter infrastructure vulnerable to security attacks. Application developers need a way to facilitate secure computation on the cloud by fencing in well-defined, narrow, trusted domains that can preserve the ability to perform arbitrary computation on the data.
SSDs with their powerful compute capabilities can form a trusted domain for doing secure computation on encrypted data, leveraging their internal hardware cryptographic engine and secure boot mechanisms for this purpose. Cryptographic keys can be stored inside the SSD, allowing arbitrary compute to be carried out on the stored data (after decryption if needed) while enforcing that data cannot leave the device in cleartext form. This allows a new, flexible, easily programmable, near-data realization of trusted hardware in the cloud. Compared to currently proposed solutions like Intel Enclavesj that are protected, isolated areas of execution in the host server memory, this solution protects orders of magnitude more data.
Programmable SSDs
While the concept of in-storage processing on SSDs was proposed more than six years ago,6 experimenting with SSD programming has been limited by the availability of real hardware on which a prototype can be built to demonstrate what is possible. The recent emergence of prototyping boards available for both research and commercial purposes has opened new opportunities for application developers to take ideas from conception to action.
Figure 5 shows one such prototype device, called Dragon Fire Card (DFC),k,3,5 designed and manufactured by Dell EMC and NXP for research. The card is powerful and flexible with enterprise-level capabilities and resources. It comprises a main board and a storage board. The main board contains an ARMv8 processor, 16GB of RAM, and various on-chip hardware accelerators (such as 20Gbps compression/decompression, 20Gbps SEC-crypto, and 10Gbps RegEx engines). It also provides NVMe connectivity via four PCIe Gen3 lanes, and 4x10Gbps Ethernet that supports the remote direct memory access (RDMA) over converged Ethernet (RoCE) protocol. It supports two different storage boards that connect via 2x4 PCIe Gen3 lanes: One type of board (see Figure 5a) includes an embedded storage controller and four memory slots where flash or other forms of NVM can be installed; and the second (see Figure 5b) an adapter that hosts two M.2 SSDs.
[Figure 5. A prototype programmable SSD developed for research purposes. (a) Device with a storage board with an embedded storage controller and DIMM slots for flash or other forms of NVM. (b) Device with a storage board where M.2 SSDs can be plugged into.]
The ARM SoC inside the board runs a full-fledged Ubuntu Linux, so programming the board is very similar to programming any other Linux device. For instance, software can leverage the Linux container technology (such as Docker) to provide isolated environments inside the board. To create applications running on the board, a software development kit (SDK) containing GNU tools to build applications for ARM and user/kernel mode libraries to use the on-chip hardware accelerators is provided, allowing a high level of programmability. The DFC can also serve as a block device, just like regular SSDs. For this purpose, the device is shipped with a flash translation layer (FTL) that runs on the main board.
The SSD industry is also moving toward bringing compute to SSDs so data can be processed without leaving the place where it is originally stored. For instance, in 2017 NGD Systemsl announced an SSD called Catalina21 capable of running applications directly on the device. Catalina2 uses TLC 3D NAND flash (up to 24TB), which is connected to the onboard ARM SoC that runs an embedded Linux and modules for error-correcting code (ECC) and FTL. On the host server, a tunnel agent (with C/C++ libraries) runs to talk to the device through the NVMe protocol. As another example, ScaleFluxm uses a Xilinx FPGA (combined with terabytes of TLC 3D NAND flash) to compute data for data-intensive applications. The host server runs a software module, providing API accesses to the device while being responsible for FTL and flash-management functionalities.
Academia and industry are working to establish a compelling value proposition by demonstrating application scenarios for each of the three pillars outlined in Figure 4. Among them we are initially focused on exploring the benefits and challenges of moving compute closer to storage (see Figure 4b) in the context of big data analytics, examining large amounts of data to uncover hidden patterns and insights.
Big data analytics within a programmable SSD. To demonstrate our approach, we have implemented a C++ reader that runs on a DFC card (see Figure 5) for Apache Optimized Row Columnar (ORC) files. The ORC file format is designed for fast processing and high storage efficiency of big data analytic workloads, and has been widely adopted in the open source community and industry. The reader running inside the SSD reads large chunks of ORC streams, decompresses them, and then evaluates query predicates to find only
i Note the multi-stream technology for SCSI/SAS was standardized in T10 on May 20, 2015.
j https://software.intel.com/en-us/sgx
k https://github.com/DFC-OpenSource
[Figure 6. Preliminary results using a programmable SSD yield approximately 5x speedups for full scans of ZLIB-compressed ORC files within the device, compared to native ORC readers running on x86 architecture. Vertical axis: million rows/second.]
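The scan path described above (read a chunk, decompress it, evaluate the predicate) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the C++ ORC reader itself: the data is stored as plain ZLIB-compressed chunks of 64-bit integers rather than in the ORC format, the chunk size and function names are hypothetical, and we assume the goal is to return only the rows that satisfy the predicate.

```python
import struct
import zlib

ROWS_PER_CHUNK = 1000  # assumed chunk size, for illustration only


def write_chunks(values):
    """Stand-in for stored data: ZLIB-compressed chunks of little-endian 64-bit integers."""
    chunks = []
    for i in range(0, len(values), ROWS_PER_CHUNK):
        rows = values[i:i + ROWS_PER_CHUNK]
        raw = struct.pack(f"<{len(rows)}q", *rows)
        chunks.append(zlib.compress(raw))
    return chunks


def scan_in_device(chunks, predicate):
    """Decompress each chunk, evaluate the predicate on every row, and
    yield only matching rows (the part that would cross to the host)."""
    for chunk in chunks:
        raw = zlib.decompress(chunk)
        rows = struct.unpack(f"<{len(raw) // 8}q", raw)
        for value in rows:
            if predicate(value):
                yield value


values = list(range(10_000))
chunks = write_chunks(values)
matches = list(scan_in_device(chunks, lambda v: v % 1000 == 0))
print(len(matches), "of", len(values), "rows returned to the host")
```

The point of the sketch is the data-movement asymmetry: all 10,000 rows are decompressed and inspected next to the storage, while only the handful of matching rows would need to travel to the host.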
ed library APIs into the reader, enabling reading data from flash and offloading the decompression work to the ARM SoC hardware accelerator.
Figure 6 shows preliminary bandwidth results of scanning a ZLIB-compressed, single-column integer dataset (one billion rows) through the C++ ORC reader running on a host x86 server vs. inside the DFC card, respectively.o As the figure shows, we achieved approximately 5x faster scan performance inside the device compared to running on the host server. Given that this is the performance of a single device, we should be able to achieve much better performance improvements by increasing the number of programmable SSDs that are used in parallel.
Beyond scanning, filtering and aggregating large volumes of data at high-throughput rates by offloading part of the computation directly to the storage has been explored as well. In 2016 Jo et al.12 built a prototype that performs very early filtering of data through a combination of ARM and a hardware pattern-matching engine available inside a programmable SSD equipped with a flow-based programming model described by Gu et al.7 When a query is given, the query planner determines whether early filtering is beneficial for the query and chooses a candidate table as the target if the estimated filtering ratio is sufficiently high. Early filtering is then performed against the target table inside the device, and only filtered data is then fetched to the host for residual computation. This early filtering inside the device turns out to be highly effective for analytic queries; when running all 22 TPC-H queries on a MariaDB server with the programmable device prototyped on a commodity NVMe SSD, a 3.6x speedup was achieved by Jo et al.12 compared to a system with the same SSD without the programmability.
Alternatively, an FPGA-based prototype design for near-data processing inside a storage node for database engines was studied by István et al.11 in 2017. In this prototype, each storage node that could be accessed over the network through a simple key-value store interface provided fault tolerance through replication and application-specific processing (such as predicate evaluations, substring matching and decompression) at line rate.
Datacenter Realization
Each application running in cloud datacenters has its own, unique requirements, making it difficult to design server nodes with the proper balance of compute, memory, and storage. To cope with such complexity, an approach of physically decoupling resources was proposed by Han et al.9 in 2013 to allow replacing, upgrading, or adding individual resources instead of the entire node. With the availability of fast interconnect technologies (such as InfiniBand, RDMA, and RoCE), it is already common in today’s large-scale cloud datacenters to disaggregate storage from compute, significantly reducing the total cost of ownership and improving the efficiency of the storage utilization. However, storage disaggregation is a challenge15 as storage-media access latencies are heading toward single-digit microsecond levelp compared to a disk’s millisecond latency, which is much larger than the fast network overhead. It is likely that, in the next few years, the network latency will become a bottleneck as new, emerging non-volatile memories with extremely low latencies become available.
This challenge of storage disaggregation can be overcome by using programmable storage, enabling a fully programmable storage substrate that is decoupled from the host substrate as outlined in Figure 7. This view of storage as a programmable substrate allows application developers not only to leverage very low, storage-medium access latency by running programs inside the storage device but also to access any remote storage device without involving the remote host server where the device is physically attached (see Figure 7) by
[Pull quote: The programmable storage substrate can be viewed as a hyper-converged infrastructure where storage, networking, and compute are tightly coupled for low-latency, high-throughput access, while still providing availability.]
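The latency argument in the Datacenter Realization discussion can be illustrated with simple arithmetic. The Python sketch below computes what fraction of a remote storage access is spent on the network hop. All three latency values are assumed round numbers chosen only to match the orders of magnitude mentioned in the text (millisecond disks, single-digit-microsecond media); they are not measurements from the article.

```python
def network_share(media_latency_us, network_latency_us):
    """Fraction of a remote access spent on the network rather than the storage medium."""
    return network_latency_us / (media_latency_us + network_latency_us)


# Hypothetical values, in microseconds (assumptions, not measured data).
NETWORK_US = 10.0     # assumed fast datacenter network overhead
DISK_US = 5_000.0     # assumed disk access, millisecond scale
NVM_US = 5.0          # assumed emerging non-volatile memory, single-digit microseconds

disk_share = network_share(DISK_US, NETWORK_US)
nvm_share = network_share(NVM_US, NETWORK_US)
print(f"network share with disk: {disk_share:.1%}")
print(f"network share with NVM:  {nvm_share:.1%}")
```

Under these assumptions the same network overhead that is negligible next to a disk access accounts for most of the access time once the medium responds in microseconds, which is the sense in which the network becomes the bottleneck.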
Engineering Trustworthy Systems: A Principled Approach to Cybersecurity
key insights
Cybersecurity must be practiced as a principled engineering discipline.
Many principles derive from insight into the nature of how cyberattacks succeed.
Defense in depth and breadth is required to cover the spectrum of cyberattack classes.
Cyberattacks are increasing in frequency, severity, and sophistication. Target systems are becoming increasingly complex with a multitude of subtle dependencies. Designs and implementations continue to exhibit flaws that could be avoided with well-known computer-science and engineering techniques. Cybersecurity technology is advancing, but too slowly to keep pace with the threat. In short, cybersecurity is losing the escalation battle with cyberattack. The results include mounting damages in the hundreds of billions of dollars,4 erosion of trust in conducting business and collaboration in cyberspace, and risk of a series of catastrophic events that could cause crippling damage to companies and even entire countries. Cyberspace is unsafe and is becoming less safe every day.
The cybersecurity discipline has created useful technology against aspects of the expansive space of possible cyberattacks. Through many real-life engagements between cyberattackers and defenders, both sides have learned a great deal about how to
design attacks and defenses. It is now time to begin abstracting and codifying this knowledge into principles of cybersecurity engineering. Such principles offer an opportunity to multiply the effectiveness of existing technology and mature the discipline so that new knowledge has a solid foundation on which to build.
Engineering Trustworthy Systems8 contains 223 principles organized into 25 chapters. This article will address 10 of the most fundamental principles that span several important categories and will offer rationale and some guidance on application of those principles to design. Under each primary principle, related principles are also included as part of the discussion.
For those so inclined to read more in Engineering Trustworthy Systems, after each stated principle is a reference of the form “{x.y}” where x is the chapter number in which it appears and y is the y-th principle listed in that chapter (which are not explicitly numbered in the book).
Motivation
Society has reached a point where it is inexorably dependent on trustworthy systems. Just-in-time manufacturing, while achieving great efficiencies, creates great fragility to cyberattack, amplifying risk by allowing effects to propagate to multiple systems {01.06}. This means that the potential harm from a cyberattack is increasing and now poses an existential threat to institutions. Cybersecurity is no longer the exclusive realm of the geeks and nerds, but now must be considered as an essential risk to manage alongside other major risks to the existence of those institutions.
The need for trustworthy systems extends well beyond pure technology. Virtually everything is a system from some perspective. In particular, essential societal functions such as the military, law enforcement, courts, societal safety nets, and the election process are all systems. People and their beliefs are systems and form a component of larger societal systems, such as voting. In 2016, the world saw cyberattacks transcend technology targets to that of wetware: human beliefs and propensity to action. The notion of hacking democracy itself came into light,10 posing an existential threat to entire governments and ways of life through what is sometimes known by the military as influence operations {24.09}.6
Before launching into the principles, one more important point needs to be made: Engineers are responsible for the safety and security of the systems they build {19.13}. In a conversation with my mentor’s mentor, I once made the mistake of using the word customer to refer to those using the cybersecurity systems we were designing. I will always remember him sharply cutting me off and telling me that they were “clients, not customers.” He said, “Used-car salesmen have customers; we have clients.” Like doctors and lawyers, engineers have a solemn and high moral responsibility to do the right thing and keep those who use our systems safe from harm to the maximum extent possible, while informing them of the risks they take when using our systems.
In The Thin Book of Naming Elephants,5 the authors describe how the National Aeronautics and Space Administration (NASA) shuttle-engineering culture slowly and unintentionally transmogrified from that adhering to a policy of “safety first” to “better, faster, cheaper.” This change discouraged engineers from telling truth to power, including estimating the actual probability of shuttle-launch failure. Management needed the probability of launch failure to be less than 1 in 100,000 to allow launch. Any other answer was an annoyance and interfered with on-time and on-schedule launches. In an independent assessment, Richard Feynman found that when engineers were allowed to speak freely, they calculated the actual failure probability to be 1 in 100.5 The engineering cultural failure killed many great and brave souls in two separate shuttle accidents.
I wrote Engineering Trustworthy Systems and this article to help enable and encourage engineers to take full charge of explicitly and intentionally managing system risk, from the ground up, in partnership with management and other key stakeholders.
Principles
It was no easy task to choose only 5% of the principles to discuss. When in doubt, I chose principles that may be less obvious to the reader, to pique curiosity and to attract more computer scientists and engineers to this important problem area. The ordering here is completely different than in the book so as to provide a logical flow of the presented subset.
Each primary principle includes a description of what the principle entails, a rationale for the creation of the principle, and a brief discussion of the implications on the cybersecurity discipline and its practice.
• Cybersecurity’s goal is to optimize mission effectiveness {03.01}.
Description. Systems have a primary purpose or mission: to sell widgets, manage money, control chemical plants, manufacture parts, connect people, defend countries, fly airplanes, and so on. Systems generate mission value at a rate that is affected by the probability of failure from a multitude of causes, including cyberattack. The purpose of cybersecurity design is to reduce the probability of failure from cyberattack so as to maximize mission effectiveness.
Rationale. Some cybersecurity engineers mistakenly believe that their goal is to maximize cybersecurity under a given budget constraint. This excessively narrow view misapprehends the nature of the engineering trade-offs with other aspects of system design and causes significant frustration among the cybersecurity designers, stakeholders in the mission system, and senior management (who must often adjudicate disputes between these teams). In reality, all teams are trying to optimize mission effectiveness. This realization places them in a collegial rather than an adversarial relationship.
Implications. Cybersecurity is always in a trade-off with mission functionality, performance, cost, ease-of-use and many other important factors. These trade-offs must be intentionally and explicitly managed. It is only in consideration of the bigger picture of optimizing mission that these trade-offs can be made in a reasoned manner.
• Cybersecurity is about understanding and mitigating risk {02.01}.
Description. Risk is the primary metric of cybersecurity. Therefore, understanding the nature and source of risk is key to applying and advancing the discipline. Risk measurement is foundational to improving cybersecurity {17.04}. Conceptually, cybersecurity risk is simply the probability of cyberattacks occurring multiplied by the potential damages that would result if they actually occurred. Estimating both of these quantities is challenging, but possible.
Rationale. Engineering disciplines require metrics to: “characterize the nature of what is and why it is that way, evaluate the quality of a system, predict system performance under a variety of environments and situations, and compare and improve systems continuously.”7 Without a metric, it is not possible to decide whether one system is better than another. Many fellow cybersecurity engineers complain that risk is difficult to measure and especially difficult to quantify, but proceeding without a metric is impossible. Thus, doing the hard work required to measure risk, with a reasonable uncertainty interval, is an essential part of the cybersecurity discipline. Sometimes, it seems that the cybersecurity community spends more energy complaining how difficult metrics are to create and measure accurately, than getting on with creating and measuring them.
Implications. With risk as the primary metric, risk-reduction becomes the primary value and benefit from any cybersecurity measure, technological or otherwise. Total cost of cybersecurity, on the other hand, is calculated in terms of the direct cost of procuring, deploying, and maintaining the cybersecurity mechanism as well as the indirect costs of mission impacts such as performance degradation, delay to market, capacity reductions, and usability. With risk-reduction as a benefit metric and an understanding of total costs, one can then reasonably compare alternate cybersecurity approaches in terms of risk-reduction return on investment. For example, it is often the case that there are no-brainer actions such as properly configuring existing security mechanisms (for example, firewalls and intrusion detection systems) that cost very little but significantly reduce the probability of successful cyberattack. Picking such low-hanging fruit should be the first step that any organization takes to improving their operational cybersecurity posture.
• Theories of security come from theories of insecurity {02.03}.
Description. One of the most important yet subtle aspects of an engineering discipline is understanding how to think about it: the underlying attitude that feeds insight. In the same way that failure motivates and informs dependability principles, cyberattack motivates and informs cybersecurity principles. Ideas on how to effectively defend a system, both during design and operation, must come from an understanding of how cyberattacks succeed.
Rationale. How does one prevent attacks if one does not know the mechanism by which attacks succeed? How does one detect attacks without knowing how attacks manifest? It is not possible. Thus, students of cybersecurity must be students of cyberattacks and adversarial behavior.
Implications. Cybersecurity engineers and practitioners should take courses and read books on ethical hacking. They should study cyberattack and particularly the post-attack analysis performed by experts and published or spoken about at conferences such as Black Hat and DEF CON. They should perform attacks within lab environments designed specifically to allow for safe experimentation. Lastly, when successful attacks do occur, cybersecurity analysts must closely study them for root causes and the implications to improved component design, improved operations, improved architecture, and improved policy. “Understanding failure is the key to success” {07.04}. For example, the five-whys analysis technique used by the National Transportation Safety Board (NTSB) to investigate aviation accidents9 is useful to replicate and adapt to mining all the useful hard-earned defense information from the pain of a successful cyberattack.
• Espionage, sabotage, and influence are goals underlying cyberattack {06.02}.
Description. Understanding adversaries requires understanding their motivations and strategic goals. Adversaries have three basic categories of goals: espionage, stealing secrets to gain an unearned value or to destroy value by revealing stolen secrets; sabotage, hampering operations to slow progress, provide competitive advantage, or to destroy for ideological purposes; and influence, affecting decisions and outcomes to favor an adversary’s interests and goals, usually at
the expense of those of the defender.
Rationale. Understanding the strategic goals of adversaries illuminates their value system. A value system suggests which attack goals a potential adversary might invest in most heavily, and perhaps gives insight into how they will pursue those goals. Different adversaries will place different weights on different goals within each of the three categories. Each will also be willing to spend different amounts to achieve their goals. Clearly, a nation-state intelligence organization, a transnational terrorist group, organized crime, a hacktivist and a misguided teenager trying to learn more about cyberattacks all have very different profiles with respect to these goals and their investment levels. These differences affect their respective behaviors with respect to different cybersecurity architectures.
Implications. In addition to informing the cybersecurity designer and operator (one who monitors status and controls the cybersecurity subsystem in real time), understanding attacker goals allows cybersecurity analysts to construct goal-oriented attack trees that are extraordinarily useful in guiding design and operation because they give insight into attack probability and attack sequencing. Attack sequencing, in turn, gives insight into getting ahead of attackers at interdiction points within the attack step sequencing {23.18}.
• Assume your adversary knows your system well and is inside it {06.05}.
Description. Secrecy is fleeting and thus should never be depended upon more than is absolutely necessary {03.05}. This is true of data but applies even more strongly with respect to the system itself {05.11}. It is unwise to make rash and unfounded assumptions that cannot be proven with regard to what a potential adversary may or may not know. It is much safer to assume they know at least as much as the designer does about the system. Beyond adversary knowledge of the system, a good designer makes the stronger assumption that an adversary has managed to co-opt at least part of the system sometime during its life cycle. It must be assumed that an adversary changed a component to have some degree of control over its function so as to operate as the adversary’s inside agent.
Rationale. First, there are many opportunities for a system design and implementation to be exposed and subverted along its entire life cycle. Early development work is rarely protected very carefully. System components are often reused from previous projects or open source. Malicious changes can easily escape notice during system integration and testing because of the complexity of the software and hardware in modern systems. The maintenance and update phases are also vulnerable to both espionage and sabotage. The adversary also has an opportunity to stealthily study a system during operation by infiltrating and observing the system, learning how the system works in reality, not just how it was intended by the designer (which can be significantly different, especially after an appreciable time in operation). Second, the potential failure from making too weak an assumption could be catastrophic to the system’s mission, whereas making strong assumptions merely could make the system more expensive. Clearly, both probability (driven by opportunity) and prudence suggest making the more conservative assumptions.
Implications. The implications of assuming the adversary knows the system at least as well as the designers and operators are significant. This principle means that cybersecurity designers must spend a substantial amount of resources on minimizing the probability of flaws in design and implementation through the design process itself, and on performing extensive testing, including penetration and red-team testing focused specifically on looking at the system from an adversary perspective. The principle also implies a cybersecurity engineer must understand the residual risks in terms of any known weaknesses. The design must compensate for those weaknesses through architecture (for example, specifically focusing the intrusion detection system to monitor possible exploitation of those weaknesses), as opposed to hoping the adversary does not find them because they are “buried too deep” or, worse yet, because the defender believes that the attacker is “not that sophisticated.” Underestimating the attacker is hubris. As the saying goes: pride comes before the fall {06.04}.
Assuming the attacker is (partially) inside the system requires the designer
[Pull quote: It is much better to assume adversaries know at least as much as the designer does about the system.]
JU N E 2 0 1 9 | VO L. 6 2 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 67
contributed articles
enemy of cybersecurity because of the difficulty of arguing that complex systems are correct {19.09}.
˲˲ Depth without breadth is useless; breadth without depth, weak {08.02}.
Description. Much ado has been made about the notion of the concept of defense in depth. The idea is often vaguely defined as layering cybersecurity approaches including people, diverse technology, and procedures to protect systems. Much more precision is needed for this concept to be truly useful to the cybersecurity design process. Layer how? With respect to what? The unspoken answer is the cyberattack space that covers the gamut of all possible attack classes as shown in the accompanying figure.
Rationale. One must achieve depth with respect to specified attack classes. Mechanisms that are useful against some attack classes are entirely useless against others. This focusing idea fosters an equally important companion principle: defense in breadth. If a cybersecurity designer creates excellent depth to the point of making a particular class of attack prohibitive to an adversary, the adversary may simply move to an alternative attack. Thus, one must cover the breadth of the attack space, in depth. Ideally, the depth will be such that all avenues of attack, for all attack classes, will be equally difficult, and above the cost and risk thresholds of the attackers.
Implications. This depth-and-breadth principle implies that the cybersecurity engineer must have a firm understanding of the entire spectrum of cyberattacks, not just a few attacks. More broadly, the principle suggests the cybersecurity community must develop better cyberattack taxonomies that capture the entire attack space, including hardware attacks, device controller attacks, operating system attacks, and cyberattacks used to affect the beliefs of people. Further, the principle also means that cybersecurity measures must be properly characterized in terms of their effectiveness against the various portions of the cyberattack space. Those who create or advocate for various measures or solutions will be responsible for creating specific claims about their cyberattack-space coverage, and analysts will be responsible for designing tests to thoroughly evaluate the validity of those claims. Lastly, cybersecurity architects will need to develop techniques for weaving together cybersecurity in ways that create true depth, measured by how the layers alter the probability of success an adversary will have for the targeted attack class. Said a different way, the effectiveness of depth could be measured by how miserable it makes an attacker’s life.
˲˲ Failing to plan for failure guarantees catastrophic failure {20.06}.
Description. System failures are inevitable {19.01, 19.05}. Pretending otherwise is almost always catastrophic. This principle applies to both the mission system and cybersecurity subsystem that protects the mission system. Cybersecurity engineers must understand that their systems, like all systems, are subject to failure. It is incumbent on those engineers to understand how their systems can possibly fail, including the failure of the underlying hardware and other systems on which they depend (for example, the microprocessors, the internal system bus, the network, memory, and external storage systems). A student of cybersecurity is a student of failure {07.01} and thus a student of dependability as a closely related discipline. Security requires reliability; reliability requires security {05.09}.
Rationale. Too many cybersecurity engineers forget that cybersecurity mechanisms are not endowed with magical powers of nonfailure. Requirements can be ambiguous and poorly interpreted, designs can be flawed, and implementation errors are no less likely in security code than in other code. Indeed, security code often has to handle complex timing issues and sometimes needs to be involved in hardware control. This involves significantly more complexity than normal systems and thus requires even more attention to failure avoidance, detection, and recovery {05.10}. Yet the average cybersecurity engineer today seems inadequately schooled in this important related discipline.
Implications. Cybersecurity engineering requires design using dependability engineering principles. This means that cybersecurity engineers must understand the nature and cause of faults, how the activation of faults leads to errors, which can propagate and cause system failures.1 They must understand this not only with respect to the cybersecurity system they design, but all the systems on which the system depends and which depend on it, including the mission system itself.

Defense depth and breadth in a cyberattack. [Figure labels: Depth = 1, Depth = 2, Depth = 3; Attack space. Legend: attack class within the attack space, where size corresponds to number of attacks in the class; the subset of attack classes covered by a security control.]
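The depth-and-breadth principle (cover every attack class, and cover each one with more than one layer) can be made concrete with a small coverage table. The sketch below is only an illustration under stated assumptions: the attack classes, the control names, and the idea of scoring depth as a simple count of covering controls are all hypothetical simplifications, not a taxonomy or metric taken from the article.

```python
from typing import Dict, List, Set

# Hypothetical attack classes and controls, chosen only for illustration.
ATTACK_CLASSES: Set[str] = {
    "network", "insider", "supply_chain", "social_engineering",
}

# Each (hypothetical) control maps to the attack classes it claims to cover.
CONTROLS: Dict[str, Set[str]] = {
    "firewall": {"network"},
    "intrusion_detection": {"network", "insider"},
    "code_signing": {"supply_chain"},
    "audit_logging": {"insider"},
}


def depth_by_class(attack_classes: Set[str],
                   controls: Dict[str, Set[str]]) -> Dict[str, int]:
    """Depth per attack class: how many controls cover that class."""
    return {
        cls: sum(1 for covered in controls.values() if cls in covered)
        for cls in sorted(attack_classes)
    }


def uncovered(depths: Dict[str, int]) -> List[str]:
    """Breadth gaps: attack classes with no covering control at all."""
    return [cls for cls, depth in depths.items() if depth == 0]


def weakest_depth(depths: Dict[str, int]) -> int:
    """The adversary can pick the weakest class, so report the minimum."""
    return min(depths.values())


depths = depth_by_class(ATTACK_CLASSES, CONTROLS)
print(depths)
print("breadth gaps:", uncovered(depths))
print("weakest depth:", weakest_depth(depths))
```

Reporting the minimum rather than the average reflects the point made in the Rationale: an adversary facing excellent depth in one class may simply move to an alternative attack, so a single uncovered class dominates the assessment.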
˲˲ Strategy and tactics knowledge comes from attack encounters {01.09}.
Description. As important as good cybersecurity design is, good cybersecurity operations is at least as important. Each cybersecurity mechanism is usually highly configurable with hundreds, thousands, and even millions of possible settings (for example, the rule set of firewalls denying or permitting each combination, port, protocol, source address range, and destination address range). What are the optimal settings of all of these various mechanisms? The answer depends on variations in the mission and variations in the system environment, including attack attempts that may be ongoing. The settings are part of a trade-off space for addressing the entire spectrum of attacks. The reality is there is no static optimal setting for all cyberattack scenarios under all possible conditions {22.07}. Furthermore, dynamically setting the controls leads to a complex control-feedback problem {23.11}. Where does the knowledge come from regarding how to set the security control parameters according to the particulars of the current situation? It is extracted from the information that comes from analyzing cyberattack encounters, both real and simulated, both those that happen to one’s own organization and those that happen to one’s neighbors.
Rationale. There is certainly good theory, such as game-theory based approaches,2 which one can develop about how to control the system effectively (for example, using standard control theory). On the other hand, practical experience plays an important role in learning how to effectively defend a system. This knowledge is called strategy (establishing high-level goals in a variety of different situations) and tactics (establishing effective near-term responses to attack steps the adversary takes).
Implications. Strategy and tactics knowledge must be actively sought, collected with intention (through analyzing real encounters, performing controlled experiments, and performing simulations {23.04}), curated, and effectively employed in the operations of a system. Cybersecurity systems must be designed to store, communicate, and use this knowledge effectively in the course of real operations. Plans based on this knowledge are sometimes called playbooks. They must be developed in advance of attacks {23.05} and must be broad enough {23.07} to handle a large variety of attack situations that are likely to occur in real-world operations. The process of thinking through responses to various cyberattack scenarios, in itself, is invaluable in the planning process {23.10}. Certain responses that may be contemplated during this process may need infrastructure (such as actuators) to execute the action accurately and quickly enough {23.15} to be effective. This insight will likely lead to design requirements for implementing such actuators as the system is improved.

The Future
Systematically extracting, presenting, and building the principles underlying trustworthy systems design is not the work of one cybersecurity engineer, not by a long shot. The task is difficult, daunting, complex, and never-ending. I mean here to present a beginning, not the last word on the matter. My goal is to encourage the formation of a community of cybersecurity and systems engineers strongly interested in maturing and advancing their discipline so that others may stand on their shoulders. This community is served by like-minded professionals sharing their thoughts, experiences, and results in papers, conferences, and over a beverage during informal gatherings. My book and this article are a call to action for this community to organize and work together toward the lofty goal of building the important underpinnings from a systems-engineering perspective.
Lastly, I will point out that cyberattack measures and cybersecurity countermeasures are in an eternal co-evolution and co-escalation {14.01}. Improvements to one discipline will inevitably create an evolutionary pressure on the other. This has at least two important implications. First, the need to build cybersecurity knowledge to build and operate trustworthy systems will need continuous and eternal vigilant attention. Second, communities on both sides need to be careful about where the co-evolution leads. Faster and faster cyberattacks will lead cybersecurity defenders to autonomic action and planning that may eventually be driven by artificial intelligence. Stronger and stronger cybersecurity measures that dynamically adapt to cyberattacks will similarly lead adversaries to more intelligent and autonomic adaptations in their cyberattacks. The road inevitably leads to machine-controlled autonomic action-counteraction and machine-driven adaptation and evolution of mechanisms. This may have surprising and potentially disastrous results to the system called humanity {25.02, 25.04}.

Acknowledgments
First and foremost, I acknowledge all of the formative conversations with my technical mentor, Brian Snow. He is a founding cybersecurity intellectual who has generously, gently, and wisely guided many minds throughout his illustrious career. Second, I thank the dozens of brilliant cybersecurity engineers and scientists with whom I have had the opportunity to work over the last three decades. Each has shone a light of insight from a different direction that helped me see the bigger picture of underlying principles.

References
1. Avizienis, A., Laprie, J.-C., and Randell, B. Fundamental concepts of dependability. In Proceedings of the 3rd IEEE Information Survivability Workshop (Boston, MA, Oct. 24–26). IEEE, 2000, 7–12.
2. Hamilton, S.N., Miller, W.L., Ott, A., and Saydjari, O.S. The role of game theory in information warfare. In Proceedings of the 4th Information Survivability Workshop. 2001.
3. Hammond, S.A. and Mayfield, A.B. The Thin Book of Naming Elephants: How to Surface Undiscussables for Greater Organizational Success. McGraw-Hill, New York, 2004, 290–292.
4. Morgan, S. Top 5 Cybersecurity Facts, Figures and Statistics for 2018. CSO Online; https://bit.ly/2KG6jJV.
5. NASA. Report of the Presidential Commission on the Space Shuttle Challenger Accident. June 6, 1986; https://history.nasa.gov/rogersrep/genindex.htm
6. Rand Corporation. Foundations of Effective Influence Operations: A Framework for Enhancing Army Capabilities. Rand Corp. 2009; https://www.rand.org/content/dam/rand/pubs/monographs/2009/RAND_MG654.pdf
7. Saydjari, O.S. Why Measure? Engineering Trustworthy Systems. McGraw-Hill, New York, 2018, 290–292.
8. Saydjari, O.S. Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time. McGraw-Hill Education, 2018.
9. Wiegmann, D. and Shappell, S.A. A Human Error Approach to Aviation Accident Analysis: The Human Factors Analysis and Classification System. Ashgate Publishing, 2003.
10. Zarate, J.C. The Cyber Attacks on Democracy. The Catalyst 8, (Fall 2017); https://bit.ly/2IXttZr

O. Sami Saydjari (ssaydjari@gmail.com) is Founder and President of the Cyber Defense Agency, Inc., Clarksville, MD, USA.

Copyright held by author/owner. Publication rights licensed to ACM. $15.00.
review articles
DOI:10.1145/3282486

The Challenge of Crafting Intelligible Intelligence

To trust the behavior of complex AI algorithms, especially in mission-critical settings, they must be made intelligible.

BY DANIEL S. WELD AND GAGAN BANSAL

ARTIFICIAL INTELLIGENCE (AI) systems have reached or exceeded human performance for many circumscribed tasks. As a result, they are increasingly deployed in mission-critical roles, such as credit scoring, predicting if a bail candidate will commit another crime, selecting the news we read on social networks, and self-driving cars. Unlike other mission-critical software, extraordinarily complex AI systems are difficult to test: AI decisions are context specific and often based

most clearly seen in the performance of the latest deep neural network image analysis systems. While their accuracy at object-recognition on naturally occurring pictures is extraordinary, imperceptible changes to input images can lead to erratic predictions, as shown in Figure 1. Why are these recognition systems so brittle, making different predictions for apparently identical images? Unintelligible behavior is not limited to machine learning; many AI programs, such as automated planning algorithms, perform search-based look ahead and inference whose complexity exceeds human abilities to verify. While some search and planning algorithms are provably complete and optimal, intelligibility is still important, because the underlying primitives (for example, search operators or action descriptions) are usually approximations.29 One can’t trust a proof that is based on (possibly) incorrect premises.
Despite intelligibility’s apparent value, it remains remarkably difficult to specify what makes a system “intelligible.” (We discuss desiderata for intelligible behavior later in this article.) In brief, we seek AI systems where it is clear what factors caused the system’s action,24 allowing the users to predict how changes to the situation would have led to alternative behaviors, and permits effective control of

key insights
˽˽ There are important technical and social reasons to prefer inherently intelligible AI models (such as linear models or GA2Ms) over deep neural models; furthermore, intelligible models often have comparable accuracy.
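The brittleness described in the introduction, where an imperceptible change to the input flips the prediction, can be illustrated on a toy linear classifier. This is a minimal sketch, assuming a hypothetical 100-feature model with hand-picked weights; it steps each feature by a small amount against the sign of its weight, in the spirit of the sign-of-gradient perturbations studied by Goodfellow et al., and is not the GoogLeNet experiment shown in Figure 1.

```python
from typing import List

# Hypothetical linear classifier: 100 features, weights alternating +1 / -1.
WEIGHTS: List[float] = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
BIAS: float = 0.5


def score(x: List[float]) -> float:
    """Linear score w . x + b."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x: List[float]) -> int:
    """Class 1 if the score is positive, otherwise class 0."""
    return 1 if score(x) > 0 else 0


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def perturb(x: List[float], eps: float) -> List[float]:
    """Move every feature by eps against its weight's sign.

    For a linear model the gradient of the score with respect to x is
    the weight vector, so this lowers the score by eps * sum(|w_i|).
    """
    return [xi - eps * sign(w) for w, xi in zip(WEIGHTS, x)]


x = [0.5] * 100                 # original input, score = 0.5
x_adv = perturb(x, eps=0.01)    # each feature changes by only 0.01

print(predict(x), predict(x_adv))
print(max(abs(a - b) for a, b in zip(x, x_adv)))
```

Each feature moves by only 0.01, yet the effects add up across 100 dimensions and shift the score by about 1.0, which is enough to change the predicted class. The same accumulation is one commonly offered explanation for why high-dimensional image classifiers can be sensitive to changes no human would notice.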
and veracity; in theory, a user can see exactly what the model is doing. Unfortunately, interpretable methods may not perform as well as more complex ones, such as deep neural networks. Conversely, the approach of mapping

more, since the nature of explanation has long been studied by philosophy and psychology, these fields should also be consulted. This article highlights key approaches and challenges for building intelligible

Why Intelligibility Matters
While it has been argued that explanations are much less important than sheer performance in AI systems, there are many reasons why intelligibility is
important. We start by discussing technical reasons, but social factors are important as well.
AI may have the wrong objective. In some situations, even 100% perfect performance may be insufficient, for example, if the performance metric is flawed or incomplete due to the difficulty of specifying it explicitly. Pundits have warned that an automated factory charged with maximizing paperclip production could subgoal on killing humans, who are using resources that could otherwise be used in its task. While this example may be fanciful, it illustrates that it is remarkably difficult to balance multiple attributes of a utility function. For example, as Lipton observed,25 “An algorithm for making hiring decisions should simultaneously optimize for productivity, ethics and legality.” However, how does one express this trade-off? Other examples include balancing training error while uncovering causality in medicine and balancing accuracy and fairness in recidivism prediction.12 For the latter, a simplified objective function such as accuracy combined with historically biased training data may cause uneven performance for different groups (for example, people of color). Intelligibility empowers users to determine if an AI is right for the right reasons.
AI may be using inadequate features. Features are often correlated, and when one feature is included in a model, machine learning algorithms extract as much signal as possible from it, indirectly modeling other features that were not included. This can lead to problematic models, as illustrated by Figure 4b (and described later), where the ML determined that a patient’s prior history of asthma (a lung disease) was negatively correlated with death by pneumonia, presumably due to correlation with (unmodeled) variables, such as these patients receiving timely and aggressive therapy for lung problems. An intelligible model helps
humans to spot these issues and correct them, for example, by adding additional features.4

Figure 1. Adding an imperceptibly small vector to an image changes the GoogLeNet39 image recognizer’s classification of the image from “panda” to “gibbon.” Source: Goodfellow et al.9 [Panels: “panda” (57.7% confidence), “nematode” (8.2% confidence), “gibbon” (99.3% confidence)]

Distributional drift. A deployed model may perform poorly in the wild, that is, when a difference exists between the distribution which was used during training and that encountered during deployment. Furthermore, the deployment distribution may change over time, perhaps due to feedback from the act of deployment. This is common in adversarial domains, such as spam detection, online ad pricing, and search engine optimization. Intelligibility helps users determine when models are failing to generalize.
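One simple way to notice the kind of distributional drift described above is to compare a feature's values seen during training with those seen in deployment. The sketch below computes a two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical cumulative distributions) in plain Python. The sample values and the 0.3 alert threshold are hypothetical, chosen only for illustration; the article does not prescribe any particular drift test.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(train: Sequence[float], deploy: Sequence[float]) -> float:
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(train), sorted(deploy)
    gap = 0.0
    for point in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, point) / len(a)  # fraction of train <= point
        cdf_b = bisect_right(b, point) / len(b)  # fraction of deploy <= point
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def has_drifted(train: Sequence[float], deploy: Sequence[float],
                threshold: float = 0.3) -> bool:
    """Flag drift when the KS gap exceeds a (hypothetical) threshold."""
    return ks_statistic(train, deploy) > threshold


# Hypothetical values of one feature at training time and in deployment.
train_values = [1.0, 2.0, 3.0, 4.0]
deploy_values = [3.0, 4.0, 5.0, 6.0]

print(ks_statistic(train_values, train_values))
print(ks_statistic(train_values, deploy_values))
print(has_drifted(train_values, deploy_values))
```

A check like this only signals that the inputs have changed; it does not say whether the model's predictions are still sound, which is where an intelligible model helps a user judge whether the learned relationships still apply.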
Figure 2. Approaches for crafting intelligible AI. [Flowchart labels: Intelligible? (Yes / No); Use Directly; Map to Simpler Model; Interact with Simpler Model; Explanations; Controls]

Facilitating user control. Many AI systems induce user preferences from their actions. For example, adaptive news feeds predict which stories are likely most interesting to a user. As robots become more common and enter the home, preference learning will become ever more common. If users understand why the AI performed an undesired action, they can better issue instructions that will lead to improved future behavior.
User acceptance. Even if they do not seek to change system behavior, users have been shown to be happier with and more likely to accept algorithmic decisions if they are accompanied by an explanation.18 After being told they should have their kidney removed, it’s natural for a patient to ask the doctor why, even if they don’t fully understand the answer.

Figure 3. The dashed blue shape indicates the space of possible mistakes humans can make. The red shape denotes the AI’s mistakes; its smaller size indicates a net reduction in the number of errors. The gray region denotes AI-specific mistakes a human would never make. Despite reducing the total number of errors, a deployed model may create new areas of liability (gray), necessitating explanations. [Diagram labels: Human Errors; AI Errors; AI-Specific Errors]

Improving human insight. While improved AI allows automation of
tasks previously performed by humans, this is not their only use. In addition, scientists use machine learning to get insight from big data. Medicine offers several examples.4 Similarly, the behavior of AlphaGo35 has revolutionized human understanding of the game. Intelligible models greatly facilitate these processes.

Figure 4. A part of Figure 1 from Caruana et al.4 showing three (of 56 total) components for a GA2M model, which was trained to predict a patient’s risk of dying from pneumonia. The two line graphs depict the contribution of individual features to risk: patient’s age, and Boolean variable asthma. The y-axis denotes its contribution (log odds) to predicted risk. The heat map visualizes the contribution due to pairwise interactions between age and cancer rate. [Panels: (a) Age, (b) Asthma, (c) Age vs. Cancer]

Legal imperatives. The European Union’s GDPR legislation decrees citizens’ right to an explanation, and other nations may follow. Furthermore, assessing legal liability is a growing area of concern; a deployed model (for example, self-driving cars) may introduce new areas of liability by causing accidents unexpected from a human operator, shown as “AI-specific error” in Figure 3. Auditing such situations to assess liability requires understanding the model’s decisions.

Defining Intelligibility
So far we have treated intelligibility informally. Indeed, few computing researchers have tried to formally define what makes an AI system interpretable, transparent, or intelligible,6 but one suggested criterion is human simulatability:25 Can a human user easily predict the model’s output for a given input? By this definition, sparse linear models are more interpretable than dense or non-linear ones.
Philosophers, such as Hempel and Salmon, have long debated the nature of explanation. Lewis23 summarizes: “To explain an event is to provide some information about its causal history.” But many causal explanations may exist. The fact that event C causes E is best understood relative to an imagined counterfactual scenario, where absent C, E would not have occurred; furthermore, C should be minimal, an intuition known to early scientists, such as William of Occam, and formalized by Halpern and Pearl.11
Following this logic, we suggest a better criterion than simulatability is the ability to answer counterfactuals, aka “what-if” questions. Specifically, we say that a model is intelligible to the degree that a human user can predict how a change to a feature, for example, a small increase to its value, will change the model’s output and if they can reliably modify that response curve. Note that if one can simulate the model, predicting its output, then one can predict the effect of a change, but not vice versa.
Linear models are especially interpretable under this definition because they allow the answering of counterfactuals. For example, consider a naive Bayes unigram model for sentiment analysis, whose objective is to predict the emotional polarity (positive or negative) of a textual passage. Even if the model were large, combining evidence from the presence of thousands of words, one could see the effect of a given word by looking at the sign and magnitude of the corresponding weight. This answers the question, “What if the word had been omitted?” Similarly, by comparing the weights associated with two words, one could predict the effect on the model of substituting one for the other.
Ranking intelligible models. Since one may have a choice of intelligible models, it is useful to consider what makes one preferable to another. Social science research suggests an explanation is best considered a social process, a conversation between explainer and explainee.15,30 As a result, Grice’s rules for cooperative communication10 may hold for intelligible explanations. Grice’s maxim of quality says be truthful, only relating things that are supported by evidence. The maxim of quantity says to give as much information as is needed, and no more. The maxim of relation: only say things that are relevant to the discussion. The maxim of manner says to avoid ambiguity, being as clear as possible.
Miller summarizes decades of work by psychological research, noting that explanations are contrastive, that is, of the form “Why P rather than Q?” The event in question, P, is termed the fact and Q is called the foil.30 Often the foil is not explicitly stated even though it is crucially important to the explanation process. For example, consider the question, “Why did you predict the image depicts an indigo bunting?” An explanation that points to the color blue implicitly assumes the foil is another bird, such as a chickadee. But perhaps the questioner wonders why the recognizer did not predict a pair of denim pants; in this case a more precise explanation might highlight the presence of wings and a beak. Clearly, an explanation targeted to the wrong foil will be unsatisfying, but the nature and sophistication of a foil can depend on the end user’s expertise; hence, the ideal explanation will differ for different people.6 For example, to verify that an ML system is fair, an ethicist might generate more
complex foils than a data scientist. Most ML explanation systems have restricted their attention to elucidating the behavior of a binary classifier, that is, where there is only one possible foil choice. However, as we seek to explain multiclass systems, addressing this issue becomes essential.
Many systems are simply too complex to understand without approximation. Here, the key challenge is deciding which details to omit. After many years of study, psychologists determined that several criteria can be prioritized for inclusion in an explanation: necessary causes (vs. sufficient ones); intentional actions (vs. those taken without deliberation); proximal causes (vs. distant ones); details that distinguish between fact and foil; and abnormal features.30
According to Lombrozo, humans prefer explanations that are simpler (that is, contain fewer clauses), more general, and coherent (that is, consistent with the human’s prior beliefs).26 In particular, she observed the surprising result that humans preferred simple (one clause) explanations to conjunctive ones, even when the probability of the latter was higher than the former.26 These results raise interesting questions about the purpose of explanations in an AI system. Is an explanation’s primary purpose to convince a human to accept the computer’s conclusions (perhaps by presenting a simple, plausible, but unlikely explanation) or is it to educate the human about the most likely true situation? Tversky, Kahneman, and other psychologists have documented many cognitive biases that lead humans to incorrect conclusions; for example, people reason incorrectly about the probability of conjunctions, with a concrete and vivid scenario deemed more likely than an abstract one that strictly subsumes it.16 Should an explanation system exploit human limitations or seek to protect us from them?
Other studies raise an additional complication about how to communicate a system’s uncertain predictions to human users. Koehler found that simply presenting an explanation for a proposition makes people think that it is more likely to be true.18 Furthermore, explaining a fact in the same way as previous facts have been explained amplifies this effect.36

Inherently Intelligible Models
Several AI systems are inherently intelligible, and we previously observed that linear models support counterfactual reasoning. Unfortunately, linear models have limited utility because they often result in poor accuracy. More expressive choices may include simple decision trees and compact decision lists. To concretely illustrate the benefits of intelligibility, we focus on generalized additive models (GAMs), which are a powerful class of ML models that relate a set of features to the target using a linear combination of (potentially nonlinear) single-feature models called shape functions.27 For example, if $y$ represents the target and $\{x_1, \ldots, x_n\}$ represents the features, then a GAM model takes the form $y = \beta_0 + \sum_j f_j(x_j)$, where the $f_j$ denote shape functions and the target $y$ is computed by summing single-feature terms. Popular shape functions include non-linear functions such as splines and decision trees. With linear shape functions GAMs reduce to a linear model. GA2M models extend GAM models by including terms for pairwise interactions between features:

$y = \beta_0 + \sum_j f_j(x_j) + \sum_{i \neq j} f_{ij}(x_i, x_j)$

Caruana et al. observed that for domains containing a moderate number of semantic features, GA2M models achieve performance that is competitive with inscrutable models, such as random forests and neural networks, while remaining intelligible.4 Lou et al. observed that among methods available for learning GA2M models, the version with bagged shallow regression tree shape functions learned via gradient boosting achieves the highest accuracy.27
Both GAM and GA2M are considered interpretable because the model’s learned behavior can be easily understood by examining or visualizing the contribution of terms (individual or pairs of features) to the final prediction. For example, Figure 4 depicts a GA2M model trained to predict a patient’s risk of dying due to pneumonia, showing the contribution (log odds) to total risk for a subset of terms. A positive contribution increases risk, whereas a negative contribution decreases risk. For example, Figure 4a shows how the patient’s age affects predicted risk. While the risk is low and steady for young patients (for example, age < 20), it increases rapidly for older patients (age > 67). Interestingly, the model shows a sudden increase at age 86; perhaps a result of less aggressive care by doctors for patients “whose time has come.” Even more surprising is the sudden drop for patients over 100. This might be another social effect; once a patient reaches the magic “100,” he or she gets more aggressive care. One benefit of an interpretable model is its ability to highlight these issues, spurring deeper analysis.
Figure 4b illustrates another surprising aspect of the learned model; apparently, a history of asthma, a respiratory disease, decreases the patient’s risk of dying from pneumonia! This finding is counterintuitive to any physician, who recognizes that asthma, in fact, should in theory increase such risk. When Caruana et al. checked the data, they concluded the lower risk was likely due to correlated variables: asthma patients typically receive timely and aggressive therapy for lung issues. Therefore, although the model was highly accurate on the test set, it would likely fail, dramatically underestimating the risk to a patient with asthma who had not been previously treated for the disease.
Facilitating human control of GA2M models. A domain expert can fix such erroneous patterns learned by the model by setting the weight of the asthma term to zero. In fact, GA2Ms let users provide much more comprehensive feedback to the model by using a GUI to redraw a line graph for model terms.4 An alternative remedy might be to introduce a new feature to the model, representing whether the patient had been recently seen by a pulmonologist. After adding this feature, which is highly correlated with asthma, and retraining, the newly learned model would likely reflect that asthma (by itself) increases the risk of dying from pneumonia.
There are two more takeaways from this anecdote. First, the absence of an important feature in the data representation can cause any AI system to learn unintuitive behavior for another, correlated feature. Second, if the learner is intelligible, then this unintuitive behavior
JUNE 2019 | VOL. 62 | NO. 6 | COMMUNICATIONS OF THE ACM 75
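To make the additive structure of a GA2M concrete, the following minimal Python sketch treats the model as an intercept plus per-feature shape functions plus pairwise terms, each contributing log odds. The feature names, shape functions, and numbers are hypothetical stand-ins rather than the model of Caruana et al.; the final edit mimics a domain expert zeroing out the asthma term.

```python
import math

# Hypothetical shape functions: each maps one feature value to a log-odds
# contribution. Real GA2Ms learn these from data; these hand-written
# lambdas are illustrative stand-ins only.
shape_functions = {
    "age": lambda v: 0.03 * (v - 50),
    "asthma": lambda v: -0.4 if v else 0.0,  # the suspicious learned pattern
}
# Hypothetical pairwise interaction term over (age, asthma).
pair_functions = {
    ("age", "asthma"): lambda a, s: 0.1 if (a > 70 and s) else 0.0,
}
INTERCEPT = -2.0


def contributions(patient, shapes, pairs):
    """Return the log-odds contribution of every term for one patient."""
    terms = {name: f(patient[name]) for name, f in shapes.items()}
    for (i, j), f in pairs.items():
        terms[f"{i} x {j}"] = f(patient[i], patient[j])
    return terms


def risk(patient, shapes, pairs, intercept=INTERCEPT):
    """Sum intercept and term contributions, then apply the logistic link."""
    log_odds = intercept + sum(contributions(patient, shapes, pairs).values())
    return 1.0 / (1.0 + math.exp(-log_odds))


patient = {"age": 60, "asthma": True}
before = risk(patient, shape_functions, pair_functions)

# Expert edit: replace the asthma shape function with a constant zero.
edited = dict(shape_functions, asthma=lambda v: 0.0)
after = risk(patient, edited, pair_functions)
```

Because every term is inspected and edited separately, the effect of the change is easy to trace: for this patient the log odds move from -2.1 to -1.7, so the edited model no longer lowers the predicted risk on account of asthma.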
review articles
Figure 5. The intuition guiding LIME’s method for constructing an approximate local explanation. Source: Ribeiro et al.33
“The black-box model’s complex decision function, f, (unknown to LIME) is represented by the blue/pink background, which cannot be approximated well by a linear model. The bold red cross is the instance being explained. LIME samples instances, gets predictions using f, and weighs them by the proximity to the instance being explained (represented here by size). The dashed line is the learned explanation that is locally (but not globally) faithful.”

are akin to a doctor explaining specific reasons for a patient’s diagnosis rather than communicating all of her medical knowledge. Contrast this approach with the global understanding of the model that one gets with a GA2M model. Mathematically, one can see a local explanation as currying: several variables in the model are fixed to specific values, allowing simplification.

Generating a local explanation is a common practice in AI systems. For example, early rule-based expert systems included explanation systems that augmented a trace of the system’s reasoning (for a particular case) with background knowledge.38 Recommender systems, one of the first deployed uses of machine learning, also induced demand for explanations of their specific recommendations; the most satisfying answers combined justifications based on the user’s previous choices, ratings of similar users, and features of the items being recommended.32

Locally approximate explanations. In many cases, however, even a local explanation can be too complex to understand without approximation. Here, the key challenge is deciding which details to omit when creating the simpler explanatory model. Human preferences, discovered by psychologists and summarized previously, should guide algorithms that construct these simplifications.

Ribeiro et al.’s LIME system33 is a good example of a system for generating a locally approximate explanatory model of an arbitrary learned model, but it sidesteps part of the question of which details to omit. Instead, LIME requires the developer to provide two additional inputs: a set of semantically meaningful features X′ that can be computed from the original features, and an interpretable learning algorithm, such as a linear classifier (or a GA2M), which it uses to generate an explanation in terms of the X′.

The insight behind LIME is shown in Figure 5. Given an instance to explain, shown as the bolded red cross, LIME randomly generates a set of similar instances and uses the black-box classifier, f, to predict their values (shown as the red crosses and blue circles). These predictions are weighted by their similarity to the input instance (akin to locally weighted regression) and used to train a new, simpler intelligible classifier, shown on the figure as the linear decision boundary, using X′, the smaller set of semantic features. The user receives the intelligible classifier as an explanation. While this explanation model28 is likely a poor global representation of f, it is hopefully an accurate local approximation of the boundary in the vicinity of the instance being explained.

Ribeiro et al. tested LIME on several domains. For example, they explained the predictions of a convolutional neural network image classifier by converting the pixel-level features into a smaller set of “super-pixels”; to do so, they ran an off-the-shelf segmentation algorithm that identified regions in the input image and varied the color of some of these regions when generating “similar” images. While LIME provides no formal guarantees about its explanations, studies showed that LIME’s explanations helped users evaluate which of several classifiers best generalizes.

Choice of explanatory vocabulary. Ribeiro et al.’s use of presegmented image regions to explain image classification decisions illustrates the larger problem of determining an explanatory vocabulary. Clearly, it would not make sense to try to identify the exact pixel that led to the decision: pixels are too low level a representation and are not semantically meaningful to users. In fact, the power of deep neural networks comes from the very fact that their hidden layers are trained to recognize latent features in a manner that seems to perform much better than previous efforts to define such features independently. Deep networks are inscrutable exactly because we do not know what those hidden features denote.

To explain the behavior of such models, however, we must find some high-level abstraction over the input pixels that communicates the model’s essence. Ribeiro et al.’s decision to use an off-the-shelf image-segmentation system was pragmatic. The regions it selected are easily visualized and carry some semantic value. However, regions are chosen without any regard to how the classifier makes a decision. To explain a black-box model, where there is no possible access to the classifier’s internal representation, there is likely no better option; any explanation will lack faithfulness.

However, if a user can access the classifier and tailor the explanation system to it, there are ways to choose a more meaningful vocabulary. One interesting method jointly trains a
classifier with a natural language, image-captioning system.13 The classifier uses training data labeled with the objects appearing in the image; the captioning system is labeled with English sentences describing the appearance of the image. By training these systems jointly, the variables in the hidden layers may get aligned to semantically meaningful concepts, even as they are being trained to provide discriminative power. This results in English language descriptions of images that have both high image relevance (from the captioning training data) and high class relevance (from the object recognition training data), as shown in Figure 6.

While this method works well for many examples, some explanations include details that are not actually present in the image; newer approaches, such as phrase-critic methods, may create even better descriptions.14 Another approach might determine if there are hidden layers in the learned classifier that learn concepts corresponding to something meaningful. For example, Zeiler and Fergus observed that certain layers may function as edge or pattern detectors.40 Whenever a user can identify the presence of such layers, then it may be preferable to use them in the explanation. Bau et al. describe an automatic mechanism for matching CNN representations with semantically meaningful concepts using a large, labeled corpus of objects, parts, and texture; furthermore, using this alignment, their method quantitatively scores CNN interpretability, potentially suggesting a way to optimize for intelligible models.

However, many obstacles remain. As one example, it is not clear there are satisfying ways to describe important, discriminative features, which are of-

Figure 6. A visual explanation taken from Hendricks et al.13
“Visual explanations are both image relevant and class relevant. In contrast, image descriptions are image relevant, but not necessarily class relevant, and class definitions are class relevant but not necessarily image relevant.”
(Figure labels: Image; Description; Laysan Albatross; Visual; Image Relevance.)

Facilitating user control with explanatory models. Generating an explanation by mapping an inscrutable model into a simpler, explanatory model is only half of the battle. In addition to answering counterfactuals about the original model, we would ideally be able to map any control actions the user takes in the explanatory model back as adjustments to the original, inscrutable model. For example, we illustrated how a user could directly edit a GA2M’s shape curve (Figure 4b) to change the model’s response to asthma. Is there a way to interpret such an action, made to an intelligible explanatory model, as a modification to the original, inscrutable model? It seems unlikely that we will discover a general method to do this for arbitrary source models, since the abstraction mapping is not invertible in general. However, there are likely methods for mapping backward to specific classes of source models or for specific types of feature-transform mappings. This is an important area for future study.

Toward Interactive Explanation

The optimal choice of explanation depends on the audience. Just as a human teacher would explain physics differently to students who know or do not yet know calculus, the technical sophistication and background knowledge of the recipient affect the suitability of a machine-generated explanation. Furthermore, the concerns of a house seeker whose mortgage application was denied due to a FICO score differ from those of a developer or data scientist debugging the system. Therefore, an ideal explainer should model the user’s background over the course of many interactions.

The HCI community has long studied mental models,31 and many intelligent tutoring systems (ITSs) build explicit models of students’ knowledge and misconceptions.2 However, the frameworks for these models are typically hand-engineered for each subject domain, so it may be difficult to adapt ITS approaches to a system that aims to explain an arbitrary black-box learner.

Even with an accurate user model, it is likely that an explanation will not answer all of a user’s concerns, because the human may have follow-up questions. We conclude that an explanation system should be interactive, supporting such questions from and actions by the user. This matches results from psychology literature, summarized earlier, and highlights Grice’s maxims, especially those pertaining to quantity and relation. It also builds on Lim and Dey’s work in ubiquitous computing, which investigated the kinds of questions users wished to ask about complex, context-aware applications.24 We envision an interactive explanation system that supports many different follow-up and
Figure 7. An example of an interactive explanatory dialog for gaining insight into a DOG/FISH image classifier. For illustration, the questions and answers are shown in English language text, but our use of a ‘dialog’ is for illustration only. An interactive GUI, for example, building on the ideas of Krause et al.,20 would likely be a better realization.
(Dialog excerpts from the figure, where C is the ML classifier: “C: I predict FISH.” “Green regions argue for FISH, while RED pushes toward DOG. There’s more green.” “C: I still predict FISH, because of these green superpixels:”)
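The explanation step in the Figure 7 dialog relies on a LIME-style local surrogate: sample instances near the one being explained, query the black box, weight each sample by proximity, and fit a simple linear model. The sketch below illustrates that idea under simplifying assumptions. The `black_box` function, the Gaussian sampling scale, and the kernel width are hypothetical choices, the raw features stand in for the interpretable vocabulary X′, and none of this is the API of the LIME library.

```python
import math
import random


def black_box(x):
    """Stand-in for an inscrutable classifier f; returns a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(x[0] ** 2 - x[1])))


def solve(a, b):
    """Solve the small linear system a w = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def local_linear_explanation(f, instance, n_samples=500, scale=0.3, width=0.5, seed=0):
    """Fit a proximity-weighted linear surrogate to f around `instance`.

    Returns [intercept, coef_0, coef_1, ...] of the surrogate.
    """
    rng = random.Random(seed)
    d = len(instance) + 1  # +1 for the intercept column
    xtwx = [[0.0] * d for _ in range(d)]
    xtwy = [0.0] * d
    for _ in range(n_samples):
        z = [v + rng.gauss(0.0, scale) for v in instance]  # a "similar" instance
        dist2 = sum((zi - vi) ** 2 for zi, vi in zip(z, instance))
        weight = math.exp(-dist2 / width ** 2)  # closer samples count more
        row = [1.0] + z
        y = f(z)  # query the black box
        for i in range(d):
            xtwy[i] += weight * row[i] * y
            for j in range(d):
                xtwx[i][j] += weight * row[i] * row[j]
    # Weighted least squares via the normal equations.
    return solve(xtwx, xtwy)


weights = local_linear_explanation(black_box, [2.0, 1.0])
```

The signs of the fitted coefficients serve as the explanation: near the instance (2.0, 1.0), the surrogate reports that the first feature pushes the score up and the second pushes it down. Such a surrogate is only expected to be faithful near the chosen instance, not globally.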
drill-down actions after presenting a user with an initial explanation:

• Redirecting the answer by changing the foil. “Sure, but why didn’t you predict class C?”
• Asking for more detail (that is, a more complex explanatory model), perhaps while restricting the explanation to a subregion of feature space. “I’m only concerned about women over age 50 ...”
• Asking for a decision’s rationale. “What made you believe this?” To which the system might respond by displaying the labeled training examples that were most influential in reaching that decision, for example, ones identified by influence functions19 or nearest neighbor methods.
• Querying the model’s sensitivity by asking what minimal perturbation to certain features would lead to a different output.
• Changing the vocabulary by adding (or removing) a feature in the explanatory model, either from a predefined set, by using methods from machine teaching, or with concept activation vectors.17
• Perturbing the input example to see the effect on both prediction and explanation. In addition to aiding understanding of the model (directly testing a counterfactual), this action supports an affected user who wants to contest the initial prediction: “But officer, one of those prior DUIs was overturned ...?”
• Adjusting the model. Based on new understanding, the user may wish to correct the model. Here, we expect to build on tools for interactive machine learning1 and explanatory debugging,20,21 which have explored interactions for adding new training examples, correcting erroneous labels in existing data, specifying new features, and modifying shape functions. As mentioned in the previous section, it may be challenging to map user adjustments that are made in reference to an explanatory model back into the original, inscrutable model.

To make these ideas concrete, Figure 7 presents a possible dialog as a user tries to understand the robustness of a deep neural dog/fish classifier built atop Inception v3.39 As the figure shows: (1) The computer correctly predicts the image depicts a fish. (2) The user requests an explanation, which is provided using LIME.33 (3) The user, concerned the classifier is paying more attention to the background than to the fish itself, asks to see the training data that influenced the classifier; the nearest neighbors are computed using influence functions.19 While there are anemones in those images, it also seems that the system is recognizing a clownfish. (4) To gain confidence, the user edits the input image to remove the background, resubmits it to the classifier, and checks the explanation.

Explaining Combinatorial Search

Most of the preceding discussion has focused on intelligible machine learning, which is just one type of artificial intelligence. However, the same issues also confront systems based on deep-lookahead search. While many planning algorithms have strong theoretical properties, such as soundness, they search over action models that include their own assumptions. Furthermore, goal specifications are likewise incomplete.29 If these unspoken assumptions are incorrect, then a formally correct plan may still be disastrous.

Consider a planning algorithm that has generated a sequence of actions for a remote, mobile robot. If the plan is short with a moderate number of actions, then the problem may be inherently intelligible, and a human could easily spot a problem. However, larger search spaces could be cognitively overwhelming. In these cases, local explanations offer a simplification technique that is helpful, just as it was when explaining machine learning. The vocabulary issue is likewise crucial: how does one succinctly and abstractly summarize a complete search subtree? Depending on the choice of explanatory foil, different answers are appropriate.8 Sreedharan et al. describe an algorithm for generating the minimal explanation that patches a user’s partial understanding of a domain.37 Work on mixed-initiative planning7 has demonstrated the importance of supporting interactive dialog with a planning system. Since many AI systems, for example, AlphaGo,35 combine deep search and machine learning, additional challenges will result from the need to ex-
plain interactions between combinatorics and learned models.

Final Thoughts

In order to trust deployed AI systems, we must not only improve their robustness,5 but also develop ways to make their reasoning intelligible. Intelligibility will help us spot AI that makes mistakes due to distributional drift or incomplete representations of goals and features. Intelligibility will also facilitate control by humans in increasingly common collaborative human/AI teams. Furthermore, intelligibility will help humans learn from AI. Finally, there are legal reasons to want intelligible AI, including the European GDPR and a growing need to assign liability when AI errs.

Depending on the complexity of the models involved, two approaches to enhancing understanding may be appropriate: using an inherently interpretable model, or adopting an inscrutably complex model and generating post hoc explanations by mapping it to a simpler, explanatory model through a combination of currying and local approximation. When learning a model over a medium number of human-interpretable features, one may confidently balance performance and intelligibility with approaches like GA2Ms. However, for problems with thousands or millions of features, performance requirements likely force the adoption of inscrutable methods, such as deep neural networks or boosted decision trees. In these situations, post hoc explanations may be the only way to facilitate human understanding.

Research on explanation algorithms is developing rapidly, with work on both local (instance-specific) explanations and global approximations to the learned model. A key challenge for all these approaches is the construction of an explanation vocabulary, essentially a set of features used in the approximate explanation model. Different explanatory models may be appropriate for different choices of explanatory foil, an aspect deserving more attention from systems builders. While many intelligible models can be directly edited by a user, more research is needed to determine how best to map such actions back to modify an underlying inscrutable model. Results from psychology show that explanation is a social process, best thought of as a conversation. As a result, we advocate increased work on interactive explanation systems that support a wide range of follow-up actions. To spur rapid progress in this important field, we hope to see collaboration between researchers in multiple disciplines.

Acknowledgments. We thank E. Adar, S. Ameshi, R. Calo, R. Caruana, M. Chickering, O. Etzioni, J. Heer, E. Horvitz, T. Hwang, R. Kambhamapti, E. Kamar, S. Kaplan, B. Kim, P. Simard, Mausam, C. Meek, M. Michelson, S. Minton, B. Nushi, G. Ramos, M. Ribeiro, M. Richardson, P. Simard, J. Suh, J. Teevan, T. Wu, and the anonymous reviewers for helpful conversations and comments. This work was supported in part by the Future of Life Institute grant 2015-144577 (5388) with additional support from NSF grant IIS-1420667, ONR grant N00014-15-1-2774, and the WRF/Cable Professorship.

References
1. Amershi, S., Cakmak, M., Knox, W. and Kulesza, T. Power to the people: The role of humans in interactive machine learning. AI Magazine 35, 4 (2014), 105–120.
2. Anderson, J.R., Boyle, F. and Reiser, B. Intelligent tutoring systems. Science 228, 4698 (1985), 456–462.
3. Besold, T. et al. Neural-Symbolic Learning and Reasoning: A Survey and Interpretation. CoRR abs/1711.03902 (2017). arXiv:1711.03902
4. Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M. and Elhadad, N. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In KDD, 2015.
5. Dietterich, T. Steps towards robust artificial intelligence. AI Magazine 38, 3 (2017).
6. Doshi-Velez, F. and Kim, B. Towards a rigorous science of interpretable machine learning. ArXiv (2017), arXiv:1702.08608
7. Ferguson, G. and Allen, J.F. TRIPS: An integrated intelligent problem-solving assistant. In AAAI/IAAI, 1998.
8. Fox, M., Long, D. and Magazzeni, D. Explainable Planning. In IJCAI XAI Workshop, 2017; http://arxiv.org/abs/1709.10256
9. Goodfellow, I.J., Shlens, J. and Szegedy, C. Explaining and Harnessing Adversarial Examples. ArXiv (2014), arXiv:1412.6572
10. Grice, P. Logic and Conversation, 1975, 41–58.
11. Halpern, J. and Pearl, J. Causes and explanations: A structural-model approach. Part I: Causes. The British J. Philosophy of Science 56, 4 (2005), 843–887.
12. Hardt, M., Price, E. and Srebro, N. Equality of opportunity in supervised learning. In NIPS, 2016.
13. Hendricks, L., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B. and Darrell, T. Generating visual explanations. In ECCV, 2016.
14. Hendricks, L.A., Hu, R., Darrell, T. and Akata, Z. Grounding visual explanations. ArXiv (2017), arXiv:1711.06465
15. Hilton, D. Conversational processes and causal explanation. Psychological Bulletin 107, 1 (1990), 65.
16. Kahneman, D. Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, 2011; http://a.co/hGYmXGJ
17. Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F. and Sayres, R. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors. ArXiv e-prints (Nov. 2017); arXiv:stat.ML/1711.11279
18. Koehler, D.J. Explanation, imagination, and confidence in judgment. Psychological Bulletin 110, 3 (1991), 499.
19. Koh, P. and Liang, P. Understanding black-box predictions via influence functions. In ICML, 2017.
20. Krause, J., Dasgupta, A., Swartz, J., Aphinyanaphongs, Y. and Bertini, E. A workflow for visual diagnostics of binary classifiers using instance-level explanations. In IEEE VAST, 2017.
21. Kulesza, T., Burnett, M., Wong, W. and Stumpf, S. Principles of explanatory debugging to personalize interactive machine learning. In IUI, 2015.
22. Lakkaraju, H., Kamar, E., Caruana, R. and Leskovec, J. Interpretable & explorable approximations of black box models. KDD-FATML, 2017.
23. Lewis, D. Causal explanation. Philosophical Papers 2 (1986), 214–240.
24. Lim, B.Y. and Dey, A.K. Assessing demand for intelligibility in context-aware applications. In Proceedings of the 11th International Conference on Ubiquitous Computing (2009). ACM, 195–204.
25. Lipton, Z. The Mythos of Model Interpretability. In Proceedings of ICML Workshop on Human Interpretability in ML, 2016.
26. Lombrozo, T. Simplicity and probability in causal explanation. Cognitive Psychology 55, 3 (2007), 232–257.
27. Lou, Y., Caruana, R. and Gehrke, J. Intelligible models for classification and regression. In KDD, 2012.
28. Lundberg, S. and Lee, S. A unified approach to interpreting model predictions. NIPS, 2017.
29. McCarthy, J. and Hayes, P. Some philosophical problems from the standpoint of artificial intelligence. Machine Intelligence (1969), 463–502.
30. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267 (Feb. 2018), 1–38.
31. Norman, D.A. Some observations on mental models. Mental Models, Psychology Press, 2014, 15–22.
32. Papadimitriou, A., Symeonidis, P. and Manolopoulos, Y. A generalized taxonomy of explanations styles for traditional and social recommender systems. Data Mining and Knowledge Discovery 24, 3 (2012), 555–583.
33. Ribeiro, M., Singh, S. and Guestrin, C. Why should I trust you?: Explaining the predictions of any classifier. In KDD, 2016.
34. Ribeiro, M., Singh, S. and Guestrin, C. Anchors: High-precision model-agnostic explanations. In AAAI, 2018.
35. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484–489.
36. Sloman, S. Explanatory coherence and the induction of properties. Thinking & Reasoning 3, 2 (1997), 81–110.
37. Sreedharan, S., Srivastava, S. and Kambhampati, S. Hierarchical expertise-level modeling for user specific robot-behavior explanations. ArXiv e-prints (Feb. 2018), arXiv:1802.06895
38. Swartout, W. XPLAIN: A system for creating and explaining expert consulting programs. Artificial Intelligence 21, 3 (1983), 285–325.
39. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. and Wojna, Z. Rethinking the inception architecture for computer vision. In CVPR, 2016.
40. Zeiler, M. and Fergus, R. Visualizing and understanding convolutional networks. In ECCV, 2014.

Daniel S. Weld (weld@cs.washington.edu) is Thomas J. Cable/WRF Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, Seattle, WA, USA.
Gagan Bansal (bansalg@cs.washington.edu) is a graduate student in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, Seattle, WA, USA.

Copyright held by authors/owners. Publishing rights licensed to ACM.

Watch the authors discuss this work in the exclusive Communications video. https://cacm.acm.org/videos/the-challenge-of-crafting-intelligible-intelligence
research highlights
DOI:10.1145/3323921
Technical Perspective
To view the accompanying paper, visit doi.acm.org/10.1145/3323923
“You may fire when you are ready, Gridley,” is the famous command from Commodore Dewey in the Battle of Manila Bay, 1898. He may not have realized it, but he was articulating the basic principle of dataflow computing, where an instruction can be executed as soon as its inputs are available. Dataflow has long fascinated computer architects as perhaps a more “natural” way for computation circuits to best exploit parallelism for performance.

A visiting alien may be forgiven for experiencing whiplash when shown how we treat parallelism in programs. Mathematical algorithms have abundant parallelism; the only limit is data dependency (an operator can be evaluated when its inputs are available). We code it in a mainstream programming language (C/C++, Python, among others), which has completely sequential semantics (zero parallelism) to make sense of reads and writes to memory. As illustrated in Figure 1, compilers sweat mightily to rediscover some of the lost parallelism in their internal CDFGs (control and data flow graphs), and then produce machine code that, again, is completely sequential. When we execute this on a modern von Neumann CPU, wide-issue, out-of-order circuits once again sweat mightily (burning power) to rediscover parallelism.

Figure 1. Parallelism during coding, compilation, and execution.
(Axes: parallelism against flow stages. Labels: mathematical algorithm; source code; CDFG in compiler; machine code; out-of-order CPU; von Neumann.)

The 1970s through early 1990s saw several attempts to avoid these “unnecessary” sequentializations (green circles in Figure 2). Dataflow languages (mostly purely functional) and machine code (dataflow graphs) retained parallelism from the math. Instead of a program counter, each instruction directly named its successor(s) receiving its outputs. Dataflow CPUs directly executed this graph machine code. Nowadays this computation model goes by the acronym EDGE, for explicit dataflow graph execution.

Figure 2. Alternative strategies for exploiting parallelism.
(Axes: parallelism against flow stages. Labels: mathematical algorithm; dataflow language (Val, Id, Sisal, pH, …); dataflow graph machine code; dataflow CPU (EDGE); source code; CDFG in compiler; machine code; hybrid CPU (EDGE, von Neumann).)

So, why aren’t we all using EDGE machines today? A short answer is that they never quite mastered spatial or temporal locality and were subpar on inherently sequential code regions. In contrast, modern von Neumann CPUs excel at this, managing efficient flow of data between circuits that are fast-and-expensive (registers, wires), medium (caches), and slow-and-cheap (DRAMs).

The following paper by Tony Nowatzki, Vinay Gangadhar, and Karthikeyan Sankaralingam describes an innovative approach to exploit both models. From the CDFG, their compiler generates both traditional sequential machine code and a data graph, each being executed on appropriate circuits (blue squares in Figure 2), with efficient hand-off mechanisms. The authors describe extensive studies to validate the viability of this approach for existing codes.

EDGE computing is undergoing a renaissance, with many researchers pursuing related ideas. There are indications that big industry players are also contemplating this direction.a

a Morgan, T.P. Intel’s Exascale dataflow engine drops x86 and von Neumann. The NEXT Platform, Aug 30, 2018.

Rishiyur S. Nikhil is Chief Technical Officer at Bluespec, Inc., a semiconductor tool design company in Framingham, MA, USA.

Copyright held by author.
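The firing rule at the heart of this perspective, that an operation may execute as soon as its inputs are available, can be illustrated with a small Python sketch. The graph below, for the expression (a + b) * (c - d), and all node names are hypothetical; the loop only simulates which nodes could fire together and makes no claim about how any real EDGE machine is organized.

```python
import operator

# Each node maps to (operation, names of the values it consumes).
# Hypothetical graph for (a + b) * (c - d).
graph = {
    "sum": (operator.add, ("a", "b")),
    "diff": (operator.sub, ("c", "d")),
    "prod": (operator.mul, ("sum", "diff")),
}


def run_dataflow(graph, inputs):
    """Fire every node whose inputs are available, one parallel step at a time.

    Returns the computed values and the list of nodes fired in each step.
    """
    values = dict(inputs)
    pending = dict(graph)
    steps = []
    while pending:
        ready = [n for n, (_, deps) in pending.items()
                 if all(d in values for d in deps)]
        if not ready:
            raise ValueError("graph has unsatisfiable dependencies")
        # All ready nodes could fire simultaneously on parallel hardware.
        results = {n: pending[n][0](*(values[d] for d in pending[n][1]))
                   for n in ready}
        values.update(results)
        for n in ready:
            del pending[n]
        steps.append(sorted(ready))
    return values, steps


values, steps = run_dataflow(graph, {"a": 1, "b": 2, "c": 7, "d": 3})
```

No program counter appears anywhere in the sketch: the order of execution falls out of data availability alone, so the addition and subtraction fire in the same step and the multiplication fires in the next.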
Figure 2. Taking advantage of dynamic program behavior.
(a) Logical Arch. (labels: Cache Hierarchy, OOO Core, Explicit-Dataflow, Vector, live-vals). (b) Architecture Preference Over Time (App. 1, App. 2, App. 3; time scale of thousands to millions of instructions).

Figure 3. Von Neumann and dataflow execution models.
(a) Control Flow Graph (basic blocks). (b) Original program order: a b if c d e h i j. (c) Ideal Schedule (annotation: control dependence removed through speculation). (d) Abstract 2-Issue OOO Sched.: Von Neumann enables efficient control spec.; OOO gains advantage if instructions are added to control-critical path. (e) Abstract Dataflow Sched.: Dataflow enables efficient instruction parallelism; dataflow gains advantage if more independent instructions are added.

trends mean that on-chip power is more limited than area; this creates “dark-silicon,” portions of the chip that cannot be kept active due to power constraints. The two major implications are that energy efficiency is the key to improving scalable performance, and that it becomes rational to add specialized hardware which is only in use when profitable.

With such a hardware organization, many open questions arise: Are the benefits of fine-grained interleaving of execution models significant enough? How might one build a practical and small footprint dataflow engine capable of serving as an offload engine? Which types of GPP cores can get substantial benefits? Why are certain program region-types suitable for explicit-dataflow execution?

To answer these questions we make the following contributions. Most importantly, we identify (and quantify) the potential of switching between OOO and explicit-dataflow at a fine grain. Next, we develop a specialization engine for explicit-dataflow (SEED) by combining known dataflow-architecture techniques, and specializing the design for program characteristics where explicit-dataflow excels as well as for simple and common program structures (loops/nested loops). We evaluate the benefits through a design-space exploration, integrating SEED into little (in-order), medium (OOO2), and big (OOO4) cores. Our results demonstrate large energy benefits of more than 1.5×, and speedups of 1.67×, 1.33×, and 1.14× across little, medium, and big cores. Finally, our analysis illuminates the relationship between workload properties and dataflow profitability: code with high memory parallelism, instruction parallelism, and control noncriticality is highly profitable for dataflow execution. These are common properties for many emerging workloads in machine learning and data processing.

The performance implications can be seen in an example in Figure 3(a), which has a single control decision labeled as if. In (b), we show the program instruction order for one iteration of this code, assuming the left branch was taken. Figure 3(c) shows the ideal schedule of these instructions on an ideal machine (one instruction per cycle). The key to the ideal execution is both the reordering of dependent instructions (c, d) before the control decision is resolved, as well as being able to execute many instructions in parallel.

A Von Neumann OOO machine has the advantage of speculative execution, but the disadvantage is the complexity of implementing hardware for issuing multiple instructions per cycle (issue width) when the dependences are determined dynamically. Therefore, (d) shows how a dual-issue OOO takes five cycles because there was not enough issue bandwidth for both d and h before the third cycle.
A dataflow processor can easily be designed for high issue
2. UNDERSTANDING VON NEUMANN/DATAFLOW width due to dependences being explicitly encoded into the
SYNERGY program representation. However, we assume here that the
Understanding the trade-offs between a Von Neumann dataflow processor does not perform speculation, because
machine, which reorders instructions implicitly, and a data- of the difficulty of recovering when a precise order is not
flow machine, which executes instructions in dependence maintained. Therefore, in Figure 3(e), the dataflow proces-
order, can be subtle. Yet, the trade-offs have profound impli- sor’s schedule, c and d ; must execute after the if .
cations. We attempt to distill the intuition and quantitative Although the example suggests the benefits of control
potential of a heterogeneous core as follows. specialization and wide issue widths are similar, in prac-
tice, the differences can be stark, which we can demonstrate
2.1. Intuition for execution model affinity with slight modifications to the example. If we add several
The intuitive trade-off between the two execution models is instructions to the critical path of the control decision
that explicit-dataflow is more easily specializable for high (between b and if ), the OOO core can hide these through
issue width and instruction window size (due to lack of need control speculation. If instead we add more parallel instruc-
to discover dependences dynamically), whereas an implicit- tions, the explicit-dataflow processor can execute these in
dataflow architecture is more easily specializable for specu- parallel, whereas these may be serialized in the OOO Von
lation (due to its maintenance of precise state of all dynamic Neumann machine. Explicit-dataflow can also be beneficial
instructions in total program order). if the if is unpredictable, and the OOO is anyway serialized.
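The trade-off described in Section 2.1 can be illustrated with a toy list scheduler. This is a minimal sketch under stated assumptions, not the paper's evaluation methodology: every instruction has unit latency, issue is oldest-first, the OOO core predicts the branch perfectly, and the dependence graph below is hypothetical (it borrows instruction names from Figure 3 but does not reproduce that figure's graph). The names `schedule_length`, `compare`, `BASE`, and `CTRL` are made up for this example.

```python
def schedule_length(order, data_deps, ctrl_deps, width, speculate):
    """Cycles needed to issue `order` with unit latency and oldest-first issue."""
    deps = {i: set(data_deps.get(i, ())) for i in order}
    if not speculate:
        # Without speculation, control dependences are enforced like data ones.
        for instr, srcs in ctrl_deps.items():
            deps[instr] |= set(srcs)
    done, pending, cycles = set(), list(order), 0
    while pending:
        # Ready = all producers finished in an earlier cycle; cap at issue width.
        ready = [i for i in pending if deps[i] <= done][:width]
        if not ready:
            raise ValueError("cyclic dependences")
        done.update(ready)
        pending = [i for i in pending if i not in done]
        cycles += 1
    return cycles

def compare(order, data_deps, ctrl_deps):
    """Return (dual-issue speculative OOO cycles, wide non-speculative dataflow cycles)."""
    ooo = schedule_length(order, data_deps, ctrl_deps, width=2, speculate=True)
    dataflow = schedule_length(order, data_deps, ctrl_deps, width=8, speculate=False)
    return ooo, dataflow

# Hypothetical graph: c and d are control-dependent on the branch "if".
CTRL = {"c": ["if"], "d": ["if"]}
BASE = {"b": ["a"], "if": ["b"], "d": ["c"], "i": ["h"], "j": ["i"]}

base = compare(["a", "b", "if", "c", "d", "h", "i", "j"], BASE, CTRL)

# Variant 1: two extra instructions on the path that resolves the branch.
longer_ctrl = compare(
    ["a", "b", "x1", "x2", "if", "c", "d", "h", "i", "j"],
    {**BASE, "x1": ["b"], "x2": ["x1"], "if": ["x2"]},
    CTRL,
)

# Variant 2: six extra instructions with no dependences at all.
extra = ["p1", "p2", "p3", "p4", "p5", "p6"]
more_parallel = compare(["a", "b", "if", "c", "d", "h", "i", "j"] + extra, BASE, CTRL)

print(base, longer_ctrl, more_parallel)
```

Under these assumptions both models need five cycles for the base graph. Lengthening the control-critical path leaves the speculative dual-issue model at five cycles while the non-speculative model needs seven. Adding independent work reverses the outcome: seven cycles for the dual-issue model, five for the wide one.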
[Figure residue: per-benchmark bar chart by GPP type (IO2, OOO2, OOO4) with per-bar percentages, a “% Explicit-” axis label, and the annotations “2x energy benefit” and “>2x perf.”; caption and values not recoverable. Benchmarks: 403.gcc, cjpeg-2, h263enc, djpeg-2, 429.mcf, 464.h264ref, cjpeg-1, djpeg-1, 456.hmmer, jpg2000enc, mpeg2enc, 181.mcf, jpg2000dec, mpeg2dec, 458.sjeng, 473.astar, 164.gzip, 175.vpr, 401.bzip2, 256.bzip2, gsmencode, gsmdecode, 197.parser, GMEAN.]

... dataflow research. For example, Monsoon19 im... Explicit Token Store, a concept we borrow for SEED’s ... coprocessor,12 and the concept of efficient compound FUs from BERET.7
3.1. Von Neumann core integration
Adaptive execution: To adaptively apply explicit-dataflow specialization, we use a technique similar to bigLITTLE, except that we restrict the entry points of specializable regions to fully-inlined loops or nested loops. This simplifies integration with a different ISA. Targeting longer nested-loop regions reduces the cost of configuration and GPP core synchronization.

GPP integration: SEED uses the same cache hierarchy as the GPP, which facilitates fast switching (no data-copying through memory) and preserves cache coherence. SEED adds architectural state, which must be maintained at context switches. Lastly, functional units (FUs) could be shared with the GPP to save area (by adding bypass paths); this work considers stand-alone FUs.

Figure 6. High-level SEED integration and organization (IMU: instruction management unit; CFU: compound functional unit; ODU: output distribution unit). [Graphic not reproduced; labels include L1 Cache, ICache, DCache, and SEED Unit 1 through SEED Unit 8.]

3.2. Dataflow execution model
SEED’s execution model closely resembles prior dataflow architectures, but is restricted for loops/nested-loops, and adds the use of compound instructions.

We use a running example to aid explanation: a simple linked-list traversal where a conditional computation is performed at each node. Figure 7(a) shows the original program, (b) the Von Neumann control flow graph (CFG) representation, and (c) SEED’s explicit-dataflow representation.

Data-dependence: Similar to other dataflow representations, SEED programs follow the dataflow firing rule: instructions execute when their operands are ready. To initiate

Figure 7. (a) Example C loop; (b) control flow graph (CFG); (c) SEED program representation.
[Figure 7 graphic not reproduced. Legend: Switch (S) forwards a control or a data value to one of the two destinations. Decision Unit generates a decision value based on the input. Memory Units (LD, ST) interface with the memory sub-system to load or store values. ALUs (+) are functional units which perform a primitive computation (add, multiply, shift, etc.). Line colors: black, data line; purple, control line; blue, data from the switch in the TRUE path; red, data from the switch in the FALSE path; grey, token values passed for the next iteration. Subgraphs 1 and 2 are mapped to CFU 1; subgraphs 3 and 4 are mapped to CFU 2. Panel (a) declares struct A { int v1, v2; A* next; } and, for each node a, stores -n_val to a→v2 if n_val < 0 and n_val + 1 otherwise; panel (b) loads a_next = a→next and continues to the next loop iteration if a_next != 0.]
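The dataflow firing rule and the switch-based control of Figure 7(c) can be sketched in software. The following is a minimal illustration, assuming a token queue and per-node operand slots; it is not SEED's microarchitecture. The `Node`, `run`, and `build_body` names are hypothetical, a plain Python list of `(v1, v2)` pairs stands in for the linked list, and because the computation of `n_val` is elided in the figure, `n_val = v1 + v2` is an assumption made only so the example runs.

```python
from collections import deque

class Node:
    """One dataflow instruction: fires once every input port holds a token."""
    def __init__(self, name, n_inputs, fn):
        self.name = name
        self.n_inputs = n_inputs
        self.fn = fn          # maps operands to {output port: value}
        self.slots = {}       # input port -> waiting operand
        self.targets = {}     # output port -> [(node, input port)]

    def connect(self, out_port, node, in_port):
        self.targets.setdefault(out_port, []).append((node, in_port))

def run(tokens):
    """Deliver tokens until none remain; return node names in firing order."""
    queue = deque(tokens)     # entries: (node, input port, value)
    fired = []
    while queue:
        node, port, value = queue.popleft()
        node.slots[port] = value
        if len(node.slots) < node.n_inputs:
            continue          # firing rule: wait until all operands are ready
        args = [node.slots[p] for p in range(node.n_inputs)]
        node.slots = {}
        fired.append(node.name)
        for out_port, result in node.fn(*args).items():
            for target, in_port in node.targets.get(out_port, []):
                queue.append((target, in_port, result))
    return fired

def build_body(stored):
    """Loop body: n_val = v1 + v2 (assumed); store -n_val or n_val + 1."""
    def store(n):
        stored.append(n)      # stands in for "ST n_val, a->v2"
        return {}
    add = Node("add", 2, lambda x, y: {"out": x + y})
    lt0 = Node("lt0", 1, lambda n: {"out": n < 0})      # decision unit
    switch = Node("switch", 2, lambda taken, n: {"T" if taken else "F": n})
    neg = Node("neg", 1, lambda n: {"out": -n})
    inc = Node("inc", 1, lambda n: {"out": n + 1})
    st = Node("store", 1, store)
    add.connect("out", lt0, 0)
    add.connect("out", switch, 1)
    lt0.connect("out", switch, 0)
    switch.connect("T", neg, 0)   # TRUE path
    switch.connect("F", inc, 0)   # FALSE path
    neg.connect("out", st, 0)
    inc.connect("out", st, 0)
    return add

stored = []
add = build_body(stored)
firing_orders = [run([(add, 0, v1), (add, 1, v2)]) for v1, v2 in [(3, -5), (2, 4)]]
print(stored, firing_orders)
```

No node names its successor in program order: `switch` fires only after both the decision and the data token arrive, and exactly one of `neg` or `inc` receives a token per iteration, which is how control is expressed without a program counter in this sketch.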
Figure 8. Cumulative % contribution for decreasing dynamic region lengths. Static region size ≤ 1024 insts. [Plot not reproduced. Y-axis: % dynamic insts covered; x-axis: minimum allowed region duration (in dynamic insts), from 32M down to 2; series: Nested-Loops, Inner-Loops.]

Figure 9. Compound instruction size histogram. [Plot not reproduced. Y-axis: % of dynamic compound instructions.]

Figure 10. Per-region SEED speedups. Highest-contributing region shown in red. [Plot not reproduced; one group of bars per benchmark.]

of the OOO processor. Across these regions, indirect memory access is common, which precludes SIMD vectorization.

Energy benefit-only regions: These regions have similar performance to the OOO4, but are more energy efficient by 2×–3×. Here, ILP tends to be lower, but control is mostly off the critical path, allowing dataflow to compete (e.g., djpeg-1
[Figure residue: plot against relative performance (x-axis) for core types Little (IO2), Medium (OOO2), and Big (OOO4), with series +Cons-Cores and +SEED and the annotation “0.85× perf., 2.3× en. eff.”; caption not recoverable.]

and microarchitectural approaches. For the little, medium, and big cores, SEED provides 1.65×, 1.33×, and 1.14× speedup, and 1.64×, 1.7×, and 1.53× energy efficiency, respectively. The energy benefits come primarily from the prevalence of regions where dataflow execution can match the host core’s performance; this occurs 71%, 64%, and 42% of the time, for the little, medium, and big Von Neumann cores, respectively.

Understanding disruptive trade-offs: Perhaps more interesting are the disruptive changes that explicit-dataflow specialization introduces for computer architects. First, the OOO2+SEED is actually reasonably close in performance to an OOO4 processor on average, within 15%, while reducing energy 2.3×. Additionally, our estimates suggest that an OOO2+SEED occupies less area than an OOO4 GPP core. Therefore, a hybrid dataflow system introduces an interesting path toward a high-performance, low-energy microprocessor: start with an easier-to-engineer modest OOO core, and add a simple, nongeneral-purpose dataflow engine.

An equally interesting trade-off is to add a hybrid dataflow unit to a larger OOO core: SEED+OOO4 has much higher energy efficiency (1.54×) with additional performance improvements of 1.14×. This is a significant leap for energy efficiency, especially considering the difficulty of improving the efficiency for complex, irregular workloads such as SpecINT.

Overall, all cores can achieve significant energy benefits, little and medium cores can achieve significant speedup, and big cores receive modest performance improvement.

7. DISCUSSION
Dataflow specialization is a broadly applicable principle for both general-purpose processors and accelerators. We outline our view on the potentially disruptive implications in these areas as well as potential future directions.

7.1. General purpose cores
In this work, we showed how a dataflow processor can more efficiently take over and execute certain phases of application workloads, based on their properties. This can be viewed visually, as shown in Figure 12, where we show architecture affinity for programs along dimensions of control and memory regularity. Figure 12(a) shows how prior programmable specialization techniques only focus

... issue width and instruction window size limits the achievable ILP (region 3), explicit-dataflow processors can exploit this through distributed dataflow, as well as more efficient execution under control unpredictability (region 4). Beyond these region types, dataflow specialization can be applied to create engines that target other behaviors, such as repeatable control (region 5), or to further improve highly regular regions by combining dataflow with vector-communication (region 1).

Future directions: The disruptive potential of exploiting common program phase behavior using a heterogeneous dataflow execution model can have significant implications leading to several important directions:

• Reduced importance of aggressive out-of-order: Dataflow engines which can exploit high ILP phases can reduce the need for aggressive and power-inefficient out-of-order cores. As a corollary, the design of modest-complexity loosely coupled cores should in principle be less design effort than a complex OOO core. This could lower the cost-of-entry into the general-purpose core market, increasing competition and spurring innovation.
• Radical departure from status quo: The simple and modular integration of engines targeting different behaviors, combined with microarchitecture-level dynamic compilation for dataflow ISAs22 can enable such designs to be practical. This opens the potential of exploring designs with radically different microarchitectures and software interfaces, ultimately opening a larger and more exciting design space.
• An alternative secure processor: An open question is how to build future secure processors that are immune to attacks such as Meltdown and Spectre.9 One approach is to simply avoid speculation; this work shows that an in-order core plus SEED may only lose on average around 20% performance with respect to an OOO core alone, at much lower energy.

7.2. Accelerators
In contrast to general-purpose processors, accelerators are purpose-built chips integrated at a coarse grain with computing systems, for workloads important enough to the market to justify their design and manufacturing cost. A persistent challenge facing accelerator design is that in order to achieve desired performance and energy efficiency, accelerators often sacrifice generality and programmability, using application or domain-specific software interfaces. Their architecture and microarchitecture are narrowly tailored to the particular domain and problem being solved.

The principle of heterogeneous Von Neumann/dataflow architectures can help to create a highly efficient accelerator without having to give up on domain-generality. Inspired by the insights here, we demonstrated that domain-specific accelerators rely on fundamentally common specialization principles: specialization of computation, communication,
Figure 12. Program phase affinity by application characteristics. Memory ranges from regular and data-independent, to irregular and
data-dependent but with parallelism, to latency bound with no parallelism. Control can range from noncritical or not present, critical but
repeating, not repeating but predictable, to unpredictable and data-dependent.
[Figure 12 graphic not reproduced. Both panels plot memory regularity (regular access, irregular access, latency bound) against control behavior; labeled regions include Explicit-Dataflow (energy+perf), Streaming (perf+energy benefit), and Out-of-Order.]
concurrency, data-reuse, and coordination.14 A dataflow model of computation is especially suitable for exploiting the first three principles for massive parallel computation, whereas a Von Neumann model excels at the coordination of control decisions and ordering. We further addressed programmable specialization by proposing a Von Neumann/dataflow architecture called stream-dataflow,13 which specifies memory access and communication as streams, enabling effective specialization of data-reuse in caches and scratchpad memories.

Future directions: The promise of dataflow specialization in the accelerator context is to enable freedom from application-specific hardware development, leading to two important future directions.

• An accelerator architecture: The high energy and area-efficiency of a Von Neumann/dataflow accelerator, coupled with a well-defined hardware/software interface, enables the almost paradoxical concept of an accelerator architecture. We envision that a dataflow-specialized ISA such as stream-dataflow, along with essential hardware specialization principles, can serve as the basis for future innovation for specialization architectures. Its high efficiency makes it an excellent baseline comparison design for new accelerators, and the ease of modifying its hardware/software interface can enable integration of novel forms of computation and memory specialization for challenging workload domains.
• Compilation: How a given program leverages Von Neumann and dataflow mechanisms can have tremendous influence on attainable efficiency, and some methodology is required to navigate this design space. The fundamental compiler problem remains extracting and expressing parallelism and locality. The execution model and application domains make these problems easier to address. Applications that are amenable to accelerators are generally well-behaved (keeping pointers to a minimum or avoiding them, etc.). The execution model and architecture provide interfaces to cleanly expose the application’s parallelism and locality to the hardware. This opens up exciting opportunities in compiler and programming languages research to target accelerators.

8. CONCLUSION
This article observed a synergy between Von Neumann and dataflow processors due to variance in program behaviors at a fine grain and used this insight to build a practical processor, SEED. It enables potentially disruptive performance and energy efficiency trade-offs for general-purpose processors, pushing the boundary of what is possible given only a modestly complex core. This approach of specializing for program behaviors using heterogeneous dataflow architectures could open a new design space, ultimately reducing the importance of aggressive OOO designs and leading to greater opportunity for radical architecture innovation.

References
1. Arvind, K., Nikhil, R.S. Executing a program on the MIT tagged-token dataflow architecture. IEEE Trans. Comput. 39, 3 (1990), 300–318.
2. Budiu, M., Artigas, P.V., Goldstein, S.C. Dataflow: A complement to superscalar. In ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (March 20–22, 2005), IEEE Computer Society, Washington, DC, USA, 177–186.
3. Burger, D., Keckler, S.W., McKinley, K.S., Dahlin, M., John, L.K., Lin, C., Moore, C.R., Burrill, J., McDonald, R.G., Yoder, W., Team, T.T. Scaling to the end of silicon with edge architectures. Computer 37, 7 (July 2004), 44–55.
4. Clark, N., Kudlur, M., Park, H., Mahlke, S., Flautner, K. Application-specific processing on a general-purpose core via transparent instruction set customization. In MICRO 37 Proceedings of the 37th Annual IEEE/ACM International Symposium on Microarchitecture (Portland, Oregon, December 04–08, 2004), IEEE Computer Society, Washington, DC, USA, 30–40.
5. Gebhart, M., Maher, B.A., Coons, K.E., Diamond, J., Gratz, P., Marino, M., Ranganathan, N., Robatmili, B., Smith, A., Burrill, J., Keckler, S.W., Burger, D., McKinley, K.S. An evaluation of the TRIPS computer system. In ASPLOS XIV Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (Washington, DC, USA, March 07–11, 2009), ACM, New York, NY, USA, 1–12.
6. Govindaraju, V., Ho, C.-H., Nowatzki, T., Chhugani, J., Satish, N., Sankaralingam, K.,
CAREERS
University of Central Missouri
Assistant Professor in Computer Science - Multiple Positions

The School of Computer Science and Mathematics at the University of Central Missouri is accepting applications for three non tenure-track positions in Computer Science at the rank of Assistant Professor. The appointment will begin August 2019. We are looking for faculty excited by the prospect of shaping our school’s future and contributing to its sustained excellence.

The Position: Duties will include teaching undergraduate and graduate courses in computer science, cybersecurity and/or software engineering, and developing new courses depending upon the expertise of the applicant and school needs, program accreditation and assessment. Faculty are expected to assist with school and university committee work and service activities, and advising majors.

Required Qualifications:
• Ph.D. in Computer Science, Cybersecurity or Software Engineering
• Demonstrated ability to teach existing courses at the undergraduate and/or graduate levels
• Excellent verbal and written communication skills

The Application Process: To apply online, go to https://jobs.ucmo.edu. Apply to position #997335, #997819 or #998560. The following items should be attached: a letter of interest, a curriculum vitae, copies of transcripts, and a list of at least three professional references including their names, addresses, telephone numbers and email addresses. Official transcripts and three letters of recommendation will be requested for candidates invited for on-campus interview. For more information, contact:

Dr. Songlin Tian, Search Committee Chair
School of Computer Science and Mathematics
University of Central Missouri
Warrensburg, MO 64093
(660) 543-4930
tian@ucmo.edu

Initial screening of applications begins May 1, 2019, and continues until position is filled. AA/EEO/ADA. Women and minorities are encouraged to apply.

UCM is located in Warrensburg, MO, which is 35 miles southeast of the Kansas City metropolitan area. It is a public comprehensive university with about 13,000 students. The School of Computer Science and Mathematics offers undergraduate and graduate programs in Computer Science, Cybersecurity and Software Engineering with over 700 students. The undergraduate Computer Science and Cybersecurity programs are accredited by the Computing Accreditation Commission of ABET.
Superintendent
Information Technology Division
www.nrl.navy.mil
last byte
[CONTINUED FROM P. 96] keep the connection alive with people who are trying to understand how the brain works.

HINTON: That said, neuroscientists are now taking it seriously. For many years, neuroscientists said, “artificial neural networks are so unlike the real brain, and they’re not going to tell us anything about how the brain works.” Now, neuroscientists are taking seriously the possibility that something like backpropagation is going on in the brain, and that’s a very exciting area.

LECUN: Almost all the studies now of human and animal vision use convolutional nets as the standard conceptual model. That wasn’t the case until relatively recently.

HINTON: I think it’s also going to have a huge impact, slowly, on the social sciences, because it’s going to change our view of what people are. We used to think of people as rational beings, and what was special about people was that they used reasoning to derive conclusions. Now we understand much better that people are basically massive analogy-making machines. They develop these representations quite slowly, and then the representations they develop determine the kinds of analogies they can make. Of course, we can do reasoning, and we wouldn’t have mathematics without it, but it’s not the fundamental way we think.

For pioneering researchers, you seem unusually unwilling to rest on your laurels.

HINTON: I think there’s something special about people who invented techniques that are now standard. There was nothing God-given about them, and there could well be other techniques that are better. Whereas people who come to a field when there’s already a standard way of doing things don’t understand quite how arbitrary that standard way is.

BENGIO: Students sometimes talk about neural nets as if they were describing the Bible.

LECUN: It creates a generation of dogmatism. Nevertheless, it’s very likely that some of the most innovative ideas will come from people much younger than us.

The progress in the field has been amazing. What would you have been surprised to learn was possible 20 or 30 years ago?

LECUN: There’s so much I’ve been surprised by. I was surprised by how late the deep learning revolution was, but also by how fast it developed once it started. I would have expected things to happen more progressively, but people abandoned the whole idea of neural nets between the mid-1990s and mid-2000s. We had evidence that they were working before, but then, once the demonstrations became incontrovertible, the revolution happened really fast, first in speech recognition, then in image recognition, and now in natural language understanding.

HINTON: I would have been amazed, 20 years ago, if someone had said that you could take a sentence in one language, carve it up into little word fragments, feed it into a neural net that starts with

Photo by Alexander Berg
Q&A
Reaching New Heights
with Artificial Neural Networks
ACM A.M. Turing Award recipients Yoshua Bengio, Geoffrey Hinton, and Yann LeCun
on the promise of neural networks, the need for new paradigms, and the concept of making
technology accessible to all.
ONCE TREATED BY the field with skepticism (if not outright derision), the artificial neural networks that 2018 ACM A.M. Turing Award recipients Geoffrey Hinton, Yann LeCun, and Yoshua Bengio spent their careers developing are today an integral component of everything from search to content filtering. So what of the now-red-hot field of deep learning and artificial intelligence (AI)? Here, the three researchers share what they find exciting, and which challenges remain.

There’s so much more noise now about artificial intelligence than there was when you began your careers, some of it well-informed, some not. What do you wish people would stop asking you?

GEOFFREY HINTON: “Is this just a bubble?” In the old days, people in AI made grand claims, and they sometimes turned out to be just a bubble. But neural nets go way beyond promises. The technology actually works. Furthermore, it scales. It automatically gets better when you give it more data and a faster computer, without anybody having to write more lines of code.

YANN LECUN: That’s true. The basic idea of deep learning is not going away, but it’s still frustrating when people ask if all we need to do to make machines more intelligent is simply scale our current methods. We need new paradigms.

YOSHUA BENGIO: The current techniques have many years of industrial and scientific application ahead of them. That said, the three of us are researchers, and we are always impatient for more, because we are far from human-level AI, and the dream of understanding the principles of intelligence, natural or artificial.

What isn’t discussed enough?

HINTON: What does this tell us about how the brain works? People ask that, but not enough people are asking that.

BENGIO: It’s true. Unfortunately, although deep learning takes inspiration from the brain and from cognition, many engineers involved with it these days don’t care about those topics. It makes sense, because if you’re applying things in industry, it doesn’t matter. But in terms of research, I think it’s a big loss if we don’t [CONTINUED ON P. 94]
An award-winning comprehensive
new textbook, accompanied by
video lectures and extensive
online content
Two of Princeton’s
most popular courses,
now reaching millions
worldwide
introcs.princeton.edu
algs4.princeton.edu