
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Concrete Problems for Autonomous Vehicle Safety:
Advantages of Bayesian Deep Learning
Rowan McAllister, Yarin Gal, Alex Kendall, Mark van der Wilk, Amar Shah, Roberto Cipolla, Adrian Weller
Department of Engineering, University of Cambridge, UK

also Alan Turing Institute, London, UK
{rtm26, yg279, agk34, mv310, as793, rc10001, aw665}@cam.ac.uk

Abstract

Autonomous vehicle (AV) software is typically composed of a pipeline of individual components, linking sensor inputs to motor outputs. Erroneous component outputs propagate downstream, hence safe AV software must consider the ultimate effect of each component's errors. Further, improving safety alone is not sufficient. Passengers must also feel safe to trust and use AV systems. To address such concerns, we investigate three under-explored themes for AV research: safety, interpretability, and compliance. Safety can be improved by quantifying the uncertainties of component outputs and propagating them forward through the pipeline. Interpretability is concerned with explaining what the AV observes and why it makes the decisions it does, building reassurance with the passenger. Compliance refers to maintaining some control for the passenger. We discuss open challenges for research within these themes. We highlight the need for concrete evaluation metrics, propose example problems, and highlight possible solutions.

1 Introduction

Autonomous vehicles (AVs) could bring great benefits to society, from reducing¹ the 1.25 million annual road fatalities [WHO, 2015] and injuries (at 100x the fatality rate [NSC, 2016]), to providing independence to those unable to drive. Further, AVs offer the AI community many high-impact research problems in diverse fields including: computer vision; probabilistic modelling; gesture recognition; mapping; pedestrian and vehicle modelling; human-machine interaction; and multi-agent decision making.

¹ Most crashes involve human error [NHTSA, 2008, Tables 9a-c].

Whilst great improvements have been made in vehicle hardware and sensor technology, we argue that one barrier to AV adoption lies within software: particularly how to integrate different software components while considering how errors propagate through the system. AV systems are typically built of a pipeline of individual components, linking sensor inputs to motor outputs. Raw sensory input is first processed by object detection and localisation components, resulting in scene understanding. Scene understanding can then be inputted into a scene prediction component to anticipate other vehicles' motions. Finally, decision components transform scene predictions into motor commands. Such components are increasingly implemented with deep learning tools, as recent engineering advances have dramatically improved the performance of such tools [Kendall and Gal, 2017]. Deep learning methods supersede previous approaches in an array of tasks, and have become the de facto tools of choice in many applications. Yet building an AV pipeline out of deep learning building blocks poses new challenges.

Difficulties with a pipeline arise when erroneous outputs from one component propagate into subsequent components. For example, in a recent tragic incident, an AV perception component confused the white side of a trailer with bright sky. This misclassification propagated forwards to the motion planning process, resulting in the first fatality of an assisted driving system [NHTSA, 2017]. Such incidents damage public perception of AV safety, where there is already understandable concern [Giffi and Vitale, 2017]. Further, whilst a major goal for society is safer roads, it is not sufficient for AV software simply to be safe. Passengers need to feel safe in order to build trust and use the technology. Passenger apprehension can arise from both unfamiliarity and a lack of control over AV decisions. We therefore investigate three under-appreciated research themes for the successful adoption of fully autonomous vehicles: safety, interpretability, and compliance (S.I.C. for short).

The Safety theme concentrates on reliably performing to a safe standard across all reasonable contexts. A crucial requirement for safety is to understand when an AV component faces unfamiliar contexts. Modelling an AV's uncertainty is a sensible way to measure unfamiliarity, and enables appropriate subsequent decisions to be made under such uncertainties. Bayesian probability theory is a formal language of uncertainty, providing tools to model different explanations of the world. Both interpretability and compliance help reassure passengers and make them feel safe, as well as build trust. Interpretability is concerned with giving insights into the decisions an AV makes, and the outputs it produces. Such insights can take a visual form, be text based, or even auditory. Two aspects of compliance are of interest: 1) compliance with passenger preferences; 2) compliance with the law, regulations, and societal norms. By complying with passenger preferences (under a safety envelope), passengers can feel re-


Figure 1: End-to-end Bayesian deep learning architecture. This figure illustrates our architecture and the key benefit of propagating uncertainty throughout the AV framework. We consider a hypothetical situation where an AV (our blue car) approaches an intersection, where another car (red) will turn left into our path. We compare a framework built on traditional (non-Bayesian) and Bayesian deep learning. Although both systems use the same initial sensory information, propagating uncertainty through the prediction and decision layers allows the Bayesian approach to avert disaster. Deep learning methods are required for state-of-the-art performance, hence more traditional Bayesian methods are not possible.
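The benefit the caption describes, carrying a probability forward instead of a hard label, can be sketched with toy numbers. Everything below (the indicator probability, the turn model, the relative costs) is a hypothetical illustration, not the authors' system:

```python
# Hypothetical sketch of propagating uncertainty through a three-stage
# pipeline; all probabilities and costs are invented for illustration.

def perception():
    # Perception is only 40% confident the oncoming car's indicator is on.
    return 0.4

def prediction(p_indicator_on):
    # Assumed behaviour model: indicating cars turn 90% of the time,
    # non-indicating cars 5%. Marginalise over the uncertain indicator.
    return 0.9 * p_indicator_on + 0.05 * (1.0 - p_indicator_on)

def decision(p_turn):
    # Assumed relative costs: pick the action with the lowest expected cost.
    expected_cost = {"brake": 1.0, "continue": 10.0 * p_turn}
    return min(expected_cost, key=expected_cost.get)

# Bayesian pipeline: the 0.4 flows forward, so the risk stays visible.
bayesian_action = decision(prediction(perception()))       # "brake"

# Non-Bayesian pipeline: thresholding at 0.5 discards the uncertainty.
hard_label = 1.0 if perception() > 0.5 else 0.0            # indicator "off"
non_bayesian_action = decision(prediction(hard_label))     # "continue"

print(bayesian_action, non_bayesian_action)
```

The same sensory evidence leads to different actions purely because the hard threshold in the second pipeline hides a 39% turn probability from the decision layer.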

assured they retain high-level situational control. Compliance with the law, and the integration of regional regulations into an AV pipeline, is also a crucial requirement, one which leads to problems of integrating logical rules into deep learning systems. Research on such problems is extremely sparse, and we comment on this below.

In this document we survey technical challenges and suggest important paths for future research in S.I.C. from a machine learning research perspective. We highlight the need for concrete evaluation metrics, and suggest new metrics to evaluate safety by looking at both model performance and model uncertainty. We suggest example problems in each of the S.I.C. themes, and highlight possible solutions.

2 Safety

In AV software, no individual component exists in isolation. Improving safety by reducing individual component errors is necessary for safe AV software, but not sufficient. Even if each component satisfies acceptable fault tolerances in isolation, the accumulation of errors can have disastrous consequences. Instead, an understanding of how errors propagate forwards through a pipeline of components is critical. For example, in Figure 1, a perception component, which detects another vehicle's indicator lights, influences how a prediction component anticipates the vehicle's motion, ultimately influencing the driving decision. Since misclassification by components early in the pipeline affects components downstream, at the very least, each component should pass on the limitations and uncertainties of its outputs. Correspondingly, each component should be able to accept as input the uncertainty of the component preceding it in the pipeline. Further, these component uncertainties must be assembled in a principled way to yield a meaningful measure of overall system uncertainty, based on which safe decisions can be made.

A principled approach to modelling uncertainty is Bayesian probability theory (in fact, it can be shown that a rational agent must be Bayesian in its beliefs [Berger, 2013]). Bayesian methods use probabilities to represent uncertainty, and can be used for each individual component to represent its subjective confidence in its output. This confidence can then be propagated forward through the pipeline. To do so, each component must be able to input and output probability distributions rather than point estimates. Together, component outputs downstream become functions of the uncertainty in predictions made upstream, enabling a decision layer to consider the consequences of plausible misclassifications made upstream, and act accordingly. Other approaches to conveying uncertainty exist in the field, including, for example, ensembling [Gal, 2016]. But the uncertainties of such techniques cannot necessarily be combined together in a meaningful way (a necessity with a complex software pipeline). Further, the type of uncertainty captured by these methods is not necessarily appropriate in safety applications (discussed further in Section 5).

Traditional AV research has used Bayesian tools to capture uncertainty in the past [Paden et al., 2016]. But the transition of many such systems to deep learning poses a difficulty: how would deep learning systems capture uncertainty?

While many Bayesian models exist, deep learning models obtain state-of-the-art perception of fine details and complex relationships [Kendall and Gal, 2017]. Hence we propose the use of Bayesian Deep Learning (BDL). BDL is an exciting field lying at the forefront of research. It forms the intersection between deep learning and Bayesian probability theory, offering principled uncertainty estimates within deep architectures. These deep architectures can model complex tasks by leveraging the hierarchical representation power of deep learning, while also being able to infer complex multi-modal posterior distributions. Bayesian deep learning models typically form uncertainty estimates by either placing distribu-


tions over model weights, or by learning a direct mapping to probabilistic outputs. One pragmatic approach is to use dropout to approximate a Bernoulli distribution over the neural network's weights. One can then sample from this network with different dropout masks to obtain the posterior distribution [Gal, 2016, Chapter 2].

Figure 1 illustrates an example where propagating uncertainties throughout the entire pipeline can prevent disaster. It contrasts standard deep learning with Bayesian deep learning. In this case, standard deep learning makes hard predictions, whereas Bayesian deep learning outputs probabilistic predictions accounting for each model's ignorance about the world. In this scenario, the AV is approaching an intersection which has an oncoming vehicle whose intention is to turn across the path of our AV. Consider the situation where our model detects the other car, but fails to detect that it is in the turning lane or that it is flashing its indicator. Using deep learning, and passing hard classification results through the architecture, the system would be unable to predict that the oncoming car might turn. In contrast, a Bayesian deep learning system might still mistake the oncoming car's position and indicator, but it would be able to propagate its uncertainty to the next layers. Given the probabilities of the other vehicle's existence, indicator lights, and location in the turning lane, the prediction component can now compute the probability that the oncoming car would obstruct the AV's motion. At the third stage, the decision component reacts to the range of possible predicted movements of other vehicles, resulting in a completely different outcome and averting a crash.

Methods for representing uncertainty in deep learning are critical for safe AVs, and raise many imperative research questions. For instance: to improve performance, should one focus on developing a better uncertainty estimate, or a better model? These competing hypotheses can be hard to tease apart. Other important research directions for BDL include the development of improved inference approximations (existing approaches can under-estimate model variance through excessive independence assumptions), and identifying when a model is misspecified (for example, if a priori we give high probability to incorrect models, and low probability to correct models). The accumulation of errors arising from various approximations, such as inference approximations as well as potential model misspecification, is an important area of open research.

To assist training and analysing the pipeline as a whole, we propose that not one, but all components should use BDL tools. This allows for end-to-end training and the combination of uncertainties through the pipeline, and also simplifies the writing and maintenance of code (compared to a heterogeneous collection of algorithms). Often a modular approach (i.e. not end-to-end) is taken with AVs, developing each component independently, since it is easier to validate and unit-test individual components. However, end-to-end systems can perform better when the components are optimised jointly, as discussed in Section 5.4.

3 Interpretability

The next theme we review is interpretability. Interpretable algorithms give insights into the decisions they make, and the outputs they produce. Such insights could be visual (e.g. a heat map highlighting model uncertainty), text based (e.g. asking a system "why did you turn left?"), or auditory (an alarm sounding when the AV does not know what to do, and the passenger is required to take control). An AV should be able to explain to a passenger what it observes and why it makes the decisions it does, rather than expecting passengers to trust their lives to a black box. Trust increases when a product is shown to base decisions on environmental aspects that appear reasonable to a human. Interpretability is useful when a task is hard to define, or when a model is difficult to test against the huge variety of possible inputs it could receive [Doshi-Velez and Kim, 2017].

We highlight several important goals for interpretability:
- to help passengers trust AV technology, by informing passengers what the AV is doing and why,
- to help society broadly understand and become comfortable with the technology, overcoming a reasonable fear of the unknown,
- to aid engineers in understanding the model, to validate against safety standards,
- to provide accountability for insurance and legal liability, by reviewing and explaining decisions if something goes wrong.

We are particularly interested in interpretability through enabling interrogation of the system by the passenger. For example, asking "Why did you suddenly shift left?" should yield a simple, intelligible response from the AV such as "I shifted left to give the cyclist on our right more space." Such communication could be done through natural-language or visual interfaces. Specifically, we discuss two technical aspects of this problem: model saliency and auxiliary outputs.

3.1 Model Saliency

Saliency detection methods show which parts of a given image are the most relevant to a given model and task. Such visual interpretations of how machine learning models make decisions are an exciting new area of AI research.

In deep learning, a few techniques have been proposed. One method uses the derivatives back-propagated through a network to quantify saliency [Simonyan et al., 2013], but this often results in low-quality pixel-level output. Other techniques visualise the model's parameters [Zeiler and Fergus, 2014] or perturb the input [Zhou et al., 2014; Ribeiro et al., 2016]. [Dabkowski and Gal, 2017] show how to learn a model to visualise saliency in real-time. When applied to autonomous vehicle models, research shows that driving decisions depend most on the boundaries of other cars and lane markings immediately ahead [Bojarski et al., 2017]. Recent work [Zintgraf et al., 2017] adds an interesting new element: an input item is relevant if it is important in explaining the prediction and it could not easily be predicted by other parts of the input. Such ideas are still in their infancy, and much work is needed before they could be used for real-world systems.

Bayesian deep learning suggests further ways to interpret saliency. We can understand how perturbations affect the estimated uncertainty in perception, as well as further down the pipeline. Further research is needed to apply these techniques in real-time, and to communicate this effectively to users.
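A minimal sketch of the perturbation idea discussed above: occlude each image patch in turn and measure how much the model's output drops. The `model` here is a stand-in scoring function invented for illustration, not a trained network:

```python
import numpy as np

# Occlusion-style saliency sketch (in the spirit of input-perturbation
# methods); `model` is a hypothetical scoring function, not a real network.

def model(image):
    # Placeholder "model": responds only to the centre of the image.
    weights = np.zeros_like(image)
    weights[3:6, 3:6] = 1.0
    return float((image * weights).sum())

def occlusion_saliency(image, patch=3):
    """Score drop when a patch is zeroed out = that patch's relevance."""
    base = model(image)
    saliency = np.zeros_like(image)
    for i in range(0, image.shape[0], patch):
        for j in range(0, image.shape[1], patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0
            saliency[i:i + patch, j:j + patch] = base - model(occluded)
    return saliency

image = np.ones((9, 9))
s = occlusion_saliency(image)
print(s[4, 4] > s[0, 0])  # the centre patch matters most: True
```

The same loop applied to a Bayesian model's predictive uncertainty, rather than its score, would show which input regions drive the uncertainty estimate, which is the BDL extension suggested above.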


3.2 Auxiliary Outputs

Interpretability is particularly important for end-to-end systems. One approach to introducing transparency in end-to-end pipelines is to examine intermediate, or auxiliary, outputs. In Section 5 we propose an end-to-end framework to map sensory inputs to control outputs. However, we also advocate for human-interpretable auxiliary outputs at each major stage. For example, in addition to the end-to-end driving loss, it would be useful to learn intermediate representations for depth, motion, geometry, semantic segmentation and prediction components [Kendall et al., 2017]. These outputs may help in training the model by suggesting sensible intermediate representations, and they provide visual output to help humans understand and debug the model. Existing literature on this topic is extremely sparse. One challenge is to achieve effective intermediate outputs which explain the model's representation while maintaining the flexibility of end-to-end learning.

4 Compliance

The last major theme of research we discuss is compliance. In our setting, we suggest there are two important aspects to compliance: passenger reassurance, and abiding by the law. First, consider entering a car driven by someone unfamiliar, whose driving habits quickly make you concerned. You might wish the driver would slow down, or give the cyclist ahead more space. With AVs, such compliance to passengers' high-level directives could help to reassure them by giving a sense of control [Baronas and Louis, 1988]. Human drivers can adjust to non-verbal cues from passengers: if a passenger looks uneasy, or perhaps is clearly holding something delicate, then smoother driving is required. AVs cannot currently observe such cues, therefore passenger requests to an AV may be frequent.

Second, compliance also means meeting the requirements of the law and societal norms. An AV should only accept passenger requests that are safe and legal, hence protecting passengers from liability in the event of a crash. A manufacturer (or perhaps the software provider) would need to balance the flexibility of the operational framework with their own tolerance for liability risk.

This theme suggests important new research directions. Since the community has already identified that vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication protocols would be useful, perhaps an analogous vehicle-to-user (V2U) protocol is also possible. Alternatively, a passenger might provide feedback to an AV in pursuit of a personalised service. But an individual cannot be expected to provide thousands of training examples. It is therefore important to research methods for data-efficient learning, or more broadly individualised learning. We believe that relevant ideas might come from transfer learning or data-efficient reinforcement learning: learning to comply with a minimal number of interactions with the passenger. Solutions to problems arising from this under-explored theme have the potential to transform our current perception of AVs.

5 Propagating Uncertainty for Safe Systems

We next give a concrete example for realising the ideas above. We review a typical set of individual components used in an AV pipeline, and discuss how to propagate uncertainty between the individual components. We use this example to highlight concrete problems in S.I.C. AV research.

A variety of AV software architectures exist in the literature [Luettel et al., 2012; Sivaraman and Trivedi, 2013]. However, we refer to the generic pipeline structure used in Figure 1 to guide our discussion on how software components commonly affect each other. We describe each component's ability to handle input and output uncertainty. Note that while Bayesian probabilistic methods are already ingrained in most robotics research to provide probabilistic outputs [Thrun et al., 2005], probabilistic inputs have been explored much less.

Before we begin, it is important to highlight the two major types of uncertainty we wish to capture with BDL models. First, aleatoric (data-dependent) uncertainty models noise which is inherent in the observations. Aleatoric uncertainty reflects the difficulty and ambiguity in the input, for example an over-exposed image or a featureless wall. Second, epistemic (model-dependent) uncertainty accounts for ambiguity in the model itself. This uncertainty can arise, for example, from different possible ways of explaining the data. Epistemic uncertainty can be mitigated given enough data, and is a measure of familiarity: how close a new context is to previously observed contexts. An example of what these uncertainties look like from a prototype scene understanding system is shown in Figure 2 and Figure 3. We next discuss architecture pipelines composed of three components: a perception layer, a prediction layer, and a decision layer.

5.1 Perception Layer

A perception system should be able to solve three important tasks: estimate the semantics of the surrounding scene, understand its geometry, and estimate the position of the vehicle itself.

Semantics. The role of a semantic scene understanding system is to classify the class and state of the surroundings based on their appearance. Visual imagery is highly informative but non-trivial for algorithms to understand. Semantic segmentation is a common task where we wish to segment pixels and classify each segment according to a set of pre-defined class labels. For example, given a visual input (Figure 2a), the image is segmented into road, paths, trees, sky, and buildings (Figure 2b). Most algorithms make hard classifications; however, uncertainty can be captured with semantic segmentation algorithms which use Bayesian deep learning [Kendall et al., 2015].

Geometry. In addition to semantics, a perception system must also infer the geometry of the scene. This is important to establish obstacles and the location of the road surface. This problem may be approached with range sensors such as ultrasound or LIDAR. However, visual solutions have been demonstrated using supervised [Eigen and Fergus, 2015] and unsupervised [Garg et al., 2016] deep learning (see Figure 3).

Localisation. Finally, a perception system must estimate the vehicle's location and motion. In unfamiliar environments


(a) Sensor input (b) Semantic segmentation (c) Uncertainty


Figure 2: Bayesian deep learning for semantic segmentation. Typically, deep learning models make predictions (b) without considering the uncertainty (c). Our method proposes to estimate uncertainty (c) from each layer to pass down our Bayesian pipeline. (b) shows semantic segmentation, where classes are coloured individually. (c) shows uncertainty, where colours other than dark blue indicate a pixel is more uncertain. The model is less confident at class boundaries, and at distant and unfamiliar objects.
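An uncertainty map like Figure 2(c) can be sketched from stochastic softmax samples (e.g. several dropout forward passes) by computing per-pixel predictive entropy. The `samples` here are synthetic random placeholders standing in for real network outputs:

```python
import numpy as np

# Sketch of a per-pixel uncertainty map from T stochastic softmax samples;
# the logits are synthetic placeholders, not a trained segmentation model.

rng = np.random.default_rng(1)
T, H, W, C = 20, 4, 4, 3            # samples, height, width, classes

logits = rng.standard_normal((T, H, W, C))
e = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = e / e.sum(axis=-1, keepdims=True)        # per-sample softmax

p_mean = probs.mean(axis=0)                      # predictive mean, (H, W, C)
entropy = -(p_mean * np.log(p_mean + 1e-12)).sum(axis=-1)  # (H, W) map

label_map = p_mean.argmax(axis=-1)   # hard segmentation, as in Figure 2(b)
print(entropy.shape)                 # one uncertainty value per pixel
```

High-entropy pixels would be rendered in the warm colours of Figure 2(c); with samples from a real Bayesian segmentation network, these concentrate at class boundaries and unfamiliar objects.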

(a) Sensor input (b) Depth prediction (c) Uncertainty


Figure 3: Bayesian deep learning stereo depth estimation algorithm. This example illustrates a common failure case for stereo vision, where
we observe that the algorithm incorrectly handles the reflective and transparent window on the red car. Importantly, we observe that the BDL
algorithm also predicts a high level of uncertainty (c) with this erroneous output (b).

this is commonly referred to as simultaneous localisation and mapping (SLAM) [Durrant-Whyte and Bailey, 2006]. It is beneficial to register the vehicle to a global map and also locally to scene features (such as lane position and distance to intersection). An AV must infer which lane it is in and where it is relative to an oncoming intersection. For this task as well, there exist computer vision techniques which use Bayesian deep learning [Kendall and Cipolla, 2016].

But other problems exist in perception as well, and uncertainty plays a central role in these. Roboticists have long considered uncertainty in robotic perception. Uncertainty in geometric features has been considered important for motion planning [Durrant-Whyte, 1988] and filtering and tracking [Kalman, 1960; Julier et al., 1995; Isard and Blake, 1998]. Another popular use is sensor fusion, given probabilistic models of each sensor (a likelihood function). Sensor fusion combines multiple sensor outputs using Bayes' rule to construct a world-view more confident than any one sensor can alone. The benefit is a principled way to combine all available information, reducing mapping and localisation errors. This way of combining sensors of different types, in particular LIDAR, radar, visual, and infra-red, helps to overcome the physical limitations of any one individual sensor [Brunner et al., 2013].

The perceptual component is usually placed at the start of the pipeline, and has to process raw sensory input. The perception component's output should consist of a prediction together with uncertainties, which are useful for outputting more precise object detection (via filtering and tracking) and more accurate state estimation (via improved mapping and localisation). A hard decision should not be made at the output of the perception component. Instead, the uncertainties should be explicitly propagated forward to the prediction layer. Most research does not consider uncertainty beyond this point, but it is a critical input for the follow-up prediction and decision layers (see Figure 1). Uncertainties can be propagated as a set of samples from the perception system's output distribution, for example. However, this approach is limited in its use in real-time systems. It is an open problem to decide on a good representation which captures the epistemic uncertainty of the perception system efficiently.

5.2 Prediction Layer

Even though scene perception components can identify obstacle locations, many obstacles are dynamic agents such as pedestrians and vehicles. Scene prediction helps an AV anticipate agent motions to avoid dangerous situations, such as a vehicle ahead possibly braking suddenly, or a pedestrian possibly stepping onto the road. Given a typical LIDAR range (100m) and legal speed limit (30m/s), predictions are generally not made more than 3.3s ahead.

Given a model of external agents' behaviour, scene prediction can be framed as a partially observable Markov decision process (POMDP) [Bandyopadhyay et al., 2013]. In the absence of a model, inverse reinforcement learning (IRL) can be used [Ziebart et al., 2009]. IRL is the problem of inferring another agent's latent reward function from observations [Ng and Russell, 2000]. An inferred reward function can then be used in the decision layer, for example to emulate examples of human driving (called apprenticeship learning [Abbeel and Ng, 2004]), or simply to predict external agents' motions. In this method we learn a reward function given features from a particular scene, aiming to generalise well to new scenes [Levine et al., 2010]. Variants of IRL include Bayesian IRL, which uses Gaussian processes [Levine et al., 2011], and deep IRL, which uses neural networks [Wulfmeier et al., 2015a]. Probabilistic deep methods have been introduced recently [Wulfmeier et al., 2015b]. Probabilistic output predictions of pedestrians or vehicles can be easily visualised, which is advantageous for debugging and interpretability purposes. For example, heat maps of state occupancy probabili-


ties might be used [Ziebart et al., 2009]. Other approaches for prediction try to build an autoregressive model that predicts future segmentations given visual segments from a perception layer [Neverova et al., 2017]. Such a method is visually interpretable as well, but inaccurate beyond 0.5s due to a lack of geometrical knowledge.

Input uncertainty, however, has not been investigated with respect to scene prediction, as far as we are aware. Input uncertainty outputted by the perceptual layer, in object existence and location, would be beneficial for prediction in a complex pipeline. This can be extremely difficult to achieve computationally in real-time. If the output of the prediction layer is a map of occupancy probabilities, the contribution of each object being tracked could, for example, be weighted by its probability of existence, and convolved with the probability distribution function of its location. This is an example open problem which has not yet been addressed by the literature.

5.3 Decision Layer

A decision layer commonly includes route planning, motion planning, and feedback control components to control the AV [Paden et al., 2016]. For each component, we strongly advocate a Bayesian decision-theoretic approach, with Bayesian decision theory being a principled methodology to decide on actions which maximise a given utility function [Bernardo and Smith, 2001, Chapter 2]. Further, Bayesian decision theory provides a feedback mechanism in which actions can be taken to reduce the agent's uncertainty. This feedback is used to make better-informed decisions at a later time (e.g. moving to get a better view). This relies on tools developed in the partially observable Markov decision processes literature [Astrom, 1965; Sondik, 1971]. The two major tasks for the decision component are route planning and motion planning.

Route Planning

This task concerns navigation according to an a priori known graphical structure of the road network, solving the minimum-cost path problem using algorithms such as A*. Several algorithms exist that are more suited to large networks than A* [Bast et al., 2016]. For many purposes, route planning is mostly a solved problem.

Motion Planning

Given which streets to follow, in motion planning (MP) we navigate according to the current road conditions observed by the AV's sensors, taking into account e.g. traffic light states, pedestrians, and vehicles. Most MP methods are either sample-based path planners or graph search algorithms [LaValle, 2006]. Classical path planning considers an AV's state space divided into binary free-space and obstacle classes. Uncertainty in localisation and mapping will blur such a free-obstacle distinction, which several sample-based planners consider [Bry and Roy, 2011;

and should be anticipated [McAllister et al., 2012]. More recently, BDL dynamics models have been used in the design of controllers [Gal et al., 2016; Depeweg et al., 2016], works which should be extended into MP.

5.4 An End-to-End Learning Paradigm

In an end-to-end learning paradigm we jointly train all components of the system, from perception, through prediction, and ending in decision. This is an attractive paradigm because it allows each component to form a representation which is optimal w.r.t. the desired end task. By contrast, training a single component in isolation provides no signal or knowledge of the end task. For example, training a perception system in isolation might result in a representation which equally favours accurate segmentation of the clouds above the AV compared to identifying the vehicle immediately ahead. End-to-end training would allow the perception system to focus its learning on the aspects of the scene most useful for the end task.

However, solely using an end-to-end training signal requires large amounts of data. This is because the mapping from sensory inputs to driving commands is complex. To constrain the model in a way that learns faster, it is useful to use auxiliary losses for each component's task. This is also important for interpretability, as discussed in Section 3. In practice, we might pre-train individual components (such as semantic segmentation, prediction, etc.) and then fine-tune them using end-to-end training with auxiliary losses.

A criticism of the sequential prediction-decision pipeline architecture is its assumption that the external world evolves independently of the AV's decisions. Yet surrounding pedestrians and vehicles certainly decide their actions based on an AV's decisions, and vice versa. For example, driving situations that require multi-agent planning include merging lanes, four-way stop signs, rear-end collision avoidance, giving way, and avoiding another's blind spots. Such multi-agent planning can be solved by coupling prediction and decision components to jointly predict the external world state and the conditional AV decisions. Such an architecture avoids basing future decisions on the marginal distribution of future external-scene states (which is problematic, since in 3-10s, the marginal probability distribution could cover the entire road with non-negligible probabilities of occupancy). Concretely, one approach to jointly reasoning about multi-agent scenarios is probabilistic model-based reinforcement learning [Deisenroth and Rasmussen, 2011], which has been extended to Bayesian deep learning [Gal et al., 2016].

6 Metrics

Being able to quantify the performance of algorithms against a set benchmark is critical to developing safe systems. It allows acknowledgement of incremental development. Addi-
Melchior and Simmons, 2007]. tionally, it provides an ability to validate a system against a
To plan motions well under dynamics uncertainty, an ac- set of safety regulations. Therefore, a set of specific met-
curate probabilistic dynamics model is critical. While kine- rics is needed to score and validate a system. Metrics and
matic models work well for low-inertia non-slip cases, more loss functions can be difficult to specify without unintended
complex models are required to model inertia, and proba- consequences [Amodei et al., 2016]. Therefore, we wish to
bilistic models are needed to model slip well [Paden et al., highlight key considerations when designing metrics for AV
2016]. Modelling dynamics uncertainty is especially relevant classification and regression systems, and specifically under
for safe driving on dirt or gravel where slipping is common a Bayesian framework.

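The Bayesian decision-theoretic action selection advocated in Section 5.3 can be sketched in a few lines: the expectation over uncertain future scene states is approximated with Monte Carlo samples from the predictive distribution, and the action maximising expected utility is chosen. The scenario below (choosing a speed given an uncertain pedestrian position, with a hypothetical utility function and numbers) is purely illustrative, not part of the original text.

```python
import random

def expected_utility_action(actions, state_samples, utility):
    """Choose the action maximising expected utility, where the expectation
    over uncertain future states is approximated by Monte Carlo samples
    drawn from the predictive distribution."""
    best_action, best_eu = None, float("-inf")
    for a in actions:
        eu = sum(utility(a, s) for s in state_samples) / len(state_samples)
        if eu > best_eu:
            best_action, best_eu = a, eu
    return best_action, best_eu

# Toy example: choose a speed (m/s) given an uncertain pedestrian distance.
random.seed(0)
# Hypothetical predictive samples of the pedestrian's distance ahead (metres).
samples = [random.gauss(12.0, 4.0) for _ in range(1000)]

def utility(speed, ped_distance):
    # Reward progress; heavily penalise speeds whose stopping distance
    # exceeds the pedestrian's distance (a crude collision proxy).
    stopping_distance = speed ** 2 / 10.0
    collision = 1.0 if stopping_distance > ped_distance else 0.0
    return speed - 100.0 * collision

action, eu = expected_utility_action([2.0, 5.0, 8.0, 11.0], samples, utility)
print(action)
```

Note how a wider predictive distribution automatically makes the faster actions less attractive, which is the cautious behaviour under uncertainty that the paper argues for.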
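Route planning, as described above, reduces to a minimum-cost path search over the road graph. A self-contained sketch using uniform-cost search (equivalently, A* with a zero heuristic) on a hypothetical toy road network; the node names and costs are illustrative only:

```python
import heapq

def shortest_route(graph, start, goal):
    """Minimum-cost path by uniform-cost search (A* with a zero heuristic).
    `graph` maps each node to a list of (neighbour, edge_cost) pairs."""
    frontier = [(0.0, start, [start])]  # (cost so far, node, path)
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nbr, edge_cost in graph.get(node, []):
            if nbr not in visited:
                heapq.heappush(frontier, (cost + edge_cost, nbr, path + [nbr]))
    return float("inf"), []

# Hypothetical road network: nodes are intersections, costs are travel times.
roads = {
    "A": [("B", 4.0), ("C", 2.0)],
    "B": [("D", 5.0)],
    "C": [("B", 1.0), ("D", 8.0)],
    "D": [],
}
cost, route = shortest_route(roads, "A", "D")
print(cost, route)  # 8.0 ['A', 'C', 'B', 'D']
```

In practice, an informative heuristic (e.g. straight-line travel time) or the large-network algorithms surveyed by [Bast et al., 2016] would replace this zero-heuristic search.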
6.1 Uncertainty Metrics
The core benefit of using a Bayesian framework is being able to propagate uncertainty to improve prediction and model reasoning. It is therefore imperative that the probability distribution computed by each component accurately reflects the true uncertainty in the system. Metrics must capture the quantitative performance of the uncertainty the model is trying to measure. Typical metrics should include uncertainty calibration plots and precision-recall statistics of regression and classification algorithms. Other task-specific metrics include inspecting test log-predictive probabilities, a measure of how surprised the system output is on a test set of known outcomes.

It is most important to accurately quantify the uncertainty of intermediate representations, such as those from the perception and scene prediction components. This is because the most important aspect is to correctly propagate uncertainty to the following layers so that we can make informed decisions. The final control output will be a hard decision. Therefore uncertainty metrics should focus on intermediate components and tasks such as perception, sensor fusion and scene prediction.

6.2 End-to-End Metrics
Under an end-to-end paradigm, we should ultimately test and validate our model using downstream metrics. Metrics such as ride comfort, the number of crashes and journey efficiency are what will ultimately be used to judge AVs. Each component should be judged on its ultimate effect towards these metrics, rather than in isolation. Does an improvement in component C by measure M correspond well to an improvement in final vehicle performance P? In isolation, a trained layer might focus on modelling things which do not matter to the end task.

Another challenge in quantifying the performance of end-to-end systems is the attribution of individual components to end performance. For example, if two components perform much better when combined in an end-to-end system than when used individually, how do we quantify their contribution? In game theory, this is answered by the Shapley value [Strumbelj and Kononenko, 2014]. However, refining such approaches and applying them to large-scale systems is challenging.

6.3 Individual Component Metrics
We recognise that while an end-to-end set of metrics is necessary to fully evaluate the safety of a system, individual metrics for each module are necessary in practice. This requires an interpretable system, as the system must expose intermediary outputs for analysis (and cannot simply learn an end-to-end mapping). However, in the robotics and AI communities, these metrics often do not reflect the end use of the system. For a safe system, these metrics should be both end-goal related and should also quantify the uncertainty in each module. To this point, we make some specific suggestions of metrics for each component to use during development.

Perception Layer Metrics
Currently, low-level vision tasks such as semantic segmentation and depth perception focus on improving pixel-wise metrics such as pixel accuracy or pixel intersection over union. However, in an integrated end-to-end system, it is more important to capture object and obstacle metrics. Object false-positive and false-negative scores are more important than precise segmentation boundary delineation and pixel accuracy. For geometry estimation, it is more important to minimise large failures than to improve millimetre precision, which can be achieved by considering threshold accuracy metrics rather than metric errors. Consideration should also be given to developing metrics that reflect the calibration of perception uncertainty outputs. Sensor fusion is an important process here; it is therefore essential for metrics to quantify relative uncertainty between sensory inputs.

Prediction Layer Metrics
The goal of IRL is to learn a reward function. Existing IRL metrics in the literature use distances between inferred (and real) functions, e.g. reward functions [Ramachandran and Amir, 2007; Aghasadeghi and Bretl, 2011], value functions [Levine et al., 2011; Wulfmeier et al., 2015a; 2015b], or policies [Levine et al., 2010; Abbeel and Ng, 2004]. However, existing metrics in the field are not appropriate for AVs. Instead, we wish to evaluate probabilistic predictions of agents in continuous 2D maps; we therefore assert that log probabilities [Ziebart et al., 2008; 2009] are a better-suited metric for AVs. This is especially the case since motion planning algorithms should be optimised along a similar metric, e.g. minimising the probability of collision, which is directly related to the probability of a map cell being occupied or not. Since IRL relies on reward features of scenes to generalise well to new scenes, evaluation should be done on a scene different from the one the IRL algorithm was trained on. A test set of new scenes should be used to test 1) how transferable the reward features are, and 2) how well the model uses those features [Levine et al., 2011].

Decision Layer Metrics
The ultimate aim of AV systems is to drive significantly more safely than human drivers. It is therefore not sensible to evaluate AVs on how well they emulate human drivers. For example, several datasets evaluate steering performance by comparing to human drivers using steering angle RMSE metrics. However, steering angle RMSE makes little sense in the absence of destination inputs. Much worse, this metric makes little sense in the absence of the recorded human driver's intentions in choosing intersections and lane changes.

Given probabilistic predictions of external agents such as the human driver, a more sensible metric for MP is not immediately clear. However, two suggested options include minimising either the probability of a collision or the expected number of agents collided with, as well as taking into account the ride comfort and the distance of the agent from the desired final destination at the end of the task. Additionally, it is desirable to be able to choose paths which bound the probability of collision [Bry and Roy, 2011]. However, the probability of collision cannot be the only factor accounted for (since optimising collision safety alone would result in the vehicle never moving). A trade-off must be made between safety and the requirements of the vehicle. This trade-off may be adjusted by the user for systems which exhibit compliance.
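The decision-layer metrics above can be made concrete. Given per-agent collision probabilities along a candidate path, the expected number of agents collided with is simply their sum (by linearity of expectation), and paths whose overall collision probability exceeds a bound can be rejected, in the spirit of chance-constrained planning [Bry and Roy, 2011]. The sketch below assumes independent collision events across agents, a simplification; all path names and probabilities are hypothetical.

```python
def expected_collisions(p_collide):
    """Expected number of agents collided with: the sum of per-agent
    collision probabilities (linearity of expectation)."""
    return sum(p_collide)

def path_collision_probability(p_collide):
    """Probability of at least one collision, assuming independence
    across agents (a simplifying assumption)."""
    p_none = 1.0
    for p in p_collide:
        p_none *= 1.0 - p
    return 1.0 - p_none

def admissible_paths(candidate_paths, bound):
    """Keep only candidate paths whose overall collision probability
    stays below `bound`, as in chance-constrained planning."""
    return [name for name, probs in candidate_paths
            if path_collision_probability(probs) < bound]

# Hypothetical per-agent collision probabilities for three candidate paths.
candidates = [("swerve", [0.02, 0.01]),
              ("brake", [0.001]),
              ("continue", [0.3, 0.1])]
safe = admissible_paths(candidates, bound=0.05)
print(safe)  # ['swerve', 'brake']
```

The remaining choice among admissible paths would then trade off comfort and progress towards the destination, as discussed above.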
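The calibration plots suggested in Section 6.1 compare predicted probabilities with empirical frequencies. A minimal sketch that bins a classifier's predicted probabilities and reports the observed event frequency per bin; a well-calibrated model places every bin on the diagonal. The example predictions (an overconfident hypothetical "pedestrian present" detector) are illustrative only.

```python
def calibration_curve(probs, outcomes, n_bins=10):
    """Bin predicted probabilities and return, for each non-empty bin,
    (mean predicted probability, empirical frequency of the event).
    Perfect calibration puts these pairs on the diagonal."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    curve = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(y for _, y in members) / len(members)
            curve.append((mean_p, freq))
    return curve

# Hypothetical predictions from an overconfident detector.
probs    = [0.95, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.05]
outcomes = [1,    1,   0,   0,   0,   0,   1,   0]
curve = calibration_curve(probs, outcomes, n_bins=5)
for mean_p, freq in curve:
    print(round(mean_p, 3), round(freq, 3))
```

Here the high-confidence bin predicts ~0.91 but the event occurs only half the time, the overconfidence that such a metric is designed to expose before uncertainty is propagated downstream.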
6.4 AV Classifications
Currently, the accepted standard of AV classification is SAE J3016 [SAE, 2014]. Whilst the standard focuses on the degree of human input required, it is also related to the vehicle's capabilities. The most advanced class is SAE Level 5: "the full-time performance by an Automated Driving System of all aspects of the dynamic driving task under all roadway and environmental conditions that can be managed by a human driver". No higher class currently exists. However, one of the great potentials of AV technology is driving capabilities that greatly exceed what can be managed by human drivers. Human drivers are limited by their observational abilities, processing power, and slow reaction times. By contrast, AVs can access more of the electromagnetic spectrum to better perceive through rain, fog, dust, smoke and darkness [Brunner et al., 2013], and can have more processing power and much faster reaction times. The development of higher SAE Levels could help encourage AV research to reach this potential.

7 Conclusions
We believe that autonomous vehicles will bring many benefits to society, while presenting important and fascinating research challenges to the artificial intelligence community. In this paper we highlighted three themes which will be critical for a smooth adoption of AV systems by society. The first is safety, through the use and propagation of uncertainty from every component throughout the entire system pipeline, following a principled Bayesian framework. We discussed the dangers of making hard decisions with perceptual components, for example, asserting that soft (uncertain) classifications should be propagated through to the decision layer. This enables the AV to act more cautiously in the event of greater uncertainty. To implement each component in a safe system, we suggested Bayesian deep learning, which combines the advantages of highly flexible deep learning architectures with Bayesian methods. We also discussed the themes of interpretability and compliance as ways to build trust and mitigate fears which passengers might otherwise reasonably have about unfamiliar black-box AV systems. Lastly, we discussed the importance of clear metrics to evaluate each component's probabilistic output based on its ultimate effect on the vehicle's performance.

Acknowledgements
Adrian Weller acknowledges support by the Alan Turing Institute under the EPSRC grant EP/N510129/1, and by the Leverhulme Trust via the CFI.

References
[Abbeel and Ng, 2004] Pieter Abbeel and Andrew Ng. Apprenticeship learning via inverse reinforcement learning. In ICML, page 1, 2004.
[Aghasadeghi and Bretl, 2011] Navid Aghasadeghi and Timothy Bretl. Maximum entropy inverse reinforcement learning in continuous state spaces with path integrals. In IROS, pages 1561-1566, 2011.
[Amodei et al., 2016] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mane. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
[Astrom, 1965] Karl Astrom. Optimal control of Markov decision processes with incomplete state estimation. Journal of Mathematical Analysis and Applications, 10(1):174-205, 1965.
[Bandyopadhyay et al., 2013] Tirthankar Bandyopadhyay, Kok Won, Emilio Frazzoli, David Hsu, Wee Lee, and Daniela Rus. Intention-Aware Motion Planning, pages 475-491. Berlin, Heidelberg, 2013.
[Baronas and Louis, 1988] Ann-Marie Baronas and Meryl Louis. Restoring a sense of control during implementation: How user involvement leads to system acceptance. MIS Quarterly, 12(1):111-124, 1988.
[Bast et al., 2016] Hannah Bast, Daniel Delling, Andrew Goldberg, Matthias Muller-Hannemann, Thomas Pajor, Peter Sanders, Dorothea Wagner, and Renato Werneck. Route planning in transportation networks. In Algorithm Engineering, pages 19-80. 2016.
[Berger, 2013] James Berger. Statistical decision theory and Bayesian analysis. Springer Science & Business Media, 2013.
[Bernardo and Smith, 2001] Jose M Bernardo and Adrian Smith. Bayesian theory. Measurement Science and Technology, 12(2):221, 2001.
[Bojarski et al., 2017] Mariusz Bojarski, Philip Yeres, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Lawrence Jackel, and Urs Muller. Explaining how a deep neural network trained with end-to-end learning steers a car. arXiv preprint arXiv:1704.07911, 2017.
[Brunner et al., 2013] Christopher Brunner, Thierry Peynot, Teresa Vidal-Calleja, and James Underwood. Selective combination of visual and thermal imaging for resilient localization in adverse conditions: Day and night, smoke and fire. Journal of Field Robotics, 30(4):641-666, 2013.
[Bry and Roy, 2011] Adam Bry and Nicholas Roy. Rapidly-exploring random belief trees for motion planning under uncertainty. In ICRA, pages 723-730, May 2011.
[Dabkowski and Gal, 2017] Piotr Dabkowski and Yarin Gal. Real time image saliency for black box classifiers. 2017.
[Deisenroth and Rasmussen, 2011] Marc Deisenroth and Carl Rasmussen. PILCO: A model-based and data-efficient approach to policy search. In ICML, pages 465-472, 2011.
[Depeweg et al., 2016] Stefan Depeweg, Jose Miguel Hernandez-Lobato, Finale Doshi-Velez, and Steffen Udluft. Learning and policy search in stochastic dynamical systems with Bayesian neural networks. arXiv preprint arXiv:1605.07127, 2016.
[Doshi-Velez and Kim, 2017] Finale Doshi-Velez and Been Kim. A roadmap for a rigorous science of interpretability. arXiv preprint arXiv:1702.08608, 2017.
[Durrant-Whyte and Bailey, 2006] Hugh Durrant-Whyte and Tim Bailey. Simultaneous localization and mapping: Part I. IEEE Robotics & Automation Magazine, 13(2):99-110, 2006.
[Durrant-Whyte, 1988] Hugh Durrant-Whyte. Uncertain geometry in robotics. Journal on Robotics and Automation, 4(1):23-31, 1988.
[Eigen and Fergus, 2015] David Eigen and Rob Fergus. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In ICCV, pages 2650-2658, 2015.
[Gal et al., 2016] Yarin Gal, Rowan McAllister, and Carl Rasmussen. Improving PILCO with Bayesian neural network dynamics models. In Data-Efficient Machine Learning workshop, ICML, 2016.
[Gal, 2016] Yarin Gal. Uncertainty in Deep Learning. PhD thesis, University of Cambridge, 2016.
[Garg et al., 2016] Ravi Garg, Gustavo Carneiro, and Ian Reid. Unsupervised CNN for single view depth estimation: Geometry to the rescue. In European Conference on Computer Vision, pages 740-756. Springer, 2016.
[Giffi and Vitale, 2017] Craig Giffi and Joe Vitale. Fact, fiction and fear cloud future of autonomous vehicles in the minds of consumers. Technical report, Deloitte, Jan 2017.
[Isard and Blake, 1998] Michael Isard and Andrew Blake. Condensation: Conditional density propagation for visual tracking. IJCV, 29(1):5-28, 1998.
[Julier et al., 1995] Simon Julier, Jeffrey Uhlmann, and Hugh Durrant-Whyte. A new approach for filtering nonlinear systems. In American Control Conference, volume 3, pages 1628-1632, Jun 1995.
[Kalman, 1960] Rudolph Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35-45, 1960.
[Kendall and Cipolla, 2016] Alex Kendall and Roberto Cipolla. Modelling uncertainty in deep learning for camera relocalization. ICRA, 2016.
[Kendall and Gal, 2017] Alex Kendall and Yarin Gal. What uncertainties do we need in Bayesian deep learning for computer vision? arXiv preprint arXiv:1703.04977, 2017.
[Kendall et al., 2015] Alex Kendall, Vijay Badrinarayanan, and Roberto Cipolla. Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv preprint arXiv:1511.02680, 2015.
[Kendall et al., 2017] Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. 2017.
[LaValle, 2006] Steven LaValle. Planning algorithms. Cambridge University Press, 2006.
[Levine et al., 2010] Sergey Levine, Zoran Popovic, and Vladlen Koltun. Feature construction for inverse reinforcement learning. In NIPS, pages 1342-1350, 2010.
[Levine et al., 2011] Sergey Levine, Zoran Popovic, and Vladlen Koltun. Nonlinear inverse reinforcement learning with Gaussian processes. In NIPS, pages 19-27, 2011.
[Luettel et al., 2012] Thorsten Luettel, Michael Himmelsbach, and Hans-Joachim Wuensche. Autonomous ground vehicles: Concepts and a path to the future. Proceedings of the IEEE, 100:1831-1839, May 2012.
[McAllister et al., 2012] Rowan McAllister, Thierry Peynot, Robert Fitch, and Salah Sukkarieh. Motion planning and stochastic control with experimental validation on a planetary rover. In IROS, pages 4716-4723, Oct 2012.
[Melchior and Simmons, 2007] Nik Melchior and Reid Simmons. Particle RRT for path planning with uncertainty. In ICRA, pages 1617-1624. IEEE, 2007.
[Neverova et al., 2017] Natalia Neverova, Pauline Luc, Camille Couprie, Jakob Verbeek, and Yann LeCun. Predicting deeper into the future of semantic segmentation. arXiv preprint arXiv:1703.07684, 2017.
[Ng and Russell, 2000] Andrew Ng and Stuart Russell. Algorithms for inverse reinforcement learning. In ICML, pages 663-670, 2000.
[NHTSA, 2008] NHTSA. DOT HS 811 059: National motor vehicle crash causation survey. Technical report, U.S. Department of Transportation, National Highway Traffic Safety Administration, Jul 2008.
[NHTSA, 2017] NHTSA. PE 16-007. Technical report, U.S. Department of Transportation, National Highway Traffic Safety Administration, Jan 2017. Tesla Crash Preliminary Evaluation Report.
[NSC, 2016] NSC. Motor vehicle fatality estimates. Technical report, National Safety Council, Statistics Department, Feb 2016.
[Paden et al., 2016] Brian Paden, Michal Cap, Sze Zheng Yong, Dmitry Yershov, and Emilio Frazzoli. A survey of motion planning and control techniques for self-driving urban vehicles. Transactions on Intelligent Vehicles, 1(1):33-55, 2016.
[Ramachandran and Amir, 2007] Deepak Ramachandran and Eyal Amir. Bayesian inverse reinforcement learning. Urbana, 51(61801):14, 2007.
[Ribeiro et al., 2016] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In KDD, pages 1135-1144, 2016.
[SAE, 2014] SAE. J3016 surface vehicle information report. Technical report, Society of Automotive Engineers International, Jan 2014. Levels of driving automation.
[Simonyan et al., 2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
[Sivaraman and Trivedi, 2013] Sayanan Sivaraman and Mohan Manubhai Trivedi. Looking at vehicles on the road: A survey of vision-based vehicle detection, tracking, and behavior analysis. Transactions on Intelligent Transportation Systems, 14(4):1773-1795, Dec 2013.
[Sondik, 1971] Edward Sondik. The optimal control of partially observable Markov processes. Technical report, 1971.
[Strumbelj and Kononenko, 2014] Erik Strumbelj and Igor Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3):647-665, 2014.
[Thrun et al., 2005] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic robotics. MIT Press, 2005.
[WHO, 2015] WHO. Global status report on road safety 2015. World Health Organization, Violence and Injury Prevention, 2015.
[Wulfmeier et al., 2015a] Markus Wulfmeier, Peter Ondruska, and Ingmar Posner. Deep inverse reinforcement learning. CoRR, 2015.
[Wulfmeier et al., 2015b] Markus Wulfmeier, Peter Ondruska, and Ingmar Posner. Maximum entropy deep inverse reinforcement learning. arXiv preprint arXiv:1507.04888, 2015.
[Zeiler and Fergus, 2014] Matthew Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pages 818-833, 2014.
[Zhou et al., 2014] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene CNNs. arXiv preprint arXiv:1412.6856, 2014.
[Ziebart et al., 2008] Brian Ziebart, Andrew Maas, Andrew Bagnell, and Anind Dey. Maximum entropy inverse reinforcement learning. In AAAI, volume 8, pages 1433-1438, 2008.
[Ziebart et al., 2009] Brian Ziebart, Nathan Ratliff, Garratt Gallagher, Christoph Mertz, Kevin Peterson, Andrew Bagnell, Martial Hebert, Anind Dey, and Siddhartha Srinivasa. Planning-based prediction for pedestrians. In IROS, pages 3931-3936, 2009.
[Zintgraf et al., 2017] Luisa Zintgraf, Taco Cohen, Tameem Adel, and Max Welling. Visualizing deep neural network decisions: Prediction difference analysis. arXiv preprint arXiv:1702.04595, 2017.