Otto - Testing Operations
Scott Ryvola - Software Quality Manager
Daily Road Testing
Otto trucks drive the highways surrounding San Francisco on a daily basis. The goal of
this testing is to evaluate our (constantly improving) software against real-world driving
situations and identify the current system's strengths and weaknesses.
Otto is very deliberate in its testing efforts. We test at all times of the day: morning rush
hour traffic, throughout the day, and at night to capture every highway situation imaginable.
Trucks are routinely sent back to known problem areas as well as into new conditions on a daily
basis. We collect as much information as possible to ensure not only that we will be able to
handle anything thrown at us but also to use everything that's thrown at us towards improving
the system. This is achieved through advanced machine learning models that our perception
system employs. Although not every component of a self-driving vehicle can employ a machine
learning model, we use problem scenarios encountered on the road as a baseline of what our
system must be able to handle.
In a typical testing scenario, there are two passengers inside the cabin. One is a
licensed Commercial Driver responsible for the operation of the truck itself: getting us to the
highway, exercising active physical monitoring at all times, and taking over full driving
responsibilities when necessary. This driver is constantly monitoring the road and giving input
when necessary, which is vital to safe operation of the vehicle. The co-driver in the passenger
seat is responsible for monitoring the output of the self-driving system and alerting the driver of
any anomalies, as well as instructing when the driver should "disengage" from self-driving mode.
Drivers can disengage by grabbing the steering wheel, applying the brake, applying throttle,
flipping an "engage" button on the dash, or hitting a large red button next to the steering wheel.
In an emergency, any one of these inputs instantly restores full driving control to the driver.
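The "any one of these inputs" rule can be pictured as a simple logical OR over the five driver inputs listed above. The following is only an illustrative sketch: the class and field names are hypothetical and do not describe Otto's actual control software.

```python
from dataclasses import dataclass


@dataclass
class DriverInputs:
    # Hypothetical flags, one per disengage input named in the text.
    steering_wheel_grabbed: bool = False
    brake_applied: bool = False
    throttle_applied: bool = False
    engage_button_flipped: bool = False
    red_button_hit: bool = False


def should_disengage(inputs: DriverInputs) -> bool:
    """Return True if any single driver input is active (logical OR)."""
    return any(vars(inputs).values())


# Example: braking alone is enough to hand control back to the driver.
print(should_disengage(DriverInputs(brake_applied=True)))
```

Treating the inputs as an OR, rather than requiring a combination, matches the description that each input on its own returns full control to the driver.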
There are two ways through which the Otto trucks stay on course and within a marked
lane on the freeway. The first is through "seeing" the marked lane lines on the freeway using our
perception system to stay in the middle of the lane. The second is through a highly detailed GPS
map. These maps are created by sending a truck (driven by one of our truck drivers) along a
proposed route without engaging the self-driving system; the goal is strictly to collect data about
the specific road and lane conditions. Once a map has been generated for a section of highway,
our trucks can localize to it the next time they drive that same section of road in self-driving
mode. By connecting to the national RTK system, we can localize the truck's exact location on
the map to a few centimeters of accuracy. These two systems (perception software and map
localization) work in concert with each other and are not mutually exclusive.
Before testing begins on any given day and route, the co-driver will first evaluate the
current road conditions and system readiness before calling for an "engage." The driver must
then manually press a button on the dashboard to engage the self-driving system. While the
self-driving system is engaged, the driver must be extremely attentive and ready to take back
full control whenever necessary. In situations where the driver decides to take back full control,
the logging software in the self-driving system remains active and records the corrective action
the driver took. Later, our engineers review these logs to obtain an accurate representation of
what a human truck driver would do in that particular scenario. At the moment when the driver is
taking over full manual control, the co-driver will leave detailed notes describing what was
happening on the road at the time, as well as the reason for the particular disengage.
Dashboard and Log Analysis
Once a road test is complete, we present the collected information to our engineers via a
web dashboard that enables them to see the location of disengages, the comments left by the
co-driver, and the exact log file that they can download to review and test new software
upgrades against (see screenshot below, which displays two weeks' worth of testing).
Disengages are vital to evaluating the performance of our system. Each disengage is
reviewed closely after the fact to determine the root cause of the shortcoming and to assign a
severity level to each event. All disengages are important and taken seriously, but the various
severity levels serve as a signal to our software engineers as to which bugs to prioritize.
Disengage Severity
The following categories are assigned to disengagement events after the logs have been
carefully analyzed:
Comfort - a case where drivers felt uncomfortable with the decision the self-driving system
made but which, in reality, posed a near-zero safety threat to themselves or anyone else on the road.
This category exists because we train our drivers to be extremely cautious and defensive.
Public Perception - a case where the behavior of the truck would be considered odd from an
onlooker's perspective. This is a real software bug, but has near-zero safety implications. An
example might be minimal weaving within the lane while traveling in slow-moving traffic.
Major - a scenario that could have had safety implications had there not been driver
intervention. An example would be braking on the freeway when there was no reason to do so.
Major scenarios are ones that could have resulted in a safety risk if there had been a different
set of actors surrounding the truck; but they would not have actually resulted in a collision or
near-collision even if the human driver had not taken back full control during the particular
environment of road conditions and surrounding vehicles when the disengage occurred.
Critical - a scenario where the truck's self-driving actions put it or any surrounding actors in
actual danger that required the driver to take back full control at that time. An example could be
a situation where the driver takes over because the system did not command enough braking in
order to stop in time for a lead vehicle.
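As an illustration of how the four severity levels might drive bug prioritization, here is a minimal sketch. The numeric ordering (Comfort lowest, Critical highest) is an assumption based on the order in which the categories are described, and the `Disengage` record and `prioritize` helper are hypothetical names, not part of Otto's tooling.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ordering: higher value means more urgent.
    COMFORT = 1
    PUBLIC_PERCEPTION = 2
    MAJOR = 3
    CRITICAL = 4


@dataclass
class Disengage:
    note: str           # co-driver comment about the event (hypothetical field)
    severity: Severity  # level assigned after log review


def prioritize(events):
    """Return events sorted from most to least severe."""
    return sorted(events, key=lambda e: e.severity, reverse=True)


events = [
    Disengage("slight weave in slow traffic", Severity.PUBLIC_PERCEPTION),
    Disengage("insufficient braking for lead vehicle", Severity.CRITICAL),
    Disengage("driver felt uneasy near merge", Severity.COMFORT),
]
for event in prioritize(events):
    print(event.severity.name, "-", event.note)
```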
Triage
Once a disengage event has a severity level assessed, it is assigned to a tracking bug.
These bugs capture specific behavioral responses from the truck without drilling down to the
exact line of code where something went wrong. While deep analysis is undertaken by the
responsible engineer, the tracking bugs serve as a marker for us to monitor general system
performance through each iteration of the software. At Otto, we refer to this process of
post-analyzing the logs as "triage" to highlight our need to have quick analysis and to shine a
spotlight on the most important issues. During this process, further triage notes are left for each
event highlighting exactly what occurred on the road, what the system's response was, and
what the root cause of the system's response was.
When engineers and software quality analysts review logs they do so using our playback
system nicknamed "DogTV", which is the same visualization tool that co-drivers use for active
monitoring while on the road (see screenshot below).
In DogTV, we can see the video captured during testing overlaid with our perception
system output. The perception system takes in camera images, radar measurements, as well as
lidar returns to see the lane and all objects around the truck. For example, in the image above,
we are outputting the system's interpretation of the left lane (red), right lane (green), centerline
(white), and surrounding vehicles (yellow boxes). The camera view is the most interpretable
from an outsider's perspective, but there are many more visualization options. Within DogTV we
have a 3D visualization that shows raw perception and planner output (seen to the right of the
camera image above). In this view, engineers can diagnose exactly what the system "saw" as
well as what it planned to do with that information in terms of trajectory, braking response, etc.
The advantage to this system is that our engineers have the ability to test new code on
the exact events we struggled with in the past. Software simulation of this nature is a crucial
component to ensure truck safety on the road. It gives us the ability to recreate previous drives
and test how they would play out without having to put a truck back into the same situation. For
instance, if we momentarily lose tracking of a lane when we go under a bridge, we can replay
that same issue with different versions of software and see which one fixes the issue. The same
process can be applied to not only what we see, but also how we react. The planner team can
go back to events where drivers reported we braked too much or too little to ensure the future
commanded braking will space the truck appropriately.
As much as we would like to write every line of code perfectly on the first try, bugs do
arise and we are continually updating our software. In order for an engineer to submit a code
change they first must pass a regression test. This test ensures that the change will improve the
software and not make it worse than it was before. An example case is testing that our lane
perception improves or remains the same as it was before. This is accomplished by comparing
the system output to images where the lane lines have been labeled by humans. These images
are referred to as ground truth. If the change does not increase the deviation from the ground
truth lane lines, it is one step closer to being accepted. The final step in submitting a change is to
have the code reviewed by another engineer that owns that section of the code. Once the code
passes the review it is submitted to the master version of our system.
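The lane-perception regression check described above can be sketched as a comparison of deviations from ground truth. This is a minimal illustration only: the deviation metric (mean absolute lateral offset), the function names, and the sample numbers are all assumptions, since the text does not specify how deviation is measured.

```python
def mean_lane_deviation(predicted, ground_truth):
    """Mean absolute offset between predicted and human-labeled lane positions."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("inputs must be non-empty and the same length")
    total = sum(abs(p - g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


def passes_regression(baseline_pred, candidate_pred, ground_truth):
    """A change passes if it does not increase deviation from ground truth."""
    baseline_error = mean_lane_deviation(baseline_pred, ground_truth)
    candidate_error = mean_lane_deviation(candidate_pred, ground_truth)
    return candidate_error <= baseline_error


# Hypothetical lateral lane positions (meters) at three labeled points.
labels = [0.0, 0.0, 0.0]
current_software = [0.5, 0.5, 0.5]
proposed_change = [0.25, 0.25, 0.25]
print(passes_regression(current_software, proposed_change, labels))
```

A change that merely matches the current deviation also passes, mirroring the "improves or remains the same" wording.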
Even though the new code has gone through thorough testing we have another step
before it is released to our entire fleet. We bundle many code changes into "releases", which are
comparable to software versions of other products. Once a release is bundled up we have one
final test before the software is pushed to all the trucks. The software is pushed to one truck and
it is sent out on the road to do an initial test run. During this run, the driver and co-driver are
extremely cautious and send a report to the entire software team after the run is complete. If the
testing goes well, the release is accepted and pushed to the entire fleet. If the testing does not
go well, the code is re-evaluated and the whole process starts over.
Colorado Preparations
Before road testing began in Colorado, members of our operations team analyzed over
80 possible routes around the country using satellite imagery to determine the best option for
our 100-mile demonstration. Once we determined that the route met our system specifications,
an initial truck was sent out in early August to confirm our findings. We decided that we were
ready to proceed and immediately added the Colorado logging data into our machine learning
pipeline to tune for the differences between Colorado highways and those in California. With this
new information another truck was sent out in late August, along with our head of software and our
lead perception engineer. They validated that the I-25 route between Fort Collins and Colorado
Springs would work very well for this demonstration. During this trip our trucks collected the data
necessary for us to make a detailed map of the route without having to engage the autonomous
system.
In the meantime, we have been performing hardware upgrades that we will use for the
demonstration. These upgrades include a new computer, new LIDAR sensors, and a more
robust wiring harness. We have already tested these upgrades on California roads for several
weeks as well as at a private testing facility called GoMentum Station. We have been doing
stress testing with a fully loaded trailer similar to the one we will be using for the actual
commercial delivery in Colorado.
As of now, we have a truck in Colorado testing extensively. A driver and co-driver take
the day shift, and they are replaced by another driver and co-driver for the night shift. These
2-shift rotations will occur every day and night until the demonstration in October. For this demo,
we will be running at night, but the day testing gives us more chances to see traffic in case there
is traffic the night of our delivery. In addition to our usual testing with the operations team, core
engineers from both the Control team and the Perception team in San Francisco are flying out to
work shifts as co-drivers.
Criteria for Executing the Self-Driving Delivery
As we prepare for the first ever self-driving commercial delivery, one key metric we
monitor is MPI, miles per intervention. This statistic, monitored by our safety engineers, is the
number of miles our truck travels autonomously on a given route before the driver intervenes for
any reason. The requirement to ensure safety is that we must be at 5x the MPI necessary to
complete the demonstration. So, for our Fort Collins to Colorado Springs route of 125 miles, we
will not do the demonstration unless we achieve an MPI of 625 miles. This means no disengages
of any kind (Comfort, Public Perception, Major, or Critical) for 625 miles or, equivalently, 5
consecutive runs of the self-driving delivery route without a disengage.
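The arithmetic behind this go/no-go criterion is straightforward and can be written out as a short sketch. The function names are hypothetical; the 125-mile route length and the 5x safety factor come from the text.

```python
def required_mpi(route_miles, safety_factor=5):
    """Miles per intervention needed before the demonstration may proceed."""
    return route_miles * safety_factor


def ready_for_demo(miles_since_last_disengage, route_miles, safety_factor=5):
    """True once the truck has gone far enough without any disengage."""
    return miles_since_last_disengage >= required_mpi(route_miles, safety_factor)


ROUTE_MILES = 125  # Fort Collins to Colorado Springs, per the text

print(required_mpi(ROUTE_MILES))              # 125 * 5 = 625
print(ready_for_demo(4 * ROUTE_MILES, ROUTE_MILES))  # four clean runs: not yet
print(ready_for_demo(5 * ROUTE_MILES, ROUTE_MILES))  # five clean runs: ready
```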
Safety Precautions During the Demonstration
To give an extra layer of safety on behalf of the national shipper we are working with, we
will have chase cars and lead cars driven by Otto employees near the truck, monitoring for any
unusual road condition. This could include things like a broken bumper in the middle of the road,
a CSP officer pulled over to the side, or a reckless or drunk driver.
All of us at Otto want to thank you for working with us on this project. We are very
excited for the next couple weeks and hope this demonstration will help you understand the full
capabilities of our autonomous system.
Information for New Entrant Safety Audit Document Submission
The below applicable documents must be submitted via the Safety Audit Website
http://ai.fmcsa.dot.gov/newentrant no later than 20 calendar days from the date of this letter.
The website listed above provides more information for each document requested including
information regarding the type of carrier operations for whom the document applies.
All carriers must submit the following:
> Upload a list of drivers working for the carrier including first and last names, dates of
birth, dates of hire, license numbers and license states.
> Upload a list of vehicles owned by the carrier along with associated unit numbers,
VINs and plates.
All carriers must have insurance sufficient to satisfy the minimum public liability
requirements.
+ If your carrier operation is for-hire (paid for transport), upload a copy of a signed
MCS-90; or
+ If your carrier operation is private (not paid for transport) that hauls hazardous
material, upload a copy of a signed MCS-90.
Applicable Regulation: 49 CFR 387.1 - 387.17
The following do not need to be submitted by agricultural/farming related operations that
stay within 150 air miles of their place of business. All other operators are expected to submit
the following:
CMV drivers are required to undergo a physical every two years by a registered
Medical Examiner in order to ensure they are physically and mentally fit to drive. The
eel se aacenneuses
Applicable Regulation: 49 CFR 391.51 - 391.55
Carriers must maintain a motor vehicle record for each driver's Driver Qualification file.
Upload copy