Otto Testing Operations

Sct Ryvola - Software Quality Manager

Daily Road Testing

Otto trucks drive the highways surrounding San Francisco on a daily basis. The goal of this testing is to evaluate our (constantly improving) software against real-world driving situations and to identify the current system's strengths and weaknesses. We are very deliberate in our testing efforts: we test at all times of the day, in morning rush-hour traffic, throughout the day, and at night, to capture every highway situation imaginable. Trucks are routinely sent back to known problem areas as well as into new conditions on a daily basis. We collect as much information as possible, not only to ensure that we will be able to handle anything thrown at us, but also to use everything that is thrown at us to improve the system. This is achieved through advanced machine learning models that our perception system employs. Although not every component of a self-driving vehicle can employ a machine learning model, we use problem scenarios encountered on the road as a baseline of what our system must be able to handle.

In a typical testing scenario, there are two people inside the cabin. One is a licensed commercial driver responsible for the operation of the truck itself: getting us to the highway, exercising active physical monitoring at all times, and taking over full driving responsibilities when necessary. This driver is constantly monitoring the road and giving input when necessary, which is vital to safe operation of the vehicle. The co-driver in the passenger seat is responsible for monitoring the output of the self-driving system and alerting the driver of any anomalies, as well as instructing when the driver should "disengage" from self-driving mode.
Drivers can disengage by grabbing the steering wheel, applying the brake, applying the throttle, flipping an "engage" button on the dash, or hitting a large red button next to the steering wheel. In an emergency, any one of these inputs instantly restores full driving control to the driver.

There are two ways through which the Otto trucks stay on course and within a marked lane on the freeway. The first is through "seeing" the marked lane lines on the freeway, using our perception system to stay in the middle of the lane. The second is through a highly detailed GPS map. These maps are created by sending a truck (driven by one of our truck drivers) along a specified route without engaging the self-driving system: the goal is strictly to collect data about the specific road and lane conditions. Once a map has been generated for a section of highway, our trucks can localize to it the next time they drive that same section of road in self-driving mode. By connecting to the national RTK system, we can localize the truck's exact location on the map to within a few centimeters. These two systems (perception software and map localization) work in concert with each other and are not mutually exclusive.

Before testing begins on any given day and route, the co-driver will first evaluate the current road conditions and system readiness before calling for an "engage." The driver must then manually press a button on the dashboard to engage the self-driving system. While the self-driving system is engaged, the driver must be extremely attentive and ready to take back full control whenever necessary. In situations where the driver decides to take back full control, the logging software in the self-driving system remains active and records the corrective action the driver took. Later, our engineers review these logs to obtain an accurate representation of what a human truck driver would do in that particular scenario.
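The engage/disengage rules above can be summarized as a small state machine. The sketch below is a hypothetical illustration, not Otto's control software: the names `EngagementController`, `DriverInput` and `Mode` are invented here, and the model covers only what the text states, namely that engaging requires an explicit button press, that any one of the five listed driver inputs returns the truck to manual mode, and that logging continues after the takeover.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Mode(Enum):
    MANUAL = auto()
    SELF_DRIVING = auto()


class DriverInput(Enum):
    """The five takeover inputs listed in the text."""
    STEERING_WHEEL = auto()
    BRAKE = auto()
    THROTTLE = auto()
    ENGAGE_SWITCH = auto()   # the "engage" button on the dash
    RED_BUTTON = auto()      # the large red button next to the wheel


@dataclass
class EngagementController:
    """Hypothetical sketch of the engage/disengage rules, not Otto's code."""
    mode: Mode = Mode.MANUAL
    log: list = field(default_factory=list)

    def engage(self) -> None:
        # Engaging only happens through an explicit press by the driver,
        # after the co-driver has called for an "engage".
        self.mode = Mode.SELF_DRIVING
        self.log.append(("engage", None))

    def on_driver_input(self, source: DriverInput) -> None:
        # Every input is logged in either mode, so the driver's corrective
        # action is still recorded after control has been handed back.
        self.log.append(("input", source))
        if self.mode is Mode.SELF_DRIVING:
            # Any single input is enough to restore full manual control.
            self.mode = Mode.MANUAL
            self.log.append(("disengage", source))
```

For example, calling `engage()` and then `on_driver_input(DriverInput.BRAKE)` leaves the controller in `Mode.MANUAL` with the brake press recorded as the disengage cause. Keeping the takeover path as one unconditional transition mirrors the point that no combination of inputs is required; one is enough.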
At the moment when the driver takes over full manual control, the co-driver will leave detailed notes describing what was happening on the road at the time, as well as the reason for the particular disengage.

Dashboard and Log Analysis

Once a road test is complete, we present the collected information to our engineers via a web dashboard that enables them to see the location of disengages, the comments left by the co-driver, and the exact log file that they can download to review and test new software upgrades against (see screenshot below, which displays two weeks' worth of testing).

Disengages are vital to evaluating the performance of our system. Each disengage is reviewed closely after the fact to determine the root cause of the shortcoming and to assign a severity level to each event. All disengages are important and taken seriously, but the various severity levels serve as a signal to our software engineers as to which bugs to prioritize.

Disengage Severity

The following categories are assigned to disengagement events after the logs have been carefully analyzed:

Comfort - a case where drivers felt uncomfortable with the decision the self-driving system made but which, in reality, posed a near-zero safety threat to themselves or anyone else on the road. This category exists because we train our drivers to be extremely cautious and defensive.

Public Perception - a case where the behavior of the truck would be considered odd from an onlooker's perspective. This is a real software bug, but it has near-zero safety implications. An example might be minimal weaving within the lane while traveling in slow-moving traffic.

Major - a scenario that could have had safety implications had there not been driver intervention. An example would be braking on the freeway when there was no reason to do so.
Major scenarios are ones that could have resulted in a safety risk if there had been a different set of actors surrounding the truck, but they would not have actually resulted in a collision or near-collision even if the human driver had not taken back full control, given the particular environment of road conditions and surrounding vehicles present when the disengage occurred.

Critical - a scenario where the truck's self-driving actions put it or any surrounding actors in actual danger that required the driver to take back full control at that time. An example could be a situation where the driver takes over because the system did not command enough braking to stop in time for a lead vehicle.

Triage

Once a disengage event has a severity level assessed, it is assigned to a tracking bug. These bugs capture specific behavioral responses from the truck without drilling down to the exact line of code where something went wrong. While deep analysis is undertaken by the responsible engineer, the tracking bugs serve as a marker for us to monitor general system performance through each iteration of the software. At Otto, we refer to this process of post-analyzing the logs as "triage" to highlight our need for quick analysis and to shine a spotlight on the most important issues. During this process, further triage notes are left for each event highlighting exactly what occurred on the road, what the system's response was, and what the root cause of that response was.

When engineers and software quality analysts review logs, they do so using our playback system nicknamed "DogTV", which is the same visualization tool that co-drivers use for active monitoring while on the road (see screenshot below). In DogTV, we can see the video captured during testing overlaid with our perception system output. The perception system takes in camera images, radar measurements, and lidar returns to see the lane and all objects around the truck.
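Returning to the severity levels and triage step described above, one way to picture them is as an ordered enum plus a ranking over tracking bugs. This is a minimal sketch under stated assumptions: the record fields mirror what the dashboard is described as showing (location, co-driver comment, log file), but the field names, the bug identifiers, and the ranking rule (worst severity first, then number of events) are illustrative choices, not Otto's documented process.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Disengage severity levels; a larger value means a more urgent bug."""
    COMFORT = 1
    PUBLIC_PERCEPTION = 2
    MAJOR = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Disengage:
    location: tuple[float, float]   # (latitude, longitude) shown on the dashboard map
    codriver_note: str              # comment left by the co-driver at takeover
    log_file: str                   # hypothetical path to the downloadable log
    severity: Severity              # assigned after the log has been analyzed
    tracking_bug: str               # hypothetical tracking-bug identifier


def prioritize(events: list[Disengage]) -> list[tuple[str, Severity, int]]:
    """Rank tracking bugs by worst severity seen, then by number of events."""
    worst: dict[str, Severity] = {}
    counts: dict[str, int] = defaultdict(int)
    for event in events:
        bug = event.tracking_bug
        counts[bug] += 1
        worst[bug] = max(worst.get(bug, event.severity), event.severity)
    return sorted(
        ((bug, worst[bug], counts[bug]) for bug in counts),
        key=lambda row: (row[1], row[2]),
        reverse=True,
    )
```

Under this rule, a single Major event filed under an unneeded-braking bug ranks above two Public Perception events filed under a lane-weaving bug, which matches the text's use of severity as the signal for what to fix first.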
For example, in the DogTV image above, we are outputting the system's interpretation of the left lane (red), right lane (green), centerline (white), and surrounding vehicles (yellow boxes). The camera view is the most interpretable from an outsider's perspective, but there are many more visualization options. Within DogTV we have a 3D visualization that shows raw perception and planner output (seen to the right of the camera image above). In this view, engineers can diagnose exactly what the system "saw" as well as what it planned to do with that information in terms of trajectory, braking response, etc.

The advantage of this system is that our engineers have the ability to test new code on the exact events we struggled with in the past. Software simulation of this nature is a crucial component in ensuring truck safety on the road. It gives us the ability to recreate previous drives and test how they would play out without having to put a truck back into the same situation. For instance, if we momentarily lose tracking of a lane when we go under a bridge, we can replay that same issue with different versions of software and see which one fixes the issue. The same process can be applied not only to what we see, but also to how we react. The planner team can go back to events where drivers reported we braked too much or too little to ensure the future commanded braking will space the truck appropriately.

As much as we would like to write every line of code perfectly on the first try, bugs do arise and we are continually updating our software. In order for an engineer to submit a code change, they first must pass a regression test. This test ensures that the change will improve the software and not make it worse than it was before. An example case is testing that our lane perception improves or remains the same as it was before. This is accomplished by comparing the system output to images where the lane lines have been labeled by humans.
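The lane-perception regression check can be sketched as follows. The text does not say how deviation from the human-labeled lane lines is measured, so this example assumes that each lane line is sampled as lateral offsets (in metres) at the same fixed distances ahead of the truck and uses mean absolute error as the deviation metric; the function names and the optional `tolerance` parameter are hypothetical.

```python
from statistics import mean

# Assumption: a lane line is a list of lateral offsets in metres, sampled at
# the same fixed distances ahead of the truck in both prediction and label.
LaneLine = list[float]


def lane_deviation(predicted: LaneLine, ground_truth: LaneLine) -> float:
    """Mean absolute lateral error between a predicted and a human-labeled lane line."""
    if len(predicted) != len(ground_truth):
        raise ValueError("lane lines must be sampled at the same distances")
    return mean(abs(p - g) for p, g in zip(predicted, ground_truth))


def dataset_deviation(outputs: list[LaneLine], labels: list[LaneLine]) -> float:
    """Average deviation over every labeled image in the regression set."""
    if len(outputs) != len(labels):
        raise ValueError("need exactly one system output per labeled image")
    return mean(lane_deviation(p, g) for p, g in zip(outputs, labels))


def passes_regression(
    old_outputs: list[LaneLine],
    new_outputs: list[LaneLine],
    labels: list[LaneLine],
    tolerance: float = 0.0,
) -> bool:
    """A change passes only if it does not increase deviation from ground truth."""
    before = dataset_deviation(old_outputs, labels)
    after = dataset_deviation(new_outputs, labels)
    return after <= before + tolerance
```

With `tolerance=0.0` the gate encodes the stated rule literally: any increase in average deviation blocks the change, while an unchanged or lower deviation lets it move on to code review.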
The human-labeled images are referred to as ground truth. If the change does not increase the deviation from the ground-truth lane lines, it is one step closer to being accepted. The final step in submitting a change is to have the code reviewed by another engineer who owns that section of the code. Once the code passes the review, it is submitted to the master version of our system.

Even though the new code has gone through thorough testing, we have another step before it is released to our entire fleet. We bundle many code changes into "releases", which are comparable to software versions of other products. Once a release is bundled up, we have one final test before the software is pushed to all the trucks. The software is pushed to one truck, which is sent out on the road to do an initial test run. During this run, the driver and co-driver are extremely cautious and send a report to the entire software team after the run is complete. If the testing goes well, the release is accepted and pushed to the entire fleet. If the testing does not go well, the code is re-evaluated and the whole process starts over.

Colorado Preparations

Before road testing began in Colorado, members of our operations team analyzed over 80 possible routes around the country using satellite imagery to determine the best option for our 100-mile demonstration. Once we determined that the route met our system specifications, an initial truck was sent out in early August to confirm our findings. We decided that we were ready to proceed and immediately added the Colorado logging data into our machine learning pipeline to tune for the differences between Colorado highways and those in California. With this new information, another truck was sent out in late August, along with our head of software and our lead perception engineer.
They validated that the I-25 route between Fort Collins and Colorado Springs would work very well for this demonstration. During this trip, our trucks collected the data necessary for us to make a detailed map of the route without having to engage the autonomous system.

In the meantime, we have been performing hardware upgrades that we will use for the demonstration. These upgrades include a new computer, new LIDAR sensors, and a more robust wiring harness. We have already tested these upgrades on California roads for several weeks as well as at a private testing facility called GoMentum Station. We have been doing stress testing with a fully loaded trailer similar to the one we will be using for the actual commercial delivery in Colorado.

As of now, we have a truck in Colorado testing extensively. A driver and co-driver take the day shift, and they are replaced by another driver and co-driver for the night shift. These 2-shift rotations will occur every day and night until the demonstration in October. For this demo, we will be running at night, but the day testing gives us more chances to see traffic in case there is traffic the night of our delivery. In addition to our usual testing with the operations team, a few engineers from both the Control team and the Perception team in San Francisco are flying out to work shifts as co-drivers.

Criteria for Executing the Self-Driving Delivery

As we prepare for the first-ever self-driving commercial delivery, one key metric we monitor is MPI, miles per intervention. This statistic, monitored by our safety engineers, is the number of miles our truck travels autonomously on a given route before the driver intervenes for any reason. The requirement to ensure safety is that we must be at 5x the MPI necessary to complete the demonstration. So, for our Fort Collins to Colorado Springs route of 125 miles, we will not do the demonstration unless we achieve an MPI of 625 miles.
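The go/no-go arithmetic behind the MPI requirement is simple enough to write down. In this hypothetical sketch (the `Run` record and the function names are invented for illustration), MPI is taken as total autonomous miles divided by interventions, which is one reasonable reading of the definition rather than Otto's stated formula. Readiness follows the document's own reading of the target: the most recent stretch of disengage-free driving must reach 5 × 125 = 625 miles, which for full-length runs equals five consecutive clean runs of the route.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One drive of the route (hypothetical record)."""
    autonomous_miles: float
    disengages: int  # of any severity: Comfort, Public Perception, Major, or Critical


ROUTE_MILES = 125   # Fort Collins to Colorado Springs, per the text
SAFETY_FACTOR = 5   # required margin over what the demonstration needs


def miles_per_intervention(runs: list[Run]) -> float:
    """Total autonomous miles per driver intervention (infinite if none)."""
    miles = sum(r.autonomous_miles for r in runs)
    interventions = sum(r.disengages for r in runs)
    return miles / interventions if interventions else float("inf")


def clean_streak_miles(runs: list[Run]) -> float:
    """Miles in the most recent unbroken sequence of disengage-free runs.

    Simplification: miles driven after a disengage within the same run
    are not counted toward the streak.
    """
    streak = 0.0
    for run in reversed(runs):
        if run.disengages:
            break
        streak += run.autonomous_miles
    return streak


def ready_for_demo(runs: list[Run]) -> bool:
    """Go/no-go check: the clean streak must cover SAFETY_FACTOR route lengths."""
    return clean_streak_miles(runs) >= SAFETY_FACTOR * ROUTE_MILES
```

For example, five clean 125-mile runs following a run with one disengage give a streak of 625 miles, so `ready_for_demo` returns `True`; drop the last clean run and the 500-mile streak falls short.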
The 625-mile requirement means no disengages of any kind (Comfort, Public Perception, Major, or Critical) for 625 miles or, equivalently, 5 consecutive runs of the self-driving delivery route without a disengage.

Safety Precautions During the Demonstration

To give an extra layer of safety on behalf of the national shipper we are working with, we will have chase cars and lead cars driven by Otto employees near the truck, monitoring for any unusual road condition. This could include things like a broken bumper in the middle of the road, a CSP officer pulled over to the side, or a reckless or drunk driver.

All of us at Otto want to thank you for working with us on this project. We are very excited for the next couple of weeks and hope this demonstration will help you understand the full capabilities of our autonomous system.

Information for New Entrant Safety Audit Document Submission

The applicable documents below must be submitted via the Safety Audit Website http://ai.fmcsa.dot.gov/newentrant no later than 20 calendar days from the date of this letter. The website listed above provides more information for each document requested, including information regarding the type of carrier operations to which each document applies.

All carriers must submit the following:
- Upload a list of all drivers working for the carrier, including first and last names, dates of birth, dates of hire, license numbers and license states.
- Upload a list of all vehicles owned by the carrier, along with associated unit numbers, VINs and plates.

All carriers must have insurance sufficient to satisfy the minimum public liability requirements.
- If your carrier operation is for-hire (paid for transport), upload a copy of a signed […].
- If your carrier operation is private (not paid for transport) and hauls hazardous material, upload a copy of a signed MCS-90.
Applicable Regulation: 49 CFR 387.1 - 387.17

The following do not need to be submitted by agricultural/farming related operations that stay within 150 air miles of their place of business.
All other operators are expected to submit the following:

CMV drivers are required to undergo a physical every two years by a registered Medical Examiner in order to ensure they are physically and mentally fit to drive. […] Applicable Regulation: 49 CFR 391.51 - 391.55

Carriers must maintain a motor vehicle record for each driver's Driver Qualification file. Upload copy
