
Using Support Vector Machines to Classify Traffic Signs for an Autonomous Vehicle

Kevin Chang
Department of Computer Science
The University of Texas at Austin
Austin, TX 78712 USA
kchang@cs.utexas.edu

Todd Swinson
Department of Computer Science
The University of Texas at Austin
Austin, TX 78712 USA
swinson@gmail.com

CS 378: Autonomous Vehicles in Traffic
Advisor: Dr. Michael Quinlan
May 6, 2011

Abstract

This paper describes a method of implementing a traffic sign classifier using machine learning techniques on an autonomous vehicle. Currently, the vehicle relies on a data file that specifies the locations of Stop and Yield signs. However, it would be impractical to record the locations of all traffic signs in order for the vehicle to behave autonomously. We attempt to solve this problem by extending the current sign detection code, which combines LIDAR data and camera images to produce a cropped image of a possible traffic sign. We then use a Histogram of Oriented Gradients feature extractor on several of these cropped images to train a Support Vector Machine. Our preliminary results show that our classification framework can quickly and accurately classify an image into one of seven classes.

1 Introduction
Austin Robot Technology and The University of Texas at Austin created an autonomous vehicle in order to compete in the DARPA Grand Challenge (2005) and Urban Challenge (2007) [1]. The purpose of these challenges was to motivate research in designing and building vehicles that can perform various tasks without human intervention. A variety of algorithms were implemented, including path planning, lane detection, and interacting with other vehicles at intersections [1]. However, algorithms for detecting and classifying traffic signs without a Route Network Definition File (RNDF), and for determining how the vehicle should behave given that information, were not fully implemented for the DARPA challenges. Since that time, additional work has been performed on detecting traffic signs. We have created a sign classification algorithm that uses machine learning to determine the type of sign present in an image. This algorithm can be integrated into the existing sign detection framework and used to help the vehicle determine its behavior based on its surroundings. This paper describes the problem we wish to solve, the design of our solution, our results, and possible future work.

2 Problem Description
Currently, the autonomous vehicle relies on an RNDF to determine where certain street signs, such as Stop signs, are located. If the vehicle were placed in an area not covered by an RNDF and approached an intersection with a Stop sign, it would fail to stop because it would not know of the sign's existence. For a vehicle to act autonomously, it needs the ability to detect and classify street signs. Once it can gather this information, the vehicle can determine what action to take without prior knowledge of its environment. We designed and implemented a sign classifier that uses machine learning to reduce the vehicle's dependence on an RNDF by adding the capability to classify street signs. We utilized existing components, as well as the Velodyne LIDAR and cameras mounted on the vehicle, to accomplish this task. We believe that our approach of using a Support Vector Machine (SVM) is sound and accurately determines whether a Stop or Yield sign is present in an image.

3 Background Information
Existing components in the vehicle's code base are utilized in our project. Our classifier will become part of a larger framework for detecting road signs and will allow the vehicle to act once a positive identification is made.

Figure 1: Detection and classification components [1]

We modified the existing sign detection components, which combine the LIDAR and camera sensor information to create a cropped image whenever the LIDAR suspects that a sign is in the vicinity of the vehicle. The component detect_jesse examines the Velodyne's point cloud and determines which area of the cloud is likely to contain a sign, while point_to_image uses the output of detect_jesse, along with the video feed from the vehicle's cameras, to output a cropped image of a suspected sign to the ROS topic image_sign. The topic is updated each time a suspected sign is detected, potentially several times per second. Our classification algorithm takes an image from the image_sign topic as input and outputs the sign type (Stop, Yield, none, etc.) contained in the image.

Our classifier relies on OpenCV's SVM and feature extractor functionality. We chose the Histogram of Oriented Gradients (HOG) feature extractor because it has been shown to provide better results than other feature extractors [2]. HOG describes an image based on the distribution of local intensity gradients or edge directions [2]. This information is then combined into a single vector that describes the image.

Figure 2: Examples of cropped images from image_sign

Before the classifier is ready to classify images, its underlying SVM must first be trained. The SVM is trained by feeding it feature vectors extracted from several images, where each feature vector is associated with a particular classification (in our case, Stop signs, Yield signs, etc.). In the SVM, groupings of vectors of the same class are separated from groupings of other classes by hyperplanes [3]. During classification, HOG is used to extract a single feature vector from an image likely containing a sign, and the vector is then given to the SVM's predict function. The SVM places this vector within its space of existing vectors, and the class the vector falls into (as separated by the hyperplanes) is output as the classification of the input image.
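To make the training and prediction steps concrete, the following is a minimal sketch (not the code used on the vehicle) of how an SVM can be trained on pre-extracted feature vectors and queried through OpenCV's C++ interface; the parameter choices, function names, and file name below are illustrative assumptions.

#include <opencv2/opencv.hpp>

// Trains a multi-class SVM on pre-extracted feature vectors.
// train_features holds one feature vector per row (CV_32FC1);
// train_labels holds the corresponding class id for each row.
void train_sign_svm(const cv::Mat& train_features,
                    const cv::Mat& train_labels,
                    CvSVM& svm)
{
    CvSVMParams params;
    params.svm_type    = CvSVM::C_SVC;      // multi-class classification
    params.kernel_type = CvSVM::LINEAR;
    params.term_crit   = cvTermCriteria(CV_TERMCRIT_ITER, 1000, 1e-6);

    svm.train(train_features, train_labels, cv::Mat(), cv::Mat(), params);
    svm.save("sign_svm.xml");               // reload later with svm.load()
}

// Returns the predicted class for a single 1 x N feature vector.
int predict_sign(const CvSVM& svm, const cv::Mat& feature_vector)
{
    return static_cast<int>(svm.predict(feature_vector));
}

These two functions correspond to the training and classification stages described above; our framework wraps equivalent steps behind the extractor interface presented in section 4.2.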

4 Design
4.1 Training and Testing Data

In order to train our classifier, we had to provide it with several sample images. To obtain the images, we drove the car around the Pickle Research Campus, passing by several different Stop and Yield signs. The vehicle saved the LIDAR and video data to bag files. We ran these bag files through a tool we created, ml_capture. This tool, utilizing the existing sign detection components, saves the cropped images of suspected signs to the file system. We then manually pored over the images, separating them by their actual sign type (we also separated out the images that did not contain a Stop or Yield sign at all), and moved each image into the directory structure described below.

The directory structure used by our trainer must conform to a certain specification. The training directory given to the trainer must contain sub-directories, where each sub-directory is a class, and each sub-directory's name must be the class number of the images contained inside. We devised several different classes, outlined below.

Figure 3: Directory structure for sample images

Class 0: Not a valid sign. Could contain trees, cars, or anything that is clearly not a Stop or Yield sign. Could also contain Stop and Yield signs that are too small to be considered a prominent object.
Class 1: Full Stop signs. Images of Stop signs that are tightly cropped, where approximately 90% of the sign is showing.
Class 2: Partially obstructed Stop signs. Contains Stop signs that are only partially visible in the frame due to incorrect cropping or obstruction by other objects. Less than 90% of the sign is showing.
Class 3: Far away Stop signs. Images of Stop signs that occupy a very small portion of the frame, approximately 25% or less.
Class 4: Full Yield signs. Images of Yield signs that are tightly cropped, where approximately 90% of the sign is showing.
Class 5: Partially obstructed Yield signs. Contains Yield signs that are only partially visible in the frame due to incorrect cropping or obstruction by other objects. Less than 90% of the sign is showing.
Class 6: Far away Yield signs. Images of Yield signs that occupy a very small portion of the frame, approximately 25% or less.

Table 1: Description of sign classes

4.2 Overview of the Code

While designing our code, we realized that adding other image feature extractors would be important so that different image attributes could be used to train an SVM. We therefore decided to create a common interface that all extractors must implement. We hope that this will allow the efficient and correct integration of other extractors into our existing SVM framework. We also developed a variety of tools that allow users to easily capture images, create an SVM, and test an existing SVM.

4.2.1 extractor.h / extractor.cpp

These files define the common interface that all extractors must implement. Any extractor that inherits from this interface can use the tools we developed to create and test an SVM. extractor.cpp defines two methods that are common to all extractors. The first method (load_images) walks the directory structure of images and loads the names of all the files contained in each sub-directory into an OpenCV matrix; all the names from a sub-directory are placed in the corresponding row of the matrix. The second method (train_SVM) is modified code from [4]. It can train an SVM either by automatically determining the optimal parameters or by using pre-defined parameters. However, we were unable to train with the automatically generated parameters due to an exception within the SVM code. Instead, we used the manually defined parameters provided in [4], which appeared to give accurate results. The method also saves the SVM to a file, allowing a user to quickly load an SVM to test or run on the vehicle.

These files also declare the method headers that any new extractor must implement (see section 4.2.2 for pseudocode). The extract method should extract the features from a single image using the particular extractor's algorithm. The extract_features method should extract the features from a set of images by calling extract on each image. The classify method should extract the features from a single image and return the SVM's prediction of the sign's classification. The train method should load the image names from the directory of training images, extract the features using extract_features, and then train the SVM on those features by calling train_SVM. The test method should load the image names from the directory of testing images, extract the features using extract_features, and then ask the SVM to predict each sign's type by calling classify.
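Because the exact declarations are not reproduced in this report, the header below is only an assumed sketch of what extractor.h might contain: the method names come from the description above, while the argument and return types are our guesses.

#include <string>
#include <vector>
#include <opencv2/opencv.hpp>

// Assumed sketch of the common extractor interface (extractor.h).
class extractor {
public:
    virtual ~extractor() {}

    // Shared implementations provided in extractor.cpp.
    void load_images(const std::string& path,
                     std::vector<std::string>& classes,
                     std::vector<std::vector<std::string> >& names);
    void train_SVM(const std::string& save_path);

    // Each concrete extractor (e.g. hog_extractor) implements these.
    virtual cv::Mat extract(const cv::Mat& image) = 0;
    virtual void extract_features(const std::string& path,
                                  const std::vector<std::string>& classes,
                                  const std::vector<std::vector<std::string> >& names) = 0;
    virtual int classify(const cv::Mat& image) = 0;
    virtual void train(const std::string& train_path, const std::string& save_path) = 0;
    virtual void test(const std::string& test_path) = 0;

protected:
    CvSVM svm;               // the underlying OpenCV SVM
    cv::Mat training_data;   // stacked feature vectors, one row per image
    cv::Mat training_labels; // class id for each row of training_data
};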

4.2.2 hog_extractor.h / hog_extractor.cpp

These files define a Histogram of Oriented Gradients (HOG) extractor; the class inherits from extractor. It uses OpenCV's HOGDescriptor to compute the feature vector from an image.

When extracting features from an image, we initially resized the image to 50x50 pixels and then computed a feature vector using a stride of 10x10 and padding of 50x50. This produced a feature vector of 102,060 attributes; however, when we saved an SVM trained with fewer than 30 such images, the file size was over 300 MB. We therefore decreased the number of attributes by keeping the same parameters but changing the stride to 50x50. This produced a feature vector of 7560 attributes, resulting in a 200 MB file when trained with about 2800 images. The following is brief pseudocode for each of the major components of hog_extractor.
Mat extract(Mat IMAGE):
    if (IMAGE not empty):
        resize IMAGE to 50x50;
        FEATURES := compute(IMAGE, Size(50,50), Size(50,50));
        return FEATURES;
    fi
    return empty Mat;

Code 1: Extracts the features of an image
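For concreteness, the extract step above might be written against OpenCV's HOGDescriptor roughly as follows (a sketch, not our exact code). With OpenCV's default 64x128 HOG window (3780 values per window), the 50x50 resize, 50x50 padding, and 50x50 stride described above yield two windows and hence a 7560-attribute vector; the earlier 10x10 stride yields 27 windows, i.e. 102,060 attributes.

#include <opencv2/opencv.hpp>

// Sketch of the extract step using OpenCV's HOGDescriptor with its
// default parameters (64x128 window, 3780 descriptor values per window).
cv::Mat extract(const cv::Mat& image)
{
    if (image.empty())
        return cv::Mat();

    cv::Mat resized;
    cv::resize(image, resized, cv::Size(50, 50));

    cv::HOGDescriptor hog;                   // default window and cell sizes
    std::vector<float> features;
    hog.compute(resized, features,
                cv::Size(50, 50),            // window stride
                cv::Size(50, 50));           // padding

    // Return the descriptor as a single row so feature vectors can be
    // stacked into the SVM's training matrix.
    return cv::Mat(features, true).reshape(1, 1);
}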

void extract_features(string PATH, vector<string> CLASSES, vector<vector<string>> NAMES):
    for each class index C:
        for each name index N in NAMES[C]:
            FULL_PATH := PATH + CLASSES[C] + "/" + NAMES[C][N];
            IMAGE := imread(FULL_PATH);
            FEATURES := extract(IMAGE);
            associate FEATURES with CLASSES[C] and store;
        end
    end

Code 2: Extracts the features from a set of images and stores them

int classify(Mat IMAGE):
    if SVM is not initialized:
        exit;
    fi
    FEATURES := extract(IMAGE);
    TYPE := SVM->predict(FEATURES);
    return TYPE;

Code 3: Extracts the features from an image and classifies it using an existing SVM

void train(string TRAIN_PATH, string SAVE_PATH):
    CLASSES, NAMES := load_images(TRAIN_PATH);
    extract_features(TRAIN_PATH, CLASSES, NAMES);
    train_SVM(SAVE_PATH);

Code 4: Loads a set of images, extracts their features, and trains an SVM

void test(string TEST_PATH):
    CLASSES, NAMES := load_images(TEST_PATH);
    for each class index C:
        for each name index N in NAMES[C]:
            FULL_PATH := TEST_PATH + CLASSES[C] + "/" + NAMES[C][N];
            IMAGE := imread(FULL_PATH);
            TYPE := classify(IMAGE);
            if TYPE == CLASSES[C]:
                prediction is correct;
            else:
                prediction is incorrect;
            fi
        end
    end

Code 5: Loads a set of images and classifies them using an existing SVM

4.2.3 ml_capture.cpp

This file defines a node that subscribes to the image_sign topic, which publishes cropped images of suspected signs. These cropped images are saved to a directory where the user can manually sort them into sub-directories according to how the user wishes to classify them. We created this tool to allow a user to easily capture training and testing images directly from bag files. By capturing sample images in this way, the images are guaranteed to have been processed in the same way before being classified by ml_classify. This reduces any variations between the training/testing images and the actual images used by the vehicle when classifying signs. When saving the images, we needed to guarantee unique file names to prevent overwriting previously saved images. To accomplish this, we use the following algorithm:
PATH := path to save images;
ID := current time in seconds;
for each image I received:
    FULL_PATH := PATH + ID + ".jpg";
    save I at FULL_PATH;
    ID++;
end

Code 6: Algorithm to guarantee unique image file names
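The report does not give ml_capture's implementation beyond this algorithm, so the following is only a rough sketch of such a node, assuming the image_sign topic publishes sensor_msgs/Image messages; everything other than the topic name is illustrative.

#include <ros/ros.h>
#include <sensor_msgs/Image.h>
#include <cv_bridge/cv_bridge.h>
#include <opencv2/opencv.hpp>
#include <sstream>
#include <string>

static std::string save_path;   // directory passed on the command line
static long next_id;            // seeded with the current time in seconds

// Saves each cropped sign image under a unique, monotonically increasing name.
void imageCallback(const sensor_msgs::ImageConstPtr& msg)
{
    cv_bridge::CvImagePtr cv_img = cv_bridge::toCvCopy(msg, "bgr8");
    std::ostringstream name;
    name << save_path << "/" << next_id++ << ".jpg";
    cv::imwrite(name.str(), cv_img->image);
}

int main(int argc, char** argv)
{
    ros::init(argc, argv, "ml_capture_sketch");
    ros::NodeHandle nh;
    save_path = (argc > 1) ? argv[1] : ".";
    next_id   = static_cast<long>(ros::Time::now().toSec());
    ros::Subscriber sub = nh.subscribe("image_sign", 10, imageCallback);
    ros::spin();
    return 0;
}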

4.2.4 ml_classify.cpp

Currently, this file defines a node that subscribes to the image_sign topic and loads a predefined SVM. Using this SVM, it classifies each cropped image it receives and places it in a sub-directory corresponding to its classification. This allows the user to quickly see how each cropped image was classified. In the future, we would like this node to publish the image's classification and location on a topic instead of saving the classified images to a directory (see section 7).

4.2.5 ml_run.cpp

This file defines a user interface that allows the creation and testing of an SVM. The user can select one of four options:

Option 1: Train an SVM on all the images in a directory and save it to a file.
Option 2: Load an SVM from a file and test it on all the images in a directory.
Option 3: Train and test an SVM on the images in a directory (the user can specify the number of images to use).
Option 4: Automatically train and test an SVM on the images in a directory using a varying number of training/testing images.

Table 2: User interface options

For each option, the user can select the type of feature extractor to use. Currently, the only option is the HOG extractor; however, new extractors can be added as long as they implement the interface in extractor.h. Various statistics are displayed when testing (option 3 or 4) is complete. The statistics use the following format:
Accuracy    Time to Predict    False Negative    False Positive 0    [...]    False Positive N

Table 3: Testing output format

Accuracy is the number of correct classifications divided by the total number of test images. Time to Predict is the number of milliseconds taken to classify all the test images; the user can divide this value by the number of test images to compute the average time to classify a single image. False Negative is the number of occurrences where the SVM predicted that the image did not contain a sign when, in actuality, it did. False Positive 0 is the number of occurrences where the SVM predicted the image contained a sign of type 0 when, in actuality, there was no sign. There will be N of these values, where N is the number of classification types.

5 Testing Methodology
We used our ml_run tool to carry out statistical and performance tests on our SVM and HOG implementations. We wanted to determine how the number of training images and the number of attributes extracted from each image would affect the accuracy of the classifier and the time it took to classify a set of images. We defined the sample size to be the number of training and testing images, and fixed the number of testing images at 10% of the number of training images. Because the total number of sample images was 2794 and we wanted the number of training images to double for each sample size, the following sample sizes were used:

Training Images    Testing Images    Total Images
78                 7                 85
156                15                171
312                31                343
624                62                686
1248               124               1372
2496               249               2745

Table 4: Sample sizes

For each sample size, we ran 40 trials. For each trial, we randomly selected training and testing images from the directory of sample images, ensuring that the sets of training and testing images were disjoint. For each trial, we gathered the statistics described in Table 3. We performed these tests using two different feature vector sizes: the first set of tests used a feature vector of 7560 attributes and the second used 102,060.
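As a concrete illustration of this sampling scheme (a sketch, not the actual ml_run code; it assumes the pool of sample file names has already been loaded and contains at least num_train + num_test entries), one trial's disjoint training and testing sets could be drawn as follows:

#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Shuffles the pool of sample image names, then takes the first num_train
// names as training images and the next num_test names as testing images,
// so the two sets are guaranteed to be disjoint.
void draw_trial(std::vector<std::string> pool,      // copied, then shuffled
                size_t num_train, size_t num_test,
                std::vector<std::string>& train_set,
                std::vector<std::string>& test_set)
{
    std::random_shuffle(pool.begin(), pool.end());
    train_set.assign(pool.begin(), pool.begin() + num_train);
    test_set.assign(pool.begin() + num_train,
                    pool.begin() + num_train + num_test);
}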

6 Results
Our results show that as the number of images used to train an SVM increases, both its accuracy and the time it takes to classify a single image increase. As the number of training samples increases, the rate of change in the accuracy decreases. The graph below suggests that beyond a certain training sample size, we will not see a significant improvement in accuracy; however, because we do not have enough sample data, we cannot determine the optimal number of training images for the SVM. When a larger vector of features is used to train an SVM, the accuracy does not show a noticeable improvement. Therefore, we believe that using a large vector of features does not provide a significant benefit, while the costs of generating and storing the vectors are high.

[Plot: accuracy (%) versus training sample size, for 7560 features and 102,060 features]
Figure 4: Average accuracy per training sample size

Our testing methodology may have led to artificially high accuracy. Because our training and test images came from the same sequence of images, we believe there is a high probability that a given test image is very similar to a training image. Thus, this testing approach does not accurately simulate a real-world application of our classifier on the vehicle. A more realistic test would be one in which we test using a bag file that was not used to train the SVM.

When the number of features used for training increased from 7560 to 102,060, the time to classify a single image increased dramatically. Given that the accuracies for the two feature vector sizes are similar, our data show that using a larger number of attributes is not optimal. Being able to quickly classify an image is necessary because other components of the vehicle depend on the classifier to produce a prediction in real time.

[Plot: time to classify one image (ms) versus training sample size, for 7560 features and 102,060 features]
Figure 5: Time to classify per training sample size

We did not see a significant difference in the number of false negatives and false positives between the two feature vector sizes. This further supports our belief that training the SVM with a large number of attributes is not necessary. We believe that there are more false positives for the Stop sign classes (classes 1, 2, and 3) than for the Yield sign classes (classes 4, 5, and 6) because there were significantly more samples of Stop signs than of Yield signs. Additionally, the way we grouped sample images into classes may have affected the false positive rate because it was a subjective process; we used our best judgment when assigning images that met the criteria for multiple classes. Therefore, the SVM may have had difficulty classifying images that were near the boundaries of the class types.

[Plot: average false negatives per feature vector length, 7560 features and 102,060 features]

Figure 6: Average false negatives per feature vector length

[Plot: average false positives per sign class (0-6), 7560 features and 102,060 features]

Figure 7: Average false positives per feature vector length

7 Future Work
There are several features we would like to improve upon in the future, including integrating the classifier with the vehicle, improving the testing methodology, and implementing additional feature extractors.

In order for the vehicle to act upon a sign classification, the ml_classify node will need to publish a message that contains the sign's class and location. To accomplish this, we will need to incorporate additional LIDAR data to determine the sign's location relative to the vehicle. We will then need to design a message format to represent the classification and location of a sign and publish this message on a ROS topic that can be read by other components of the vehicle.

We would like to gather additional sample images in order to determine the optimal number of training images. We would also like to run our tests with several different feature vector sizes to determine the optimal vector size for the HOG extractor. As mentioned earlier, we believe a more realistic test would be to test with bag files that were not used for training; this would allow us to more accurately measure the classifier's performance.

We designed our classification framework so that additional feature extractors can be implemented and integrated easily. We would like to create SIFT and SURF extractors and evaluate their accuracy, which will allow us to compare the various extraction algorithms and determine their strengths and weaknesses. From there, we would like to run ml_classify using multiple SVMs simultaneously, each trained with a different extractor. This may allow us to increase the overall accuracy of the classifier by leveraging the strengths of one extraction algorithm over another.

8 Conclusion
Our goal was to reduce the autonomous vehicle's dependence on an RNDF by adding the ability to determine where traffic signs are located. We have achieved a portion of this goal by using machine learning techniques to dynamically classify traffic signs. We designed a framework that uses an SVM to classify images that may contain traffic signs. Our preliminary tests show promising results: we achieved a fairly high rate of accuracy, but would like to perform additional, more realistic tests. The final step in removing the dependence on an RNDF will be fully integrating the classifier with the vehicle.

Appendix
How To Run

Our code is located at sandbox/stacks/art_experimental/ml_sign.

ml_capture.cpp

Run each of the following in its own terminal window:
> roscore
> rosrun velodyne_signs detect_jesse compressed
> rosrun velodyne_signs point_to_image compressed
> rosrun ml_sign ml_capture raw <path to save images>
> rosbag play <bag file>

ml_run.cpp

From sandbox/stacks/art_experimental/ml_sign:


> bin/ml_run

See section 4.2.5 for further instructions.

ml_classify.cpp

Run each of the following in its own terminal window:
> roscore
> rosrun velodyne_signs detect_jesse compressed
> rosrun velodyne_signs point_to_image compressed
> rosrun ml_sign ml_classify raw
> rosbag play <bag file>

What We Learned

Kevin: During this project, I gained experience with C++ and SVN. I also learned a great deal about how SVMs work and some basic image processing. However, I think the most beneficial aspect of this project was the additional experience it gave me in Computer Science research. I was able to work with a partner to define a particular problem, design and implement a solution, and finally test that solution to determine whether it solves the problem.

Todd: I gained experience working with a partner and defining concrete goals for an open-ended project. I got an introduction to robotics and to how Computer Science research is conducted. Throughout the project, I improved my C++ programming skills and gained exposure to the SVN source control system, as well as the OpenCV library. I learned about machine learning and image processing techniques, specifically support vector machines and feature descriptors. Finally, this project forced me to think about how to create rigorous testing methodologies.

References
[1] Beeson, P., O'Quin, J., Gillan, B., Nimmagadda, T., Ristroph, M., Li, D., and Stone, P., "Multiagent Interactions in Urban Driving," Journal of Physical Agents: Multi-Robot Systems, vol. 2, no. 1, 2008.
[2] Dalal, N. and Triggs, B., "Histograms of Oriented Gradients for Human Detection," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886-893, 2005.
[3] Wikipedia contributors, "Support vector machine," Wikipedia, The Free Encyclopedia, <http://en.wikipedia.org/wiki/Support_vector_machine>, accessed 5 May 2011.
[4] Breckon, T., "Support Vector Machine (SVM) learning," ver. 0.2 [C++ program], 9 Feb. 2011, <http://public.cranfield.ac.uk/c5354/teaching/ml/examples/c++/speech_ex/svm.cpp>, accessed 29 Mar. 2011.
