
Comparing Statistical and Machine Learning Classifiers: Alternatives for Predictive Modeling in Human Factors Research

Brian Carnahan, Auburn University, Auburn, Alabama, Gérard Meyer, Carnegie Mellon Driver Training and Safety Institute, Lemont Furnace, Pennsylvania, and Lois-Ann Kuntz, University of Maine at Machias, Maine

Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches, genetic programming and decision tree induction, were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.

INTRODUCTION

Background

As defined by Clancy (1997), classification analysis seeks to identify mathematical and/or statistical relationships between independent variables (discrete or continuous) that can effectively distinguish or characterize various levels within a nominal dependent variable (categorical variable). Once identified, these relationships can be used for descriptive assessment or predictive modeling. Researchers in the areas of human factors, ergonomics, safety, and psychology have for some time used multivariate classification analysis to expand the body of knowledge in their respective fields. Traditionally, discriminant analysis and logistic regression have been the most commonly used statistical methods for carrying out classification in these fields.

In industrial ergonomics, for example, these methods have been used to predict or identify carpal tunnel syndrome status (Babski-Reeves & Crumpton-Young, 2002; Matias, Salvendy, & Kuczez, 1998), jobs that pose high risk for work-related low-back disorders (Marras et al., 1993), lumbar discomfort while sitting (Vergara & Page, 2002), and the risk of slipping and falling while walking on a sloping surface (Hanson, Redfern, & Mazumdar, 1999). The field of transportation safety research has used these methods to identify factors that contribute to airline pilot error incidents (McFadden, 1997), red-light-running behavior in urban settings (Porter & England, 2000), driver drowsiness in virtual driving tasks (Verwey & Zaidel, 2000), and injury severity in motor vehicle accidents (Al-Ghamdi, 2002). In psychology and human factors research, these same methods have been used to predict success or failure in activities such as lifting and carrying criterion tasks (Rayson, Holliman, & Belyavin, 2000), air force basic training (Lubin, Fielder, & Van Whitlock,

Address correspondence to Brian Carnahan, Industrial and Systems Engineering, Auburn University, 207 Dunstan Hall, Auburn, AL 36849; carnahan@eng.auburn.edu. HUMAN FACTORS, Vol. 45, No. 3, Fall 2003, pp. 408–423. Copyright © 2003, Human Factors and Ergonomics Society. All rights reserved.
Downloaded from hfs.sagepub.com at DEFENCE INSTITUTE OF PSYCHOG. on August 23, 2016
MACHINE LEARNING IN HUMAN FACTORS RESEARCH 409

1999), and nuclear reactor operation (Mackieh & Cilingir, 1998).

Although their use is widespread in the human performance field, discriminant analysis and logistic regression do have their limitations. Principal among these is their dependence on a fixed, underlying model or functional form. Discriminant analysis uses linear summation of independent variables to differentiate one category from another (Huberty & Lowman, 1997). Logistic regression also makes use of linear summations of independent variables, incorporated into a logistic function (Myers, 1990). Koza (1992) made the observation that both techniques use regression merely to discover numerical coefficients for predetermined models. In human factors research, however, these parametric models may not be the most appropriate ones for classification tasks involving some human performance activities, especially those for which outcomes cannot be differentiated or predicted based on a simple linear summation of independent variables or for which functional form cannot be established a priori. Such circumstances would necessitate the use of alternative classification modeling approaches.

Machine Learning Alternatives for Classification Model Development

Beyond the plethora of traditional statistical classification techniques available, of which discriminant analysis and logistic regression are but two, machine learning research offers the human factors professional a viable alternative for classification model development. Decision tree induction and genetic programming are two of these machine learning approaches. Unlike the traditional statistical methods previously discussed, these methods do not rely on predetermined models using linear summations of independent variables.

In decision tree induction (Quinlan, 1986), the decision tree is constructed from a data set composed of a series of cases. Each case is composed of a set of values (discrete or continuous) associated with a fixed set of descriptive independent variables and a corresponding outcome-dependent variable (i.e., category or class) associated with those values. Based on the specific values of these descriptive variables, the tree partitions the cases into subsets that are homogeneous in terms of the outcome variable. Each of these subsets represents a single path or branch along the decision tree. Each path represents a specific pattern of descriptive variable values common to a number of cases that possess the same outcome. Once constructed, the tree can classify future cases in which the outcome is not known, based on the values of its descriptive variables. Decision tree induction has been used in the past to identify common patterns associated with specific automotive accident outcomes (Clarke, Forsyth, & Wright, 1998b; Sohn & Shin, 2001).

Genetic programming (Koza, 1992, 1994; Koza, Bennett, Andre, & Keane, 1999) is a search method based on the Darwinian principles of natural selection and survival of the fittest. This approach uses fitness-based selection and solution recombination to produce a population of increasingly effective computer programs designed to solve a particular problem. This technique is one form of heuristic search algorithm found in a field of computer science research known as evolutionary computation (Sebald & Fogel, 1994). By relying on an evolving population of programs driven by natural selection, genetic programming uses multiple points that can climb in different directions in parallel as well as jump to different locations within the solution space. For classification modeling, genetic programming can be used to solve problems in symbolic regression. As defined by Koza (1992), symbolic regression entails discovering a mathematical expression that provides the best fit between independent and corresponding dependent variables within a finite data sample. Kishore, Patnaik, Mani, and Agrawal (2000) identified two distinct advantages that genetic programming would have over traditional statistical methods for classification. First, genetic programming does not require advanced knowledge or assumptions concerning the statistical distribution of the data. Second, genetic programming does not use any specific predetermined model but can detect an underlying relationship between independent and dependent variables and express that relationship through a mathematical model constructed to fit the data.

Evolutionary computation techniques, such as genetic programming, have been used to discover

solutions to various ergonomics design problems involving control panels (Pham & Onder, 1992), lifting tasks (Carnahan & Redfern, 1998a), job rotation schedules (Carnahan, Redfern, & Norman, 2000), and assembly line balances (Carnahan, Norman, & Redfern, 2001). In terms of classification, transportation safety researchers have made use of these techniques to discover rules that could classify and predict the outcome of automotive accidents (Clarke, Forsythe, & Wright, 1998a). In addition, Carnahan and Redfern (1998b) have used genetic programming to develop models that accurately classify lifting tasks as posing high or low risk of occupational back injury.

Project Purpose

The purpose of the current project was to conduct a methodological pilot study that compared traditional statistical classification methods (discriminant analysis and logistic regression) with methods based on machine learning (decision tree induction and genetic programming) in an area relevant to human factors/human performance research. The basis for comparison was data associated with commercial-driver training and subsequent examination. The hypothesis being tested was that the machine learning classification models could yield performances superior to those of discriminant analysis or logistic regression when called upon to accurately predict human performance outcome.

METHOD

Participants

Data were collected from 37 trainees (36 men, 1 woman) who enrolled in a novice truck driver training course offered by the Carnegie Mellon Driver Training and Safety Institute. The trainees' ages ranged from 18 to 58 years (average = 38.0; standard deviation = 10.1). All trainees had previously received a high school or general equivalency diploma. Ethnic composition was not diverse (all participants were Caucasian). All trainees had passed drug toxicology and medical screenings as required by the U.S. Department of Transportation, Federal Highway Administration. All trainees received full tuition assistance from state funding sources.

Data Collection Procedure and Classification Task

Performance data from each of the 37 participants were based on their completion of an 8-week, 320-hr course designed to teach novice drivers basic truck driving skills with emphasis in the areas of safety and accident prevention. The curriculum was composed of six major components: classroom instruction (Byrnes & Fox, 1999), range driving, road driving, simulation training, controlling a truck through a skid or slide, and physical health/wellness training. By completing the curriculum, each participant accumulated 10 scores that reflected his or her performance in the truck driver training course. A description of these scores is shown in Table 1.

Referring to Table 1, Curriculum Areas 1 through 5 (basic operations, driving techniques, vehicle systems, operation skills, and professional driver) covered textbook knowledge commonly found in commercial driver training courses. Scores for these five areas were based on multiple-choice tests of textbook chapter materials (Byrnes & Fox, 1999). Curriculum Area 6, driver development, covered information concerning a personal health program designed to teach ways of improving and maintaining health and physical fitness. Scores for this area were based on a multiple-choice exam covering topics related to physical wellness and lifetime management. Curriculum Areas 7 through 9 (skill development, range skills and pretrip inspection, and road skills) covered those driving skills that are relevant to the commercial driver's license (CDL) examination. Scores in these curriculum areas were based on examinations that used the same range tests, road-driving tests, and guidelines as those used by actual CDL examiners. Each score was divided by its corresponding maximum value (the maximum score possible in the curriculum area), thereby normalizing the scores for all 10 variables. These 10 normalized scores were selected to represent the set of independent variables for the classification task.

Once training was completed, each trainee took the State of Pennsylvania's CDL examination at one of six nearby testing locations. This examination consisted of a trained evaluator

TABLE 1: Description of Curriculum Areas, Content, and Evaluation Score Used in Driver Training Course

1. Basic operation. Content: orientation, control systems, vehicle inspection, basic controls, backing, coupling and uncoupling, proficiency development. Evaluation: 20-item multiple-choice test.
2. Driving techniques. Content: visual search and communication, speed and space management, night operations, extreme driving, hazard perception, emergency maneuvers, skid control recovery, proficiency development. Evaluation: 20-item multiple-choice test.
3. Vehicle systems. Content: vehicle systems functions, diagnosing and reporting malfunctions, preventative maintenance, shifting gears, fire prevention and safety, accident procedures. Evaluation: 20-item multiple-choice test.
4. Operations skills. Content: cargo documentation, handling cargo, hours of service requirements, trip planning, hazardous materials, computers in trucking. Evaluation: 20-item multiple-choice test.
5. Professional driver. Content: job placement and succeeding as a truck driver, public and employer relations, personal health and safety, U.S. Department of Transportation regulations. Evaluation: 100-item multiple-choice test covering all previous curriculum areas, including Area 5.
6. Driver development. Content: physical wellness, lifetime management. Evaluation: 50-item multiple-choice test.
7. Skill development. Content: truck backing, parallel parking, alley docking, and right/left hand turns. Evaluation: 23-item skill assessment in simulator and field tests.
8. Range skills and pretrip inspection. Content: inspection/testing of all truck systems and safety/emergency equipment. Evaluation: 33-item skill assessment in field test similar to CDL examination.
9. Road skills. Content: truck maneuvers covered in CDL examination. Evaluation: 76-item skill assessment in road test.
10. Attendance. Content: class attendance covering all topic areas. Evaluation: percentage of class hours attended.
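The normalization described in the Method section (each curriculum score divided by the maximum score possible in its area) can be sketched as follows. The raw values below are hypothetical illustrations, not trainee data from the study; the item maxima are taken from Table 1.

```python
# Sketch: normalizing curriculum scores by their area maxima, as described
# in the Method section. Raw scores here are hypothetical, not study data.
raw_scores = {"basic_operation": 18, "driving_techniques": 17, "road_skills": 70}
max_scores = {"basic_operation": 20, "driving_techniques": 20, "road_skills": 76}

# Each normalized score falls in [0, 1] and is comparable across areas.
normalized = {area: raw_scores[area] / max_scores[area] for area in raw_scores}
```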

using a standard form to assess and score the trainee in three skill areas: pretrip inspection, range maneuvers, and road skills. Of the 37 trainees, 25 passed the CDL on the first attempt and the remaining 12 passed on their second or third attempt. This binary categorical variable (those who passed the CDL on their first attempt and those who did not) was selected as the dependent variable for the classification task. Thus the classification task was to accurately predict whether or not a trainee would pass his or her first CDL examination based on the 10 scores the individual received after completing the truck driver training course.

A number of research studies have searched for factors that are predictive of driving behavior. Some of these factors have been used to predict accident propensity (Reason, Manstead, Stradling, Baxter, & Campbell, 1990; Simon & Corbett, 1996), risky behaviors (Deery & Fildes, 1999; Meadows, Stradling, & Lawson, 1998), or perceptual and cognitive abilities (Avolio, Kroeck, & Panek, 1985; French, West, Elander, & Wilding, 1993; Myers, Ball, Kalina, Roth, & Goode, 2000). However, to date, there has been no empirically validated test that may be utilized to predict CDL examination performance outcome.

Once collected, the performance data were partitioned into two subsets. Five cases in which the trainee passed and 5 cases in which the trainee failed his or her initial CDL examination were randomly selected from the original data set. This collection of 10 cases constituted the test or validation data set. The remaining 27 cases constituted the training data set. Each approach (discriminant analysis, logistic regression, decision tree induction, and genetic programming) used the training data set to develop its

corresponding classification model. Each model was then validated using the 10 cases of the test set. Clancy (1997) has recommended this procedure of data partitioning, model development, and validation for assessing the true predictive accuracy of classification models used in ergonomics research. What follows is a description of the four approaches used to create the classification models.

Discriminant Analysis

Discriminant analysis (Huberty & Lowman, 1997) accomplishes multivariate classification through the use of linear functions, as expressed in the equation

Si = W1iX1 + W2iX2 + ... + W10iX10 + ci, (1)

in which Si = resultant classification score for the ith category, X1 through X10 = the 10 scores acquired by completing the truck driver training curriculum, W1i through W10i = weighting values corresponding to the independent variables X1 through X10 for the ith category, and ci = constant for the ith category.

The classification functions, the number of which corresponds to the number of categories, can be used to directly compute classification scores for new observations. Specifically, newly observed cases are classified to that group whose classification function yields the highest classification score, Si. The discriminant analysis established two classification functions of the form shown in Equation 1. These functions allowed the model to provide a binary response that predicted the passing or failing of the CDL examination based on the training curriculum scores.

Logistic Regression

The logistic regression model for predicting the outcome of the CDL examination is expressed in the equation

Y = 1/[1 + e^−(β0 + β1X1 + β2X2 + ... + β10X10)], (2)

in which Y = the probability that the input vector belongs to the pass category, β0 = regression model constant, and β1 through β10 = coefficients corresponding to the independent variables X1 through X10.

Using Equation 2, a dependent variable Y output of .5 or greater would result in the model classifying the case as passing the CDL examination on the first attempt. An output below .5 results in the model classifying the case as failing the CDL examination on the first attempt. This same rule was applied when validating the model using the 10 cases of the test data set. Two logistic regression models were created: one that included a constant and one that did not. The reasoning behind this decision was that if the logistic models differed in form, they might also differ in terms of subsequent predictive performance on the test data set.

Decision Tree Induction: The C4.5 Algorithm

The decision tree for predicting trainee success or failure on the CDL examination was constructed using the C4.5 algorithm (Quinlan, 1993). The algorithm was a recursive greedy heuristic that selected independent variables for membership within the tree. Whether or not an independent variable was included within the tree was based on the value of its information gain. As a statistical property, information gain measured how well the variable (curriculum area score) separated training cases into subsets in which the outcome or dependent variable value (passing or failing the CDL examination) was homogeneous. Given that curriculum area scores were all continuous variables, a threshold value had to be established within each score variable so that it could partition the training cases into subsets. These threshold values for each variable were established by rank ordering the values within each variable from lowest to highest and repeatedly calculating the information gain using the arithmetical midpoint between all successive values within the rank order. The midpoint value with the highest information gain was selected as the threshold value for its score variable. That variable with the highest information gain (information being the most useful for classification) was then selected for inclusion in the decision tree.

The algorithm continued to build the tree in this manner until it accounted for all training cases. Ties between variables that were equal in terms of information gain were broken using a

random number generator embedded in the algorithm. The algorithm was run 10 times, using a different random number seed each time. By altering the random number seed, the algorithm could produce decision trees that differed in structure and classification performance.

Genetic Programming

The genetic programming algorithm searched for accurate classification models by using five distinct stages: initialization, fitness evaluation, selection, reproduction, and replacement.

Initialization. In this stage, the algorithm generated a population of 500 computer programs. A parse tree represented each computer program in the population. Each parse tree was constructed from a random selection of both function nodes and terminal nodes. The function nodes consisted of arithmetical operators (addition, +; subtraction, −; multiplication, ×; division, ÷; return maximum value, MAX; and return minimum value, MIN) and Boolean operators (greater than, >; less than, <; logical and, AND; logical or, OR; if-then-else, ITE). The terminal nodes consisted of the trainee scores from the truck driver training course (Curriculum Areas 1–10) and random numbers ranging from 1 to 100. Figure 1 shows an example of a randomly generated parse tree and its translation into pseudo computer code.

Figure 1. Example of a single parse tree accompanied by its translation into pseudo computer code.
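A minimal sketch of how a parse tree of this kind can be represented and evaluated as a binary classifier is shown below. The tree, the score values, and the evaluate helper are hypothetical illustrations (not the tree shown in Figure 1), and only a subset of the operators listed above is implemented; protected division and the strong-typing machinery are omitted.

```python
# Sketch: evaluating a parse tree with a Boolean root as a pass/fail
# classifier. Tree and scores are hypothetical, not the Figure 1 example.
def evaluate(node, scores):
    """Recursively evaluate a parse tree given a dict of curriculum scores."""
    if isinstance(node, (int, float)):          # numeric constant terminal
        return node
    if isinstance(node, str):                   # curriculum-score terminal
        return scores[node]
    op, *children = node                        # operator node (a tuple)
    args = [evaluate(c, scores) for c in children]
    if op == "ITE":                             # if-then-else Boolean operator
        return args[1] if args[0] else args[2]
    ops = {
        "+": lambda a, b: a + b, "-": lambda a, b: a - b,
        "*": lambda a, b: a * b, "MAX": max, "MIN": min,
        ">": lambda a, b: a > b, "<": lambda a, b: a < b,
        "AND": lambda a, b: a and b, "OR": lambda a, b: a or b,
    }
    return ops[op](*args)

# Hypothetical classifier: pass if road_skills > MAX(professional_driver, 0.90)
tree = (">", "road_skills", ("MAX", "professional_driver", 0.90))
print(evaluate(tree, {"road_skills": 0.95, "professional_driver": 0.88}))  # True
```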

As shown in Figure 1, the parse tree represents a computer program that predicts whether or not a trainee will pass the CDL examination on the first attempt. The program bases its decision on the values assigned to the following terminal nodes: road skill score, professional driving score, basic truck operations score, operational skills score, driving techniques score, and vehicle systems score. In order to predict performance, each parse tree must be translated into a corresponding computer program (shown in the lower half of Figure 1). The root node of each parse tree created at initialization was always a Boolean operator (>, <, AND, OR, ITE). This allowed each program to act as a binary classifier that predicted a trainee would either pass the CDL examination on the first attempt (i.e., the program returns a Boolean output of TRUE) or fail it on the first attempt (i.e., the program returns a Boolean output of FALSE). The program's decision would be based on the variables, constants, arithmetical operators, and Boolean operators that composed the rest of the tree. Each parse tree is a random combination of no more than 30 input variables, numerical constants, and arithmetical operators. Constraints used in strongly typed genetic programming (Montana, 1995) were incorporated in the algorithm to maintain the feasibility of all parse trees in the population during all algorithmic stages. These constraints allowed the genetic programming algorithm to discover computer programs that seamlessly integrated arithmetical operators, Boolean operators, independent variables, and numerical constants into an overall classification model.

Fitness evaluation. In this stage, each of the 500 computer programs in the population was assigned a fitness value. This fitness value was simply the percentage of the 27 cases in the training data set that the program was able to correctly classify. After a fitness value had been assigned to a program, the program was then called upon to classify the 10 cases in the test data set. The specifications of those programs in the population that possessed 80% or better accuracy across both data sets were saved to a file. This procedure enabled the genetic programming algorithm to retain those classification models that performed well in terms of training and validation.

Selection. In this stage of the algorithm, candidates from the population of computer programs were selected for survival. Using stochastic selection with replacement (Goldberg, 1989), a new population of surviving programs was created from the current population. Those programs with the highest fitness values (i.e., most accurate classification of the training data set) had the greatest probability of being repeatedly chosen for survival. Thus the survivor population was biased in the sense that there were more copies of higher-fit programs than of lower-fit programs. These surviving programs participated in the two remaining stages of the genetic programming algorithm.

Reproduction. In this stage, surviving computer programs produced offspring, using the mechanisms of crossover and mutation. These offspring (new programs) would go on to represent the next generation of programs in the population. In the crossover mechanism, 60% of the surviving programs (300 parse trees) were randomly matched to form 150 pairs. Within each pair of programs, randomly chosen branches (i.e., subtrees) from each program's parse tree were exchanged. The result was the creation of two new programs, known as offspring, which possessed characteristics of both parents. In the mutation mechanism, 1% of the surviving programs (5 parse trees) had a randomly chosen branch erased and then replaced with a randomly generated branch. This procedure created a single offspring program that was similar to, but different from, its parent. For both the crossover and mutation mechanisms, the offspring programs replaced their parents in the surviving population.

Replacement. In this stage of the algorithm, the surviving computer programs, some of which had been altered because of crossover and mutation, became the new programs of the next generation.

The completion of a single iteration of the fitness evaluation, selection, reproduction, and replacement stages constituted a single generation within the genetic programming algorithm. Within a single run, the algorithm completed 1000 of these generations. Given the heuristic nature of this algorithm, 10 runs were completed, with each run starting with a different random number seed.
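The subtree-crossover mechanism described above can be sketched as follows. This is a simplified illustration on nested-list parse trees: the helper names (collect_subtrees, replace_at, crossover) are hypothetical, and the sketch ignores the strong-typing constraints the authors used to keep offspring trees feasible.

```python
# Sketch: subtree crossover between two list-based parse trees, in the
# spirit of the reproduction stage. Trees here are hypothetical examples.
import random

def collect_subtrees(tree, path=()):
    """Yield (path, subtree) pairs for every node position in a nested list."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from collect_subtrees(child, path + (i,))

def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return new_subtree
    tree = list(tree)  # shallow copy along the path only
    tree[path[0]] = replace_at(tree[path[0]], path[1:], new_subtree)
    return tree

def crossover(parent_a, parent_b, rng):
    """Exchange randomly chosen branches between two parents (two offspring)."""
    path_a, sub_a = rng.choice(list(collect_subtrees(parent_a)))
    path_b, sub_b = rng.choice(list(collect_subtrees(parent_b)))
    return replace_at(parent_a, path_a, sub_b), replace_at(parent_b, path_b, sub_a)

rng = random.Random(0)
a = [">", "road_skills", ["MAX", "professional_driver", 0.9]]
b = ["<", "vehicle_systems", 0.8]
child_a, child_b = crossover(a, b, rng)
```

Mutation would reuse the same helpers: pick one random path and splice in a freshly generated branch instead of a branch from a second parent.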

Criteria for Classification Model Comparison

Comparisons among the discriminant analysis model, the two logistic regression models, the most fit program found using genetic programming, and the most accurate decision tree discovered using the C4.5 algorithm were based on their classification of the 10 cases constituting the test data set. The classification performance results on the test data set were described in terms of the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), as shown in Table 2. The performance of each model was evaluated using the following epidemiological equations (Hennekens & Buring, 1987), in which accuracy is the percentage of all cases accurately classified by the model; sensitivity is the ability of the model to identify those trainees who passed the CDL test on the first attempt; specificity is the ability of the model to identify those trainees who failed the CDL test on the first attempt; and validity is an overall measure of the model's sensitivity and specificity performance.

Accuracy = [(TP + TN)/(TP + TN + FP + FN)] × 100 (3)
Sensitivity = [TP/(TP + FN)] × 100 (4)
Specificity = [TN/(TN + FP)] × 100 (5)
Validity = (Sensitivity + Specificity − 1) × 100 (6)

In addition to these epidemiological criteria, the classification accuracy of each of the models on the training set data was also noted and recorded.

RESULTS

Discriminant Analysis Model Description

The descriptive characteristics of the classification model developed using discriminant analysis are summarized in Table 3. As shown in Table 3, three individual regressors (operation skills, professional driver, and road skills) were statistically significant at the α = .05 level in terms of discriminating between those trainees who passed the CDL examination on the first attempt and those who did not.

Logistic Regression Analysis Model Description

The descriptive characteristics of the classification models developed using logistic regression are summarized in Tables 4 and 5. Table 4 summarizes the descriptive characteristics of the logistic regression model that included a constant (Model 1). The overall model was statistically significant, χ2(10) = 22.19, p = .014. None of the model's individual regressors, however, were statistically significant at the α = .05 level. As shown in Table 4, increases in the likelihood of membership in the pass CDL examination class were most influenced by increases in professional driver, road skills, and skill development score values. Table 5 summarizes a similar logistic regression model, one that did not include a constant (Model 2). As with logistic regression Model 1, the overall model was also statistically significant, χ2(10) = 23.67, p = .009; however, the effects of individual regressors were not. As shown in Table 5, increases in the road skills and operation skills scores had the greatest impact in terms of increasing the likelihood of membership in the pass CDL examination class.

Decision Tree Model Description

Across the 10 runs, the most accurate decision tree found by the C4.5 algorithm made use of only 6 of the 10 curriculum area scores in predicting passing or failing the CDL examination. An illustration of this decision tree and its corresponding translation into pseudo code is

TABLE 2: Results of Binary Classification Performance

                              Actual CDL Exam Outcome
Model Predicted Outcome       Trainee Passed          Trainee Failed
True (passed)                 True positive (TP)      False positive (FP)
False (failed)                False negative (FN)     True negative (TN)
Total cases                   TP + FN                 FP + TN
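The four criteria in Equations 3 through 6 can be written directly from the Table 2 counts. The sketch below uses hypothetical counts, not results from the study; note that Equation 6 is applied here with sensitivity and specificity converted back to proportions before the Youden-style combination.

```python
# Sketch: Equations 3-6 computed from confusion-matrix counts (Table 2).
# The counts at the bottom are hypothetical, not the study's results.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn) * 100   # Equation 3

def sensitivity(tp, fn):
    return tp / (tp + fn) * 100                    # Equation 4

def specificity(tn, fp):
    return tn / (tn + fp) * 100                    # Equation 5

def validity(sens, spec):
    # Equation 6, with sensitivity and specificity as proportions
    return (sens / 100 + spec / 100 - 1) * 100

tp, tn, fp, fn = 5, 4, 1, 0   # hypothetical test-set outcome
acc = accuracy(tp, tn, fp, fn)
val = validity(sensitivity(tp, fn), specificity(tn, fp))
```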

TABLE 3: Descriptive Characteristics of the Discriminant Analysis Classification Model

                               Wilks's    Univariate   Significance   Classification Function Coefficients
Variable                       Lambda     F Ratio      (p)            Pass CDL       Fail CDL
Basic operation                0.98       0.40         .53            21.88          22.058
Driving techniques             1.00       0.11         .74            6.22           6.329
Vehicle systems                1.00       0.01         .94            10.69          10.820
Operations skills              0.85       4.57         .04            42.52          42.220
Professional driver            0.85       4.27         .05            82.59          83.110
Driver development             0.97       0.72         .40            2178.44        2173.080
Skill development              0.95       1.40         .25            23.23          23.310
Range skills and
  pretrip inspection           1.00       0.10         .75            37.85          37.970
Road skills                    0.86       4.11         .05            18.59          18.840
Attendance                     1.00       0.08         .78            42.68          42.800
Constant                       N/A        N/A          N/A            110,275.05     110,760.340
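Classification with two functions of the form in Equation 1 reduces to computing S_pass and S_fail for a case and assigning it to the category with the larger score. The sketch below uses hypothetical two-variable weights and constants, not the fitted coefficients of Table 3.

```python
# Sketch: Equation 1 applied as a two-category decision rule. Weights and
# constants are hypothetical illustrations, not the Table 3 coefficients.
def classification_score(weights, constant, scores):
    """S_i = W1i*X1 + W2i*X2 + ... + ci (Equation 1)."""
    return sum(w * x for w, x in zip(weights, scores)) + constant

def classify(scores, pass_fn, fail_fn):
    """Assign the case to the category whose function yields the higher Si."""
    s_pass = classification_score(*pass_fn, scores)
    s_fail = classification_score(*fail_fn, scores)
    return "pass" if s_pass > s_fail else "fail"

pass_fn = ([3.0, 2.0], -3.5)   # (weights, constant) for the pass category
fail_fn = ([1.0, 1.0], -1.0)   # hypothetical two-variable example
print(classify([0.95, 0.90], pass_fn, fail_fn))  # prints "pass"
```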

TABLE 4: Descriptive Characteristics of Logistic Regression Model 1

                               Standard   Wald      Significance
Variable              β        Error      χ2        (p)            Exp(β)
Basic operation       −1.64    1.65       1.00      .32            0.19
Driving techniques    −0.60    0.79       0.58      .45            0.55
Vehicle systems       0.21     0.30       0.50      .48            1.24
Operations skills     0.24     0.41       0.36      .55            1.28
Professional driver   2.77     2.88       0.92      .34            15.94
Driver development    −28.64   107.18     0.07      .79            0.00
Skill development     0.92     1.07       0.73      .39            2.50
Range skills/pretrip  −1.41    1.26       1.25      .26            0.24
Road skills           1.14     0.76       2.21      .13            3.11
Attendance            0.07     0.29       0.06      .80            1.08
Constant              −2701.63 10,692.74  0.06      .80            N/A
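Equation 2 together with the .5 decision rule described in the Method section can be sketched as below. The constant and coefficients passed in are hypothetical, not the fitted values in Table 4.

```python
# Sketch: logistic classification per Equation 2 with the .5 cutoff.
# Coefficients are hypothetical, not the fitted values in Table 4.
import math

def logistic_probability(constant, coefs, scores):
    """Y = 1 / (1 + e^-(b0 + b1*X1 + ... + b10*X10)) (Equation 2)."""
    z = constant + sum(b * x for b, x in zip(coefs, scores))
    return 1.0 / (1.0 + math.exp(-z))

def predict(constant, coefs, scores):
    """Classify as pass when Y >= .5, fail otherwise."""
    return "pass" if logistic_probability(constant, coefs, scores) >= 0.5 else "fail"
```

For the constant-free Model 2 form, the same sketch applies with `constant = 0.0`.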

TABLE 5: Descriptive Characteristics of Logistic Regression Model 2

Variable                      β       Standard    Wald χ²   Significance    Exp(β)
                                      Error                 (p)
Basic operation             −0.30        0.24       1.61        .20           0.74
Driving techniques          −0.54        0.47       1.33        .25           0.58
Vehicle systems              0.40        0.39       1.02        .31           1.49
Operations skills            0.71        0.43       2.66        .10           2.02
Professional driver          0.25        0.33       0.63        .43           1.29
Driver development          −1.36        0.80       2.90        .09           0.26
Skill development           −0.19        0.39       0.24        .63           0.83
Range skills/pretrip        −0.54        0.50       1.18        .28           0.58
Road skills                  1.42        1.01       2.00        .16           4.14
Attendance                   0.26        0.29       0.75        .39           1.29
MACHINE LEARNING IN HUMAN FACTORS RESEARCH 417

As shown in Figure 2, the decision tree made its prediction of passing or failing the CDL examination based on the trainee's scores in the following curriculum areas: professional driver, operations skills, skill development, vehicle systems, basic operation, and road skills. These variables constituted the decision nodes of the tree. The decision tree depicted in Figure 2 identified four scenarios associated with trainees who failed the CDL examination on their first attempt: (a) their professional driving scores were at or below 92%; (b) their operational skill scores were at or below 87%; (c) their skill development scores were at or below 74%; or (d) their vehicle systems scores were between 86% and 97% and their basic truck operations and road skills examination scores were greater than 97% and 92%, respectively. If a trainee's performance did not fit any of these four scenarios, the decision

Figure 2. Decision tree for predicting passing or failing the CDL examination accompanied by its translation
into pseudo computer code.
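Figure 2 itself is not reproduced in this extract, but the four failure scenarios described in the text can be sketched as a small Python function. The variable names, the flat or-of-scenarios structure, and the inclusive handling of the boundaries are illustrative assumptions; the exact nesting of tests appears only in Figure 2.

```python
# Sketch of the decision tree's pseudo code as described in the text:
# predict "fail" if any of the four scenarios holds, otherwise "pass".
# Variable names and boundary handling are assumptions for illustration.
def predict_cdl(professional, operations, skill_dev, vehicle, basic, road):
    fails = (
        professional <= 92                                      # scenario (a)
        or operations <= 87                                     # scenario (b)
        or skill_dev <= 74                                      # scenario (c)
        or (86 <= vehicle <= 97 and basic > 97 and road > 92)   # scenario (d)
    )
    return "fail" if fails else "pass"

print(predict_cdl(professional=95, operations=90, skill_dev=80,
                  vehicle=98, basic=99, road=94))
# -> pass
```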

tree would predict that he or she would pass the CDL examination.

Genetic Programming

Across generations within each run, the genetic programming algorithm was successful in discovering computer programs that were increasingly accurate in classifying cases from the training data set, as Figure 3 illustrates. As shown in Figure 3, when the stages of selection, crossover, mutation, and replacement are iteratively applied from generation to generation, the percentage of cases correctly classified by the most accurate classifier in the population increases. Across the 10 runs, the computer program that was most accurate in classifying both the training and test data set cases made use of only 6 of the 10 input variables available: basic operation, driving techniques, vehicle systems, operations skills, professional driver, and road skills. A depiction of the program's parse tree, and its corresponding translation into pseudo code, is shown in Figure 4.

Review of the parse tree and corresponding pseudo code in Figure 4 revealed that the computer program made use of two separate classification rules for predicting whether or not a trainee would pass the CDL examination on the first attempt. From the parse tree's root node (ITE), Branch 1 represents the conditions that the program used to decide which classification rule to employ. These two classification rules are represented by Branches 2 and 3 in the parse tree. Using Branches 1 and 2, the program predicted that trainees whose road skill score was greater than their professional driving score, or whose vehicle systems score was greater than their operations skills score, would pass the CDL examination if they achieved a road skill score greater than 93.5. It should be noted that the constant (93.5) utilized by the computer program and displayed in Figure 4 was not established a priori but was discovered as a consequence of the evolutionary search steps employed by the genetic programming algorithm.

For trainees whose road skill score was less than or equal to their professional driver score and whose vehicle systems score was less than or equal to their operations skill score, the program would use the classification rule described by Branch 3 in the parse tree to predict their performance on the CDL examination.

Figure 3. The maximum fitness among computer programs as a function of generation.
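As a generic illustration of the generational loop just described (selection, crossover, mutation, replacement), the sketch below evolves a single-threshold classification rule on toy data. It is not the authors' genetic programming system, which evolves full parse trees; it only shows how the fitness of the best individual can rise across generations.

```python
import random

def rule_accuracy(threshold, cases):
    """Fraction of cases the rule 'predict pass when the road skill
    score exceeds the threshold' classifies correctly."""
    return sum((road > threshold) == passed
               for road, passed in cases) / len(cases)

def evolve(cases, generations=30, seed=0):
    """Generic evolutionary loop: selection, crossover, mutation,
    replacement. A toy stand-in for genetic programming."""
    rng = random.Random(seed)
    # Initial population: candidate thresholds spread over the score range.
    population = [float(t) for t in range(80, 100)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population.
        population.sort(key=lambda t: rule_accuracy(t, cases), reverse=True)
        parents = population[: len(population) // 2]
        children = []
        while len(children) < len(population) - len(parents):
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2 + rng.gauss(0, 1.0)  # crossover + mutation
            children.append(child)
        population = parents + children              # replacement
    return max(population, key=lambda t: rule_accuracy(t, cases))

# Toy data: (road skill score, passed CDL exam?)
data = [(95, True), (96, True), (94, True), (90, False), (88, False)]
best = evolve(data)
print(rule_accuracy(best, data))
# -> 1.0
```

Because the fittest parents are carried over unchanged each generation, the best individual's accuracy can never decrease, mirroring the monotone rise shown in Figure 3.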



Figure 4. A parse tree that predicts success or failure on the CDL examination accompanied by its translation
into pseudo computer code.
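Figure 4 is likewise not reproduced here. As a hedged sketch, the if-then-else (ITE) structure described in the text, with Branch 1 choosing between the rules of Branches 2 and 3, might be rendered as follows; the variable names are assumptions, and the exact parse tree appears only in Figure 4.

```python
# Sketch of the evolved program's ITE structure as described in the
# text. Branch 1 selects which classification rule applies; Branch 2
# tests against the discovered constant 93.5; Branch 3 uses relative
# comparisons between curriculum area scores.
def gp_predict(basic, driving, vehicle, operations, professional, road):
    if road > professional or vehicle > operations:   # Branch 1
        return "pass" if road > 93.5 else "fail"      # Branch 2
    # Branch 3: pass if any of the three relative comparisons holds.
    if (professional > driving
            or operations > professional
            or operations > basic):
        return "pass"
    return "fail"

print(gp_predict(basic=96, driving=90, vehicle=85, operations=88,
                 professional=95, road=94))
# -> pass
```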

Using this branch/rule, the program would predict that trainees would pass the CDL examination on the first attempt if their performance in the training curriculum met any of the following criteria: (a) their professional driver score was greater than their driving techniques score; (b) their operations skills score was greater than their professional driver score; or (c) their operations skills score was greater than their basic operations score. If a trainee's curriculum performance met none of these three criteria, the branch/rule would predict that he or she would fail the CDL examination.

Comparisons between Statistical and Machine Learning Classification Models

All five models were compared based on their classification performance on the 10 cases within the test data set. The results of this comparison are summarized in Table 6. As shown in Table 6, all five models performed reasonably well in terms of classifying the cases of the training data set.

TABLE 6: Performance Comparison of Classification Models

                            Discriminant    Logistic Regression     Genetic     C4.5
Performance Criteria        Analysis        Model 1    Model 2      Program     Algorithm
Training set accuracy         88.9            92.6       92.6         96.3        100
Test set accuracy             50              40         50           90           80
Test set sensitivity          60              60         80          100           80
Test set specificity          40              20         20           80           80
Test set validity              0             −20          0           80           60
Note. Criterion values given in percentages.

However, when called upon to generalize (i.e., predict) the outcome of cases that were not used in model development, the models developed using machine learning clearly outperformed those based on discriminant analysis and logistic regression. This advantage is especially evident in terms of specificity (i.e., the models' ability to identify trainees who failed the CDL examination on their first attempt). Comparisons between the machine learning techniques revealed that the genetic programming and decision tree induction approaches were comparable, with only a single case separating their performances on both the training and test data sets.

The predictive capabilities of the classification models differed in terms of true positives, true negatives, false positives, and false negatives. These differences are illustrated in Figure 5. As shown in Figure 5, in terms of sensitivity (i.e., true positives), performance was weakest for discriminant analysis and for logistic regression Model 1. These models had the highest percentages of false negatives of all the classification models and were able to identify only three of the five test cases (60%) in which the student driver passed the CDL exam. The genetic program model possessed the strongest sensitivity: It correctly identified all five test cases in which the student passed. In terms of specificity

Figure 5. Predictive capability of classification models in terms of true positives (true+), true negatives (true−), false positives (false+), and false negatives (false−).

(i.e., true negatives), performance was weakest for both logistic regression models. These models had the highest percentages of false positives and were able to identify only one of the five test cases (20%) in which the student driver failed the CDL exam. Specificity performance was highest for the machine learning models (genetic program and C4.5 algorithm). These models were able to correctly classify four of the five test cases (80%) in which the student failed the CDL exam.

DISCUSSION

Results from the current study suggest that the machine learning approaches of the genetic programming algorithm and the C4.5 algorithm were more accurate at classifying driver performance on the CDL examination than were discriminant analysis and logistic regression. Overall, the traditional statistical models tended to make false positive identifications, predicting that those drivers who failed the CDL exam would pass. The classification models based on machine learning, however, did not demonstrate this error tendency. In this example, the machine learning methods employed were able to identify relationships between the independent and dependent variables that were consistent across different data sets, whereas discriminant analysis and logistic regression were not. Unlike discriminant analysis and logistic regression, the machine learning approaches were not constrained by predetermined models based on a linear summation of independent variables.

The decision tree, discovered by the C4.5 algorithm, relied on a linked series of fixed threshold values for the curriculum area scores when making its prediction of CDL examination outcome. The parse tree, discovered using genetic programming, made use of a threshold value and a series of relative comparisons between curriculum area scores in order to make its prediction. The decision and parse trees' corresponding computer programs made their decisions by assessing the trainee's curriculum area scores and, based on this assessment, determined which classification rules to use in the prediction of CDL examination outcome. This ability to switch between rules afforded the decision tree and genetic program a level of flexibility not possessed by the traditional statistical approaches used in this study. These findings, albeit limited in nature, support the notion that machine learning techniques, such as genetic programming and decision tree induction, could be used as a viable alternative to traditional statistical approaches in human-factors-related research.

Although successful in its scope, the current study has limitations that should be noted. First, the size of the data set used to develop and compare the classification models was limited to the performances of 36 men and only 1 woman. Given this limitation, it is unknown whether or not these models would be applicable to the driving performances of female truck drivers. Second, the outputs of all models developed and tested were binary (pass or fail CDL examination) in nature. These models did not provide insight into what aspects of the CDL exam a driver might fail, given his or her performance in the training curriculum. This specific information could prove to be valuable to instructors by allowing them to more effectively address the trainees' educational needs. Third, in their current form, these classification models can only identify trainees who may have considerable difficulty on their CDL examination. They do not suggest or prescribe specific interventions that an instructor can use to improve the trainee's performance on the examination.

Fourth, the performance comparison of these approaches was based on a single data source. One would expect classification performance differences between these approaches to vary depending on the data used for model development and validation. For example, Carnahan and Redfern (1998b) found that their classification model, discovered using genetic programming, outperformed logistic regression when predicting low-back injury risk for a set of occupational lifting tasks. Sohn and Shin (2001), however, found no difference in the accuracy of logistic regression and the C4.5 algorithm in classifying automotive accidents in terms of their severity outcomes. Further research is necessary to fully assess the potential advantages machine learning classifiers would have over their traditional statistical counterparts in the human factors area.

In order to address these limitations, future research will make use of larger training data sets and test data sets from more diverse populations. This will allow for statistical repeatability in training and testing various models. Future research will also focus on the expansion of the classification models beyond simple binary outcomes into multicategory classification. With this type of approach, one could attempt to predict what specific elements of the CDL examination would prove most difficult for certain trainee drivers. Finally, future research will focus on applying these (and other) machine learning techniques to other classification problems in the fields of safety, ergonomics, human factors, and human performance.

REFERENCES

Al-Ghamdi, A. S. (2002). Using logistic regression to estimate the influence of accident factors on accident severity. Accident Analysis and Prevention, 34, 729–741.
Avolio, B. J., Kroeck, K. G., & Panek, P. E. (1985). Individual differences in information-processing ability as a predictor of motor vehicle accidents. Human Factors, 27, 577–587.
Babski-Reeves, K. L., & Crumpton-Young, L. L. (2002). Comparisons of measures for quantifying repetition in predicting carpal tunnel syndrome. International Journal of Industrial Ergonomics, 30, 1–6.
Byrnes, M., & Fox, D. (Eds.). (1999). Bumper-to-bumper: The complete guide to tractor-trailer operations (3rd ed.). Corpus Christi, TX: Byrnes & Associates.
Carnahan, B. J., Norman, B., & Redfern, M. S. (2001). Incorporating physical demand criteria into assembly line balancing. IIE Transactions, 33, 875–887.
Carnahan, B. J., & Redfern, M. S. (1998a). Application of genetic algorithms to the design of lifting tasks. International Journal of Industrial Ergonomics, 21, 145–158.
Carnahan, B. J., & Redfern, M. S. (1998b). Building a low back injury risk classifier using evolutionary computation. In Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting (pp. 881–885). Santa Monica, CA: Human Factors and Ergonomics Society.
Carnahan, B. J., Redfern, M. S., & Norman, B. (2000). Designing safe job rotation schedules using optimization and heuristic search. Ergonomics, 43, 543–560.
Clancy, E. A. (1997). Factors influencing the resubstitution accuracy in multivariate classification analysis. Ergonomics, 40, 417–427.
Clarke, D. D., Forsyth, R., & Wright, R. (1998a). Behavioural factors in accidents at road junctions: The use of a genetic algorithm to extract descriptive rules from police case files. Accident Analysis and Prevention, 30, 223–234.
Clarke, D. D., Forsyth, R., & Wright, R. (1998b). Machine learning in road accident research: Decision trees describing road accidents during cross-flow turns. Ergonomics, 41, 1060–1079.
Deery, H. A., & Fildes, B. N. (1999). Young novice driver subtypes: Relationship to high-risk behavior, traffic accident record, and simulator driving performance. Human Factors, 41, 628–643.
French, D. J., West, R. J., Elander, J., & Wilding, J. M. (1993). Decision-making style, driving style, and self-reported involvement in road traffic accidents. Ergonomics, 36, 627–644.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. New York: Addison Wesley.
Hanson, J. P., Redfern, M. S., & Mazumdar, H. (1999). Predicting slips and falls considering required and available friction. Ergonomics, 42, 1619–1633.
Hennekens, C. H., & Buring, J. E. (1987). Epidemiology in medicine. Boston: Little, Brown.
Huberty, C. J., & Lowman, L. L. (1997). Discriminant analysis via statistical packages. Educational and Psychological Measurement, 57, 759–784.
Kishore, J. K., Patnaik, L. M., Mani, V., & Agrawal, V. K. (2000). Application of genetic programming for multicategory pattern classification. IEEE Transactions on Evolutionary Computation, 4, 242–258.
Koza, J. R. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, MA: MIT Press.
Koza, J. R. (1994). Genetic programming II: Automatic discovery of reusable programs. Cambridge, MA: MIT Press.
Koza, J. R., Bennett, F. H., Andre, D., & Keane, M. A. (1999). Genetic programming III: Darwinian invention and problem solving. San Francisco: Morgan Kaufmann.
Lubin, B., Fielder, E. R., & Van Whitlock, R. (1999). Predicting discharge from airforce basic training by battery of affect. Journal of Clinical Psychology, 55, 71–78.
Mackieh, A., & Cilingir, C. (1998). Effects of performance shaping factors on human error. International Journal of Industrial Ergonomics, 22, 285–292.
Marras, W. S., Lavender, S. A., Leurgans, S., Sudhakar, L. R., Allread, W. G., Fathallah, F., & Ferguson, S. (1993). The role of dynamic three dimensional trunk motion in occupationally related low back disorders. Spine, 18, 617–628.
Matias, A. C., Salvendy, G., & Kuczez, T. (1998). Predictive models of carpal tunnel syndrome causation among VDT operators. Ergonomics, 41, 213–226.
McFadden, M. (1997). Predicting pilot-error incidents of U.S. airline pilots using logistic regression. Applied Ergonomics, 28, 209–212.
Meadows, M. L., Stradling, S. G., & Lawson, S. (1998). The role of social deviance and violations in predicting road traffic accidents in a sample of young offenders. British Journal of Psychology, 89, 417–431.
Montana, D. J. (1995). Strongly typed genetic programming. Evolutionary Computation, 3, 199–230.
Myers, R. H. (1990). Traditional and modern regression with applications (2nd ed.). Belmont, CA: Duxbury.
Myers, R. S., Ball, K. K., Kalina, T. D., Roth, D. L., & Goode, K. T. (2000). Relation of useful field of view and other screening tests to on-road driving performance. Perceptual and Motor Skills, 91, 279–290.
Pham, D. T., & Onder, H. H. (1992). A knowledge-based system for optimizing workplace layouts using a genetic algorithm. Ergonomics, 35, 1479–1487.
Porter, B. E., & England, K. J. (2000). Predicting red-light running behavior: A traffic safety study in three urban settings. Journal of Safety Research, 31, 1–8.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81–106.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.
Rayson, M., Holliman, D., & Belyavin, A. (2000). Development of physical selection procedures for the British Army Phase 2: Relationship between physical performance test and criterion tasks. Ergonomics, 43, 73–105.
Reason, J., Manstead, A., Stradling, S., Baxter, J., & Campbell, K. (1990). Errors and violations on the roads: A real distinction? Ergonomics, 33, 1315–1332.
Sebald, A. V., & Fogel, L. J. (Eds.). (1994). Proceedings of the 3rd Conference on Evolutionary Computation. Princeton, NJ: World Scientific.
Simon, F., & Corbett, C. (1996). Road traffic offending, stress, age, and accident history among male and female drivers. Ergonomics, 39, 757–780.
Sohn, S., & Shin, S. (2001). Pattern recognition for road traffic accident severity in Korea. Ergonomics, 44, 107–117.
Vergara, M., & Page, A. (2002). Relationship between comfort and back posture and mobility in sitting posture. Applied Ergonomics, 33, 1–8.
Verwey, W. B., & Zaidel, D. M. (2000). Predicting drowsiness accidents from personal attributes, eye blinks, and ongoing behavior. Personality and Individual Differences, 28, 123–142.

Brian J. Carnahan is an assistant professor in the Department of Industrial and Systems Engineering at Auburn University. He received his Ph.D. in industrial engineering in 1999 at the University of Pittsburgh.

Gérard Meyer is president of Chez Gérard Consulting, Pittsburgh, Pennsylvania. He received his doctoral degree in ergonomic physiology in 1973 at the Conservatoire National des Arts et Métiers (Paris), France.

Lois-Ann Kuntz is an assistant professor in the Division of Education and Behavioral Sciences at the University of Maine at Machias. She received her Ph.D. in sensory processes and cognitive psychology in 1996 at the University of Florida.

Date received: October 31, 2001
Date accepted: February 24, 2003
