
IAETSD JOURNAL FOR ADVANCED RESEARCH IN APPLIED SCIENCES, VOLUME 4, ISSUE 1, JAN-JUNE /2017

ISSN (ONLINE): 2394-8442

RAILWAY ACCIDENT PRONE AREAS DETECTION USING APRIORI ALGORITHM
K. JAYABHARATHI AND A. JAMALUDEEN
Christ College of Engineering and Technology, Pondicherry, India
jayavisu1994@gmail.com

ABSTRACT.

Text mining, also referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. High-quality information is typically derived by devising patterns and trends through means such as statistical pattern learning. Rail accidents represent an important safety concern for the transportation industry in many countries. In the 11 years from 2001 to 2012,
the U.S. had more than 40 000 rail accidents that cost more than $45 million. While most of the accidents during
this period had very little cost, about 5200 had damages in excess of $141 500. To better understand the
contributors to these extreme accidents, the Federal Railroad Administration has required the railroads involved
in accidents to submit reports that contain both fixed field entries and narratives that describe the
characteristics of the accident. While a number of studies have looked at the fixed fields, none have done an
extensive analysis of the narratives. This project describes the use of text mining with a combination of
techniques to automatically discover accident characteristics that can inform a better understanding of the
contributors to the accidents. The study evaluates the efficacy of text mining of accident narratives by assessing
predictive performance for the costs of extreme accidents. The results show that predictive accuracy for accident
costs significantly improves through the use of features found by text mining and predictive accuracy further
improves through the use of modern ensemble methods. Importantly, this study also shows through case
examples how the findings from text mining of the narratives can improve understanding of the contributors to
rail accidents in ways not possible through only fixed field analysis of the accident reports.

Index Terms: Rail safety, safety engineering, random forests.

I. INTRODUCTION
A review of the data collected by the FRA shows a variety of accident types, from derailments to truncheon bar entanglements. Most of the accidents are not serious, since they cause little damage and no injuries. However, some cause over $1M in damages, deaths of crew and passengers, and many injuries. The problem is to understand the characteristics of these accidents that may inform both system design and policies to improve safety [1]. After each accident, a report is completed and submitted to the FRA by the railroad companies involved. This report has a number of fields that include characteristics of the train or trains, the personnel on the trains, the environmental conditions (e.g., temperature and precipitation), operational conditions (e.g., speed at the time of the accident, highest speed before the accident, number of cars, and weight), and the primary cause of the accident. Cause is a four-character coded entry based on five overall categories (discussed in Section IV). The FRA also collects data on the costs of each accident, broken down into damage to track and damage to equipment, including the number of hazardous material cars damaged. Additionally, the number of injuries and deaths from each accident is reported [2]. Finally, the accident reports contain narratives that provide a free-text description of the accident. These narratives give more detail about the causes of and contributors to the accidents and their circumstances. However, for brevity, the narratives use railroad-specific jargon that makes them difficult for personnel from outside the industry to read. This paper describes an investigation into the possible predictors of, or contributors to, accidents obtained by mining the narrative text in rail accident reports [3]. To do this, the approach integrates a combination of analytical methods to first identify the accidents of interest and then look for relationships in the structured and unstructured data that may suggest contributors to accidents.

To Cite This Article: K. Jayabharathi and A. Jamaludeen, "Railway Accident Prone Areas Detection Using Apriori Algorithm," Journal for Advanced Research in Applied Sciences, pp. 262-271.

Fig. 1: Architecture diagram

1.1. Existing System



One of the well-studied areas of rail safety concerns rail crossings by roadways. A recent study applied fuzzy sets and clustering to guide the selection of rail crossings for active safety systems (e.g., bells, lights, and barriers). Other work uses logistic regression and mixed regression to model the behavior of drivers at railway crossings, and neural networks have been used to model intersection crashes in terms of intersection characteristics such as lighting and surface materials [4]. Taken together, these papers show the use of data mining to better understand the factors that can influence and improve safety at rail crossings. Recent work has also shown the applicability of data and text mining to broader classes of safety and security problems relevant to transportation.

For example, prior work illustrates the use of data mining techniques for anomaly detection in road networks: it provides methods to detect anomalies in massive amounts of traffic data and then clusters these detections according to different attributes. Similarly, D'Andrea et al. mined Twitter and used support vector machines to detect traffic events. Another recent application of text mining is license plate recognition. A monitor checks that a user presents proper certification before allowing access to records or files; however, services are increasingly [5] storing data in a distributed fashion across many servers, and replicating data across several locations has advantages in both performance and reliability. One study uses a four-element categorical variable that classifies accidents as substantial [6], destroyed, minor, or none as the response. It combines structured text with keywords found by text mining with boosted trees (a method discussed below) and shows some improvement in an ROC curve for classification of substantial damage over models with only structured text inputs.

DRAWBACKS

1. No guaranteed security.
2. Highly prone to errors.
3. User can't view all details.
4. Does not send all accident reports.

1.2. Proposed System

The proposed model incorporates the text mining results into the accident damage models using two approaches. The first approach is a two-step process. First, damage is predicted using only the text of the accident narratives as input. Then the residuals from this text-only prediction are estimated using random forest models with the remaining predictor variables [7]. The total accident damage cost estimate is obtained by first predicting the residuals and then adding them to the text-only prediction of accident damage. In the second approach, the partial least squares (PLS) component is used to estimate a coefficient for each word, and the result is used directly as another predictor variable in the random forest model. The PLS predictor is then simply a linear combination of the words in the accident narratives. In our tests, this was consistently the most important variable used by the random forest models. The RMSE for the different combinations of supervised learning methods and text mining techniques shows that ensemble methods do provide lift in predicting accident severity. The values also show that the ensemble methods improve in predictive accuracy with the inclusion of text mining results.
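The two-step idea (predict from the text alone, then model what the text missed) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single numeric text score per report (standing in for the PLS component) and one other numeric predictor, and it deliberately swaps both the text model and the random forest for ordinary least-squares lines so the mechanics stay visible. Class and field names are hypothetical, and the toy data are made up.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the two-step scheme: fit damage on a text-only score,
 * then model the residuals with another predictor and add the two parts.
 * Simple least-squares lines stand in for both the text model and the random forest.
 */
class TwoStepDamageModel {
    final double[] textLine;     // stage 1: {intercept, slope} of damage ~ textScore
    final double[] residualLine; // stage 2: {intercept, slope} of residual ~ other

    TwoStepDamageModel(double[] textScore, double[] other, double[] damage) {
        textLine = fitLine(textScore, damage);
        double[] residual = new double[damage.length];
        for (int k = 0; k < damage.length; k++) {
            residual[k] = damage[k] - apply(textLine, textScore[k]);
        }
        residualLine = fitLine(other, residual);
    }

    /** Total estimate = text-only prediction + predicted residual. */
    double predict(double textScore, double other) {
        return apply(textLine, textScore) + apply(residualLine, other);
    }

    /** Ordinary least-squares fit of y = a + b*x; returns {a, b}. */
    static double[] fitLine(double[] x, double[] y) {
        double mx = Arrays.stream(x).average().orElse(0);
        double my = Arrays.stream(y).average().orElse(0);
        double sxy = 0, sxx = 0;
        for (int k = 0; k < x.length; k++) {
            sxy += (x[k] - mx) * (y[k] - my);
            sxx += (x[k] - mx) * (x[k] - mx);
        }
        double b = (sxx == 0) ? 0 : sxy / sxx;
        return new double[] {my - b * mx, b};
    }

    static double apply(double[] line, double x) {
        return line[0] + line[1] * x;
    }

    public static void main(String[] args) {
        // Made-up toy data, not FRA figures.
        double[] textScore = {1, 2, 3, 4, 5};
        double[] speed = {30, 10, 50, 10, 30};
        double[] damage = {20, 13, 36, 19, 32};
        TwoStepDamageModel model = new TwoStepDamageModel(textScore, speed, damage);
        for (int k = 0; k < damage.length; k++) {
            System.out.printf("actual %.1f, predicted %.1f%n", damage[k], model.predict(textScore[k], speed[k]));
        }
    }
}
```

In this toy data the two predictors happen to be uncorrelated, so the two stages together recover the damage values exactly; with correlated predictors the first stage absorbs the shared signal, and the residual model only sees what is left.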

Bagging, or bootstrap aggregation, builds multiple models by resampling subsets of the original data with replacement. Random forests add a second source of randomness: at each split, only a randomly chosen subset of the original predictor variables is considered. In both cases, the estimates from the resulting models are combined (for regression, by averaging) to produce one overall estimate [8]. Text mining is concerned with finding patterns in unstructured text. This field has become increasingly important because of the large amounts of data available in documents, news articles, research papers, and accident reports. In many cases, text databases are semi-structured because, in addition to the free text, they also contain structured fields such as titles, authors, dates, and other metadata. The accident reports used in this paper are semi-structured. One of the key goals of text mining is to characterize the contents of documents through pattern discovery. These patterns may then be used for improved information retrieval or, as in this paper, as input to predictive models. Regardless of the ultimate goal, most text mining begins with vector space models, in which documents are represented by term-document matrices. These matrices have terms as headers for the rows and documents as headers for the columns, and each cell gives the count or frequency of a term (row) in a document (column).
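A term-document matrix of the kind just described can be built in a few lines. The sketch below is illustrative only: it assumes a naive tokenizer (lower-case, split on non-alphanumeric characters) with no stemming or stop-word removal, and the two narratives are made-up examples rather than FRA text. The class name is hypothetical.

```java
import java.util.*;

/** Builds a term-document count matrix: one row per term, one column per document. */
class TermDocumentMatrix {
    final List<String> terms = new ArrayList<>(); // row headers (sorted vocabulary)
    final int[][] counts;                          // counts[row][column]

    TermDocumentMatrix(List<String> documents) {
        List<List<String>> tokenized = new ArrayList<>();
        SortedSet<String> vocabulary = new TreeSet<>();
        for (String doc : documents) {
            List<String> tokens = new ArrayList<>();
            // Lower-case and split on anything that is not a letter or digit.
            for (String tok : doc.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!tok.isEmpty()) tokens.add(tok);
            }
            tokenized.add(tokens);
            vocabulary.addAll(tokens);
        }
        terms.addAll(vocabulary);
        Map<String, Integer> rowOf = new HashMap<>();
        for (int r = 0; r < terms.size(); r++) rowOf.put(terms.get(r), r);
        counts = new int[terms.size()][documents.size()];
        for (int c = 0; c < tokenized.size(); c++) {
            for (String tok : tokenized.get(c)) counts[rowOf.get(tok)][c]++;
        }
    }

    public static void main(String[] args) {
        // Two made-up narratives, for illustration only.
        TermDocumentMatrix tdm = new TermDocumentMatrix(List.of(
                "Train derailed at crossing",
                "Signal fault at crossing, train stopped"));
        for (int r = 0; r < tdm.terms.size(); r++) {
            System.out.println(tdm.terms.get(r) + " " + Arrays.toString(tdm.counts[r]));
        }
    }
}
```

Each printed row is one term and each column one narrative, so "crossing" and "train" show counts in both columns while "derailed" appears only in the first.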

ALGORITHM

Apriori Algorithm Purpose:

1. Key concepts in mining frequent itemsets.
2. Understand the Apriori algorithm.
3. Run Apriori in a programmatic way.

General Process:

Association rule generation is usually split up into two separate steps:


First, minimum support is applied to find all frequent item sets in a database.
Second, these frequent item sets and the minimum confidence constraint are used to form rules.
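The second step can be sketched directly: for every frequent item set Z and every non-empty proper subset X of Z, the rule X -> Z\X is kept when its confidence, supp(Z) / supp(X), meets the minimum. This is a minimal illustration, assuming the support counts are already available in a map (hypothetical values below) and that the map contains every subset of each frequent set, as downward closure guarantees for a complete result of step one.

```java
import java.util.*;

/** Sketch of step two: turning frequent itemsets into rules that meet a minimum confidence. */
class RuleGeneration {
    static final class Rule {
        final Set<String> lhs, rhs;
        final double confidence;

        Rule(Set<String> lhs, Set<String> rhs, double confidence) {
            this.lhs = lhs;
            this.rhs = rhs;
            this.confidence = confidence;
        }

        @Override
        public String toString() {
            return lhs + " -> " + rhs + " (conf " + confidence + ")";
        }
    }

    /** For each frequent itemset Z, emits X -> Z\X when supp(Z) / supp(X) >= minConfidence. */
    static List<Rule> rules(Map<Set<String>, Integer> support, double minConfidence) {
        List<Rule> out = new ArrayList<>();
        for (Map.Entry<Set<String>, Integer> entry : support.entrySet()) {
            List<String> items = new ArrayList<>(entry.getKey());
            int n = items.size();
            if (n < 2) continue; // a rule needs a non-empty left and right side
            for (int mask = 1; mask < (1 << n) - 1; mask++) { // every non-empty proper subset X
                Set<String> lhs = new TreeSet<>();
                Set<String> rhs = new TreeSet<>();
                for (int b = 0; b < n; b++) {
                    if ((mask >> b & 1) == 1) lhs.add(items.get(b)); else rhs.add(items.get(b));
                }
                Integer lhsSupport = support.get(lhs); // present if the input is downward closed
                if (lhsSupport == null) continue;
                double confidence = (double) entry.getValue() / lhsSupport;
                if (confidence >= minConfidence) out.add(new Rule(lhs, rhs, confidence));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up support counts, not taken from the accident data.
        Map<Set<String>, Integer> support = new HashMap<>();
        support.put(Set.of("crossing"), 4);
        support.put(Set.of("night"), 3);
        support.put(Set.of("crossing", "night"), 3);
        rules(support, 0.8).forEach(System.out::println);
    }
}
```

With a minimum confidence of 0.8 only [night] -> [crossing] (confidence 3/3) survives; the reverse rule has confidence 3/4 and is dropped, which shows why confidence is not symmetric.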

While the second step is straightforward, the first step needs more attention. Finding all frequent item sets in a database is difficult since it involves searching all possible item sets (item combinations). The set of possible item sets is the power set over the set of items I and, for n items, has size $2^{n} - 1$ (excluding the empty set, which is not a valid item set).
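Counting the non-empty item sets by size makes the exponential growth explicit; the value n = 20 below is only an illustration.

```latex
\sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1,
\qquad \text{e.g. } n = 20 \;\Rightarrow\; 2^{20} - 1 = 1\,048\,575 \text{ candidate item sets.}
```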

Although the size of the power set grows exponentially [9] in the number of items n in I, efficient search is possible using the
downward-closure property of support (also called anti-monotonicity) which guarantees that for a frequent item set, all its subsets are also
frequent and thus for an infrequent item set, all its supersets must also be infrequent. Exploiting this property, efficient algorithms (e.g., Apriori
and Eclat) can find all frequent item sets.
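A compact level-wise Apriori can be sketched as follows. It is a minimal illustration, not the system's actual code: each accident is assumed to be encoded as a transaction (a set of attribute values), and the transactions below are made up. Candidates of size k + 1 are formed as unions of two frequent k-sets and then pruned using the downward-closure property before their support is counted.

```java
import java.util.*;

/** Minimal level-wise Apriori: count supports, keep frequent sets, join and prune to build the next level. */
class Apriori {
    static Map<Set<String>, Integer> frequentItemsets(List<Set<String>> transactions, int minSupport) {
        Map<Set<String>, Integer> frequent = new HashMap<>();
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> t : transactions) {
            for (String item : t) candidates.add(new TreeSet<>(Collections.singleton(item)));
        }
        int k = 1;
        while (!candidates.isEmpty()) {
            // Support counting: one pass over the transactions per level.
            Map<Set<String>, Integer> level = new HashMap<>();
            for (Set<String> c : candidates) {
                int count = 0;
                for (Set<String> t : transactions) if (t.containsAll(c)) count++;
                if (count >= minSupport) level.put(c, count);
            }
            frequent.putAll(level);
            // Join: unions of two frequent k-sets that have exactly k + 1 items.
            List<Set<String>> keys = new ArrayList<>(level.keySet());
            Set<Set<String>> next = new HashSet<>();
            for (int a = 0; a < keys.size(); a++) {
                for (int b = a + 1; b < keys.size(); b++) {
                    TreeSet<String> union = new TreeSet<>(keys.get(a));
                    union.addAll(keys.get(b));
                    // Prune: drop the candidate unless every k-subset is frequent (downward closure).
                    if (union.size() == k + 1 && allSubsetsFrequent(union, level)) next.add(union);
                }
            }
            candidates = next;
            k++;
        }
        return frequent;
    }

    static boolean allSubsetsFrequent(TreeSet<String> candidate, Map<Set<String>, Integer> level) {
        for (String drop : candidate) {
            TreeSet<String> subset = new TreeSet<>(candidate);
            subset.remove(drop);
            if (!level.containsKey(subset)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up transactions: each set holds the attributes recorded for one accident.
        List<Set<String>> transactions = List.of(
                Set.of("crossing", "night", "signal_fault"),
                Set.of("crossing", "night"),
                Set.of("crossing", "derailment"),
                Set.of("crossing", "night", "derailment"));
        frequentItemsets(transactions, 2).forEach((set, count) -> System.out.println(set + " " + count));
    }
}
```

With a minimum support of 2, the candidate {crossing, derailment, night} is never counted: its subset {derailment, night} occurs only once, so the prune step discards it without another pass over the data.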

Fig: Frequent accidents

II. MODULE DESCRIPTION

Modules:

1. Generate Accident Report
2. Characteristics of Accident Report
3. Storage in Database

Step by Step Process:

User:
The user registers the accident details and casualty details.
All the details are stored in the database.
Admin:
The admin can verify the accident details.
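The user/admin workflow above can be modeled in a few lines. This is a hypothetical, in-memory sketch: an `ArrayList` stands in for the database, and the class, field, and method names are assumptions for illustration, not the system's actual schema.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical in-memory stand-in for the register / store / verify workflow. */
class AccidentRegistry {
    static final class AccidentReport {
        final String trainNumber;
        final String location;
        final int casualties;
        boolean verified; // set by the admin

        AccidentReport(String trainNumber, String location, int casualties) {
            this.trainNumber = trainNumber;
            this.location = location;
            this.casualties = casualties;
        }
    }

    private final List<AccidentReport> store = new ArrayList<>(); // stands in for the database

    /** User step: register accident and casualty details; they are stored immediately. */
    AccidentReport register(String trainNumber, String location, int casualties) {
        AccidentReport report = new AccidentReport(trainNumber, location, casualties);
        store.add(report);
        return report;
    }

    /** Admin step: mark a stored report as verified. */
    void verify(AccidentReport report) {
        report.verified = true;
    }

    /** Reports still waiting for the admin. */
    List<AccidentReport> pending() {
        return store.stream().filter(r -> !r.verified).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        AccidentRegistry registry = new AccidentRegistry();
        AccidentReport first = registry.register("12345", "Station A", 2); // made-up values
        registry.register("67890", "Station B", 0);
        registry.verify(first);
        System.out.println("pending: " + registry.pending().size()); // prints "pending: 1"
    }
}
```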

1) Traffic Safety at Road-Rail Level Crossings Using a Driving Simulator and Traffic Simulation (Inhi Kim, Gregoire S. Larue, Luis Ferreira, Andry Rakotonirainy, and Khaled Shaaban): Several intelligent transportation systems (ITS) were used with an advanced driving simulator to assess their influence on driving behavior. Three types of ITS interventions were tested: video in vehicle, audio in vehicle, and on-road flashing marker. The results from the driving simulator were inputs for a developed model that used traffic microsimulation (VISSIM 5.4) to assess the safety interventions.

Using a driving simulator, 58 participants were required to drive through active and passive crossings with and without an ITS device and in the presence or absence of an approaching train. The effect of changes in driver speed and compliance rate was greater at passive crossings than at active crossings. The slight difference in the speed of drivers approaching ITS devices indicated that ITS helped drivers encounter crossings in a safer way. Since the traffic simulation was not able to replicate a dynamic speed change or a probability of stopping that varied depending on ITS safety devices, some modifications were made to the traffic simulation. The results showed that exposure to ITS devices at active crossings did not significantly influence drivers' behavior according to traffic performance indicators such as delay time, number of stops, speed, and stopped delay. However, the results of traffic simulation for passive crossings, where low traffic volumes and low train headway normally occur, showed that ITS devices improved overall traffic performance.

III. CODING DESIGN

<%
// Each record is tagged with a letter label, starting from 'A'
char i='A';
try
{
dbio db=new dbio();
String sql="select TrainNumber,Casualty from Casualty_accident";
ResultSet rs=db.getRecordsResultSet(sql);
while(rs.next())
{
%>

<%
cauality_lst.add(rs.getString("TrainNumber"));
cauality_lst.add(rs.getString("Casualty"));
cauality_lst.add(i++);
}
}catch(Exception e)
{
e.printStackTrace();

}
%>

<%
char i2=i;
try
{
dbio db=new dbio();
String sql="select TrainNumber,Accident from highspeed_accident order by trainnumber";
ResultSet rs=db.getRecordsResultSet(sql);
while(rs.next())
{
%>

<%
highspeed_lst.add(rs.getString("TrainNumber"));
highspeed_lst.add(rs.getString("Accident"));
highspeed_lst.add(i2++);
}
}catch(Exception e)
{
e.printStackTrace();
}
%>

<%
char i3=i2;
try
{
dbio db=new dbio();
String sql="select Location,Accident from locationwise_accident";
ResultSet rs=db.getRecordsResultSet(sql);
while(rs.next())
{

%>

<%
location_lst.add(rs.getString("Location"));
location_lst.add(rs.getString("Accident"));
location_lst.add(i3++);
}
}catch(Exception e)
{
e.printStackTrace();
}
%>

<%
char i4=i3;
try
{
dbio db=new dbio();
String sql="SELECT Location,Accident FROM signalfault_accident";
ResultSet rs=db.getRecordsResultSet(sql);
while(rs.next())
{
%>

<%
signalfault_lst.add(rs.getString("Location"));
signalfault_lst.add(rs.getString("Accident"));
signalfault_lst.add(i4++);
}
}catch(Exception e)
{
e.printStackTrace();
}

%>

<%
char i5=i4;
try
{
dbio db=new dbio();
String sql="select TrainNumber,Accident from trainwise_accident";
ResultSet rs=db.getRecordsResultSet(sql);
while(rs.next())
{
%>

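<%
// The loop body that fills train_lst and place_lst is not shown in the original listing.
}
}catch(Exception e)
{
e.printStackTrace();
}

// Remove duplicate entries, keeping the first occurrence of each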
Object[] st = train_lst.toArray();
for (Object s : st) {
if (train_lst.indexOf(s) != train_lst.lastIndexOf(s)) {
train_lst.remove(train_lst.lastIndexOf(s));
}
}
Object[] st1 = place_lst.toArray();
for (Object s : st1) {
if (place_lst.indexOf(s) != place_lst.lastIndexOf(s))
{
place_lst.remove(place_lst.lastIndexOf(s));
}
}

// Print each remaining train number as a table row

for(int j=0;j<train_lst.size();j++)
{
out.print("<tr><td>Train Number :"+train_lst.get(j).toString()+"</td></tr>");
}
%>

SCREEN SHOTS

Home

Registration

Accident Entry

Admin Login

Accident Summary

Accident Location View



Send Report

Casualty Entry

Passenger Entry

Frequent Accidents

Chart

IV. CONCLUSION
This paper shows that the combination of text analysis with ensemble methods can improve the accuracy of models for predicting accident severity and that text analysis can provide insights into accident characteristics. Modern text analysis methods make the narratives in the accident reports almost as accessible for detailed analysis as the fixed fields in the reports. More importantly, as the examples illustrated, text mining of the narratives can provide much richer information than is possible from the fixed fields alone. Finally, the work described here used standard methods to clean the narratives. However, train accident narratives use jargon common to the rail transport industry, and classical stemming and stop word removal do not necessarily characterize the words used in this industry well. For train safety analysis, text mining could benefit from a careful look at ways to extract features from text that take advantage of language characteristics particular to the rail transport industry.
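One direction suggested above is to normalize industry shorthand before generic cleaning. The sketch below is only illustrative: the stop-word list and the abbreviation dictionary (for example "xing" for crossing) are hypothetical placeholders that a real system would curate with rail domain experts, and stemming is omitted.

```java
import java.util.*;

/**
 * Sketch of narrative cleaning with a domain dictionary applied before stop-word removal.
 * The stop-word list and the abbreviation map are hypothetical placeholders.
 */
class NarrativeCleaner {
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "at", "of", "and", "was");
    static final Map<String, String> JARGON = Map.of("xing", "crossing", "mp", "milepost");

    static List<String> clean(String narrative) {
        List<String> terms = new ArrayList<>();
        for (String tok : narrative.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (tok.isEmpty()) continue;
            String term = JARGON.getOrDefault(tok, tok); // expand industry shorthand first
            if (!STOP_WORDS.contains(term)) terms.add(term);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(clean("Train derailed at MP 12 near the XING"));
        // prints [train, derailed, milepost, 12, near, crossing]
    }
}
```

Expanding abbreviations before stop-word removal means that shorthand and spelled-out forms map to the same term in the term-document matrix.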

V. ACKNOWLEDGMENTS
I thank my HOD, Mr. P. Rajapandian, M.C.A., M.Phil., M.C.S.I. (Department of Computer Applications), for helping us create this paper with his sincere guidance and technical expertise in the field of communication. The help of my guide, Mr. A. Jamaludeen, M.C.A., M.C.S.E., M.I.S.T.E. (Department of Computer Applications), Christ College of Engineering & Technology, has been immense, and once again I thank him for his great motivation. I thank Christ College of Engineering & Technology for providing such a standard educational environment that I am able to understand the minute concepts in the field of Engineering.

REFERENCES
[1] Railroad Safety Statistics: 2009 Annual Report, Final, Federal Railroad Admin., Washington, DC, USA, Apr. 2011. [Online]. Available: http://safetydata.fra.dot.gov/OfficeofSafety/publicsite/Publications.aspx
[2] Office of Safety Analysis, Federal Railroad Administration, Washington, DC, USA, Oct. 2009. [Online]. Available: http://safetydata.fra.dot.gov/officeofsafety/
[3] G. Cirovic and D. Pamucar, "Decision support model for prioritizing railway level crossings for safety improvements: Application of the adaptive neuro-fuzzy system," Expert Syst. Appl., vol. 40, pp. 2208-2223, 2013.
[4] L.-S. Tey, G. Wallis, S. Cloete, and L. Ferreira, "Modelling driver behaviour towards innovative warning devices at railway level crossings," Neural Comput. Appl., vol. 51, pp. 104-111, Mar. 2013.
[5] D. Akin and B. Akbas, "A neural network (NN) model to predict intersection crashes based upon driver, vehicle and roadway surface characteristics," Sci. Res. Essays, vol. 5, pp. 2837-2847, 2010.
[6] H. Gonzalez, J. Han, Y. Ouyang, and S. Seith, "Multidimensional data mining of traffic anomalies on large-scale road networks," Transp. Res. Rec., vol. 2215, pp. 75-84, 2011.
[7] E. D'Andrea, P. Ducange, B. Lazzerini, and F. Marcelloni, "Real-time detection of traffic from Twitter stream analysis," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 4, pp. 2269-2283, Mar. 2015.
[8] F. Oliveira-Neto, L. Han, and M. K. Jeong, "An online self-learning algorithm for license plate matching," IEEE Trans. Intell. Transp. Syst., vol. 14, no. 4, pp. 1806-1816, Dec. 2013.
[9] J. Cao et al., "Web-based traffic sentiment analysis: Methods and applications," IEEE Trans. Intell. Transp. Syst., vol. 15, no. 2, pp. 844-853, Apr. 2014.
[10] J. Burgoon et al., "Detecting concealment of intent in transportation screening: A proof of concept," IEEE Trans. Intell. Transp. Syst., vol. 10, no. 1, pp. 103-112, Mar. 2009.
[11] Y. Zhao, T. H. Xu, and W. Hai-feng, "Text mining based fault diagnosis of vehicle on-board equipment for high speed railway," in Proc. IEEE 17th Int. Conf. ITSC, Oct. 2014, pp. 900-905.
[12] T. Hofmann, "Probabilistic latent semantic indexing," in Proc. 22nd Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 1999, pp. 50-57.
[13] R. Nayak, N. Piyatrapoomi, and J. Weligamage, "Application of text mining in analysing road crashes for road asset management," in Proc. 4th World Congr. Eng. Asset Manage., Athens, Greece, Sep. 2009, pp. 49-58.
[14] Leximancer Pty Ltd. [Online]. Available: http://info.leximancer.com/academic
[15] A. E. Smith and M. S. Humphreys, "Evaluation of unsupervised semantic mapping of natural language with Leximancer concept mapping," Behav. Res. Methods, vol. 38, no. 2, pp. 262-279, 2006.

K. Jayabharathi is currently pursuing a Master of Computer Applications (M.C.A.) at Christ College of Engineering and Technology. Her areas of interest are IT and databases. Programming languages known: C, Java, ASP.NET, PHP.

Mr. A. Jamaludeen, M.C.A., M.C.S.E., M.I.S.T.E., is Senior Assistant Professor of Computer Applications at Christ College of Engineering and Technology, Moolakulam, Puducherry. His research interests include authentication, software security, usability, and network security.