2 Background Study
3 Research Objectives
The dataset combines the 2012 traffic counts for Ballarat with the police traffic
enforcement activity in that region. Many types of activity are recorded in the dataset,
including traffic violations, motor violations, parking violations and many more. The
locations in the dataset are given as latitude and longitude, which lets the user
determine, from the incident type and incident number, where an incident took place and
how many times it has occurred. The dataset also contains data on the days on which the
most traffic was recorded, and the traffic counts are divided into three columns:
MidweekADT, WeekendADT and X7DaysADT. This information indicates how much traffic
pressure there is.
The file format of the dataset is CSV (Comma Separated Values), which allows data to be
saved in a tabular structure. It looks like a typical spreadsheet, but with a .csv
extension. Essentially, it is a text file consisting of data separated by commas.
I have used the reasoning cycle methodology for my research. My methodology comprises
Hypothesis, Predictions and Test of Predictions for the result (Kothari, 1990).
4.4 Attributes of the Dataset:
The table below lists the variables in my dataset and their types.
Variable      Type
CountID       Numeric
Date          String
Road          String
Location      String
Direction     String
Days          Numeric
MidweekADT    Numeric
WeekendADT    Numeric
7DaysADT      Numeric
Latitude      Numeric
Longitude     Numeric
Incnum        Numeric
Inctype       String
Inctypecode   Numeric
A research question focuses on one segment of a more extensive topic area. It is the
question that an individual attempts to answer when researching a topic. In the case of
my research on Police Traffic Enforcement Activity, the research questions are as follows:
1. Can the rate of traffic be predicted to lessen, based on the traffic counts and the
rate per day?
2. How does the rate of midweek traffic differ from the weekend traffic?
3. On which street does the most activity take place in the dataset?
4. Which incident type has a higher ratio than the others?
In this part of the research, I will analyse the data inside the dataset alongside the
implementation of the research that has been done. To start the analysis and use of the
dataset, I need to load the dataset into R in order to work with the different
variables/objects that are stored inside the dataset (Torres-Reyna, 2010). The following
code has been used to load the dataset:
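A minimal sketch of this loading step, assuming the file is read with base R's read.csv() into a data frame named ptea (the name used in the later commands). The file below is a temporary stand-in with made-up rows, not the real dataset, and the road names are hypothetical:

```r
# Hypothetical stand-in file with made-up rows and a few of the report's columns.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "CountID,Road,Days,MidweekADT,WeekendADT,7DaysADT",
  "1,Road A,7,1200,900,1114",
  "2,Road B,7,450,300,407",
  "3,Road C,14,3100,2400,2900"
), csv_path)

# Load the CSV into a data frame; text columns stay as character strings.
ptea <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the column names and types that R assigned.
str(ptea)
```

By default read.csv() makes column names syntactically valid, so a header such as 7DaysADT is read as X7DaysADT, which would account for the two spellings of that column in this report.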
Now that the dataset has been loaded on to R, the next step is to attach the data frame
to the search path so that it is possible to refer to the variables in the data frame by their
names. The following code has been used to attach the data frame:
“attach(ptea)”
Now that the extra variables have been nullified, I will re-check the dataset to ensure
that they have been removed. To check the remaining variables, I will run the following
code:
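One way to make this check, sketched on a made-up stand-in for ptea (the values are illustrative only): names() lists the columns that remain, and str() shows their types.

```r
# Made-up stand-in for the report's data frame.
ptea <- data.frame(
  CountID    = 1:3,
  Road       = c("Road A", "Road B", "Road C"),
  Days       = c(7, 7, 14),
  MidweekADT = c(1200, 450, 3100),
  stringsAsFactors = FALSE
)

# Nullify one column in the same way as the report does.
ptea$Road <- NULL

# List the columns that remain and show their types.
remaining <- names(ptea)
print(remaining)
str(ptea)
```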
I have nullified the extra variables from the dataset previously and now I want to see a
summary of the remaining variables to view the minimum, 1st quartile, median, mean,
3rd quartile and the maximum of each variable. The following code will let me view the
summary:
“summary(ptea)”
However, I will not need to work with all the variables in the dataset, as the analysis
that follows uses only the numeric variables. To remove the extra variables, I will run
code in the R script to nullify them. The following code shows the process:
“
ptea$Date<-NULL
ptea$Road<-NULL
ptea$location<-NULL
ptea$Direction<-NULL
ptea$stnum<-NULL
ptea$stname1<-NULL
ptea$stname2<-NULL
ptea$stname3<-NULL
”
The following screenshot shows the summary of each of the variables within the dataset:
Now I will plot the “No. of Incident” by incident type to see which incident occurs
most frequently. The following code produces the plot, and the screenshot lets me view
the region with the highest rate of incidents:
“plot(Days ~ inctype)”
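Note that plot(Days ~ inctype) plots Days against the incident type rather than counting incidents. If the aim is the frequency of each incident type, one option is to tabulate the column and draw a bar chart. A sketch on made-up values, assuming an incident-type column named inctype as in the command above:

```r
# Made-up incident types standing in for the report's inctype column.
inctype <- c("Traffic", "Parking", "Traffic", "Motor", "Traffic", "Parking")

# Count how often each type occurs, most frequent first.
counts <- sort(table(inctype), decreasing = TRUE)

# Bar chart of the counts, one bar per incident type.
barplot(counts, xlab = "Incident type", ylab = "Number of incidents")

# Name of the most frequent incident type.
most_frequent <- names(counts)[1]
```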
Next I will plot the midweek average daily traffic (MidweekADT) against the weekend
average daily traffic (WeekendADT). This graph gives a clear idea of the rate at which
the traffic is increasing. The following code produces the graph, and the screenshot
shows it:
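A minimal sketch of such a plot, assuming a plain scatter plot of the two columns; the values are made up, and the dashed reference line is an addition for illustration:

```r
# Made-up traffic counts standing in for the report's columns.
ptea <- data.frame(
  MidweekADT = c(400, 900, 1500, 2300, 3100),
  WeekendADT = c(300, 700, 1100, 1600, 2400)
)

# Scatter plot of midweek traffic (y) against weekend traffic (x).
plot(MidweekADT ~ WeekendADT, data = ptea,
     xlab = "Weekend ADT", ylab = "Midweek ADT")

# Dashed line where midweek and weekend traffic would be equal.
abline(0, 1, lty = 2)

# Ratio of weekend to midweek traffic for each count location.
ratio <- ptea$WeekendADT / ptea$MidweekADT
```

Points above the dashed line are locations where midweek traffic exceeds weekend traffic.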
Data points with large residuals and/or high leverage may distort the result and
accuracy of a regression. Cook's distance measures the effect of deleting a given
observation. Points with a large Cook's distance are considered to merit closer
examination in the analysis.
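A minimal sketch of how Cook's distance can be computed in R with cooks.distance(). The data are made up, the model formula is an assumption, and the 4/n cut-off is a common rule of thumb assumed here rather than something this report states:

```r
# Made-up data; the last observation is deliberately unusual.
MidweekADT <- c(500, 800, 1100, 1400, 1700, 2000, 2300, 2600)
Days       <- c(7, 7, 8, 7, 7, 8, 7, 21)

# Assumed model: Days as response, MidweekADT as predictor.
lm1 <- lm(Days ~ MidweekADT)

# Cook's distance for every observation.
cd <- cooks.distance(lm1)

# Flag observations above the assumed 4/n rule-of-thumb cut-off.
threshold <- 4 / length(cd)
flagged <- which(cd > threshold)
print(flagged)
```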
Moreover, to predict at what point in the month the traffic is going to increase or
decrease, I will use the “predict” command, with “abline” to support my prediction.
In the above graph, the blue line shows that the midweek traffic is going to increase
more gradually than the weekend traffic. The coefficient is NULL, which is why the
straight line starts from 0.
As I will be using a linear regression model on the dataset, the next step is to produce
the linear model. I will apply the linear model to the variable Days against MidweekADT.
The following code shows the linear model applied to the variables:
“summary(lm1)”
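The call that creates lm1 does not appear above. A minimal sketch, assuming the model was fitted with Days as the response and MidweekADT as the predictor (one reading of “Days against MidweekADT”); the data are made up:

```r
# Made-up stand-in data for the two variables.
ptea <- data.frame(
  MidweekADT = c(500, 800, 1100, 1400, 1700, 2000),
  Days       = c(9, 8, 8, 7, 7, 6)
)

# Fit the linear model (assumed formula: Days explained by MidweekADT).
lm1 <- lm(Days ~ MidweekADT, data = ptea)

# Coefficients, standard errors, R-squared and related statistics.
summary(lm1)

# Slope of the fitted line.
slope <- coef(lm1)[["MidweekADT"]]
```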
Next I will produce an analysis of variance table (ANOVA) to assess the importance of
the factors by comparing the response variable means at the different factor levels. The
following code will produce the ANOVA and the screenshot follows:
“anova(lm1)”
Now that the analysis of variance table has been produced, the next step is to compute
the predictions of the fitted linear model. This produces the predicted values obtained
by evaluating the linear regression function. The following code produces the
predictions, and the following screenshot shows the fitted value (fit), the lower bound
(lwr) and the upper bound (upr):
“predict(lm1, interval="predict")”
Now that the prediction has been accomplished, the next and final step is to plot the
predicted values that have been obtained by evaluating the linear regression function.
The following code will plot the predicted values and the following screenshot shows
the graph of prediction:
“plot(predict(lm1), residuals(lm1))”
After plotting the predictions in the graph, I will now use the command “abline” to
show the relationship of the graph. I will be using the code:
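The abline call itself does not appear above. A minimal sketch, assuming the fitted line is drawn over a scatter plot of the two variables with abline(lm1) and that the model is lm(Days ~ MidweekADT); the data and the value 1250 are made up:

```r
# Made-up stand-in data for the two variables.
MidweekADT <- c(500, 800, 1100, 1400, 1700, 2000)
Days       <- c(9, 8, 8, 7, 7, 6)

# Assumed model: Days as response, MidweekADT as predictor.
lm1 <- lm(Days ~ MidweekADT)

# Scatter plot of the raw data with the fitted regression line on top.
plot(MidweekADT, Days)
abline(lm1, col = "blue")

# Predicted Days for a hypothetical MidweekADT value of 1250.
new_days <- predict(lm1, newdata = data.frame(MidweekADT = 1250))
```

A line that slopes downwards from left to right corresponds to a negative coefficient for MidweekADT.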
Now, per this graph, our prediction is that midweek days tend to have less traffic than
other days, and that it is going to decrease further. I reach that judgement after
plotting the “abline” on the graph: the line has a negative gradient, which indicates a
negative relationship. After plotting the graph, I will now test the correlation.
Correlation tells us how strongly the two variables are linearly related and in which
direction, which helps us to judge the accuracy of the result. To test the correlation,
I will use the command shown below:
“cor.test(Days,MidweekADT)”
The result shows that the correlation is -0.1635854, which is a weak negative
relationship. The test reports a 95% confidence interval for the correlation
coefficient, which means that intervals constructed in this way contain the true
population value 95% of the time (Plotts, 2011).
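A short sketch of how the individual parts of the cor.test() result can be read off. The data are made up, so the numbers will differ from the -0.1635854 reported for the real dataset:

```r
# Made-up stand-in data for the two variables.
MidweekADT <- c(500, 800, 1100, 1400, 1700, 2000)
Days       <- c(9, 8, 8, 7, 7, 6)

# Pearson correlation test (the default method).
result <- cor.test(Days, MidweekADT)

r  <- unname(result$estimate)  # sample correlation coefficient
ci <- result$conf.int          # confidence interval for the correlation
p  <- result$p.value           # p-value for the null hypothesis of zero correlation

print(r)
print(ci)
```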
We can also find the correlation by using the command “cor(Days,MidweekADT,
method="pearson")”. This is known as Pearson's method of finding the correlation.
Now, the covariance of two variables x and y in a data set measures how the two are
linearly related. A positive covariance indicates a positive linear relationship
between the variables, and a negative covariance indicates the inverse. To get the
covariance I will use the command stated below:
“cov(Days,MidweekADT)”
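Covariance and Pearson correlation are linked: dividing the covariance by the product of the two standard deviations gives the correlation. A small check of that relationship on made-up data:

```r
# Made-up stand-in data for the two variables.
MidweekADT <- c(500, 800, 1100, 1400, 1700, 2000)
Days       <- c(9, 8, 8, 7, 7, 6)

# Sample covariance of the two variables.
cv <- cov(Days, MidweekADT)

# Rescale the covariance by both standard deviations.
r_from_cov <- cv / (sd(Days) * sd(MidweekADT))

# Compare with the correlation computed directly.
same <- isTRUE(all.equal(r_from_cov, cor(Days, MidweekADT)))
print(same)
```

Because the covariance depends on the units of both variables, its sign is easier to interpret than its size.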
I will use the “coef(lm1)” command to determine the coefficient and the y intercept of
the graph. My coefficient came to 7.663013e+00 (about 7.66). This matches the expected
result, because our plotted graph shows the fitted line starting from that point.
Now, I will plot four more graphs to conclude my predictions. To do that, I need to use
the layout command to plot them in one frame. The commands to put the linear model
plots into the frame are shown below:
“layout(matrix(1:4,2,2))”
“plot(lm1)”
The graphs have been plotted using the “plot()” function and arranged by the
“layout()” function. The graphs are shown below.
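A self-contained sketch of the same two commands on made-up data, again assuming the model lm(Days ~ MidweekADT). Calling plot() on an lm object draws four diagnostic plots by default, which is why a 2 x 2 layout fits them in one frame:

```r
# Made-up stand-in data and the assumed model.
MidweekADT <- c(500, 800, 1100, 1400, 1700, 2000)
Days       <- c(9, 8, 8, 7, 7, 6)
lm1 <- lm(Days ~ MidweekADT)

# Split the plotting frame into a 2 x 2 grid, filled column by column.
n_panels <- layout(matrix(1:4, 2, 2))

# Draw the four default diagnostic plots of the fitted model.
plot(lm1)

# Return to a single-plot layout for any later graphs.
layout(1)
```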
8 Conclusion:
In this research I have tried to show how the rate of traffic can be lessened in
midweek per Days. Now I will discuss two different graphs: one shows which types of
incident take place most frequently, and the other shows on which street the most
activity has recently taken place.
The above graph shows that the most frequent incident code is 1 and the least frequent
incident code is 607.
These are the four street numbers where activity has been recorded. As the graph
suggests, street “X” has the most activity, at more than 800, and streets “24X” and
“643GX” have equal activity of less than 100 each.
References