
Forest Fires

By: Kaitlyn Enger & Rachael Schueller


Paper
“A Data Mining Approach to Predict Forest Fires using Meteorological Data”
By: Paulo Cortez and Anibal Morais
About the Paper
Prediction: The paper uses recent real-world data, collected in the northeast region of Portugal, to predict the burned area (size) of forest fires.

Algorithms Used: Five data mining (DM) techniques (Multiple Regression, Decision Tree, Random Forest, Neural Network and Support Vector Machine).

Four distinct feature selection setups:
● STFWI: spatial, temporal and the four FWI (Fire Weather Index) components
● STM: spatial, temporal and the four weather variables
● FWI: only the four FWI components
● M: only the four weather variables
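The four setups can be sketched as column subsets in R. This is only an illustration: the column names are assumed to match the public forestfires.csv file, `select_setup` is a hypothetical helper, and the two-row data frame is made up.

```r
# Column names below are assumed to match the public forestfires.csv file;
# adjust them if your copy of the data differs.
spatial  <- c("X", "Y")
temporal <- c("month", "day")
fwi      <- c("FFMC", "DMC", "DC", "ISI")
weather  <- c("temp", "RH", "wind", "rain")

setups <- list(
  STFWI = c(spatial, temporal, fwi),
  STM   = c(spatial, temporal, weather),
  FWI   = fwi,
  M     = weather
)

# Hypothetical helper: keep only the predictors of one setup plus the target.
select_setup <- function(data, setup, target = "area") {
  data[, c(setups[[setup]], target), drop = FALSE]
}

# Tiny made-up data frame, for illustration only (not real observations).
toy <- data.frame(
  X = c(7, 4), Y = c(5, 4), month = c("mar", "aug"), day = c("fri", "sun"),
  FFMC = c(86.2, 91.0), DMC = c(26.2, 129.5), DC = c(94.3, 692.6), ISI = c(5.1, 7.0),
  temp = c(8.2, 21.7), RH = c(51, 38), wind = c(6.7, 2.2), rain = c(0, 0),
  area = c(0, 3.5)
)

# Columns kept by the M setup (weather variables plus the target).
names(select_setup(toy, "M"))
```

The same model formula (for example `area ~ .`) can then be fitted on each subset to compare the setups.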
Algorithms We Used
● Neural Network
○ R package - nnet
● Support Vector Machine
○ R package - e1071
● Multiple Regression
Original Dataset
Neural Network - R Code
library(gmodels)
library(nnet)
mydata <- read.csv("forestfires.csv")
# Turn the numeric burned area into a two-level factor:
# fires larger than 2 ha are "large", all others "small" (1 ha is about 2.47 acres).
mydata$isfire <- ifelse(mydata$area > 2, "large", "small")
mydata$isfire <- as.factor(mydata$isfire)
mydata$area <- NULL
# Shuffle the rows, then split into 400 training and 117 testing rows.
mydatarandomized <- mydata[sample(nrow(mydata)),]
trainingdata <- mydatarandomized[1:400,]
testingdata <- mydatarandomized[401:517,]
# size = 10 hidden units was one of the best sizes we found for accuracy.
myn <- nnet(isfire~., data = trainingdata, size = 10, MaxNWts = 100000)
mp <- predict(myn, testingdata[,1:12], type = "class")
# Give the predictions the same factor levels as the truth so the table stays square.
mp <- factor(mp, levels = levels(testingdata$isfire))
t <- CrossTable(mp, testingdata[,13])
t$t
sum(diag(t$t))/sum(t$t)
Neural Network - Results
● Accuracy: 0.5897436

● Although this accuracy is fairly high compared with the other accuracies we obtained, the model has a difficult time predicting large forest fires.
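One way to see this weakness is to look past overall accuracy at the per-class recall and at a majority-class baseline. The sketch below uses base R only; the labels and predictions are made up for illustration and are not our actual test results.

```r
# Hypothetical true labels and predictions, for illustration only.
lv <- c("large", "small")
actual <- factor(c("small", "small", "small", "small", "small", "small",
                   "large", "large", "large", "large"), levels = lv)
predicted <- factor(c("small", "small", "small", "small", "small", "large",
                      "small", "small", "small", "large"), levels = lv)

# Confusion matrix: rows are predictions, columns are true classes.
cm <- table(predicted, actual)

# Overall accuracy: share of correct predictions.
accuracy <- sum(diag(cm)) / sum(cm)

# Per-class recall: share of each true class that was predicted correctly.
recall <- as.numeric(diag(cm)) / as.numeric(colSums(cm))
names(recall) <- colnames(cm)

# Baseline: accuracy of always predicting the most common class.
baseline <- max(table(actual)) / length(actual)

accuracy
recall
baseline
```

In this made-up case the accuracy is no better than the baseline and the recall for "large" is low, which is the kind of pattern that overall accuracy alone hides.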
Support Vector Machine - R Code
● Still used the small/large factor (isfire) instead of the numeric area

SVM in R code:
library(e1071)
mysvm <- svm(isfire~., data = trainingdata)
mypredict <- predict(mysvm, testingdata[,1:12])
t <- CrossTable(mypredict, testingdata[,13])
t$t
sum(diag(t$t))/sum(t$t)
Support Vector Machine - Results
● Accuracy: 0.6239316

● This result is similar to the neural network's.
● Although SVM has a higher accuracy than the neural network, it also has a difficult time predicting large forest fires.
Multiple Regression - R Code
● Did not use the factor; the model predicts the numeric area

R code:
mydata <- read.csv(file = "forestfires.csv", header = TRUE, stringsAsFactors = TRUE)
mymlr <- lm(area ~., data = mydata)
summary(mymlr)
Multiple Regression - Result
● The most significant variables for predicting area are DMC (Duff Moisture Code) and DC (Drought Code).
● If DMC increases by one unit, the predicted burned area increases by 0.2 hectare, holding the other variables constant.
● If DC increases by one unit, the predicted burned area decreases by 0.128 hectare, holding the other variables constant.
● The R-squared value is very low, so the model explains little of the variation in burned area and these estimates have limited practical value.
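As a worked example of reading these coefficients, the sketch below applies the two reported values to a change in DMC and DC. `delta_area` is a hypothetical helper, and the calculation assumes every other predictor stays fixed.

```r
# Coefficients as reported on the slide (hectares per one-unit change).
coef_dmc <- 0.2
coef_dc  <- -0.128

# Hypothetical helper: change in predicted area for given changes in DMC and DC,
# with every other predictor held fixed.
delta_area <- function(d_dmc = 0, d_dc = 0) {
  coef_dmc * d_dmc + coef_dc * d_dc
}

delta_area(d_dmc = 10)             # about  2.00 ha
delta_area(d_dc = 10)              # about -1.28 ha
delta_area(d_dmc = 10, d_dc = 10)  # about  0.72 ha
```

Because the R-squared value is low, these changes describe the fitted line rather than a reliable forecast for any single fire.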
Paper Analysis
Experiment Results: All experiments reported in the paper were conducted using RMiner. For the NN and SVM methods, all attributes were standardized to zero mean and one standard deviation. The authors found that it was difficult to predict large fires. They suggested that additional information, such as type of vegetation and firefighting intervention, could improve the prediction of large forest fires.
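The standardization step can be sketched in base R with `scale()`. This is not the paper's RMiner code: the numbers are made up, and reusing the training means and standard deviations on the test rows is our assumption about how to apply it to a train/test split.

```r
# Made-up numeric predictors, for illustration only.
train <- data.frame(temp = c(8.2, 18.0, 21.7, 25.1), wind = c(6.7, 0.9, 2.2, 4.0))
test  <- data.frame(temp = c(14.6, 30.2), wind = c(3.1, 5.4))

# Column means and standard deviations, estimated on the training rows only.
mu    <- colMeans(train)
sigma <- apply(train, 2, sd)

# Standardize both sets with the training statistics (assumption: this avoids
# leaking information from the test rows into the model inputs).
train_std <- scale(train, center = mu, scale = sigma)
test_std  <- scale(test,  center = mu, scale = sigma)

# Each training column now has mean 0 and standard deviation 1.
round(colMeans(train_std), 10)
apply(train_std, 2, sd)
```

The standardized matrices can be turned back into data frames with `as.data.frame()` before being passed to `nnet()` or `svm()`.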
Conclusion
● Neural Network and Support Vector Machine: It is difficult to predict large fires
● Multiple Regression: Of all the variables used to predict the burned area, DMC and DC are the most significant