# R Code NB: R code for Naive Bayes
# Uploaded by brahmesh_sm

# Text classification using a Naive Bayes scheme
# Data : 20 Newsgroups
# Download link : http://www.cs.umb.edu/~smimarog/textmining/datasets/

# Load all the required libraries. Note: the packages need to be installed first (e.g. with install.packages()).
library(dplyr)
library(caret)
library(tm)
library(RTextTools)
library(doMC)
library(e1071)
registerDoMC(cores=detectCores())
# Load data.
# We will use the 'train-all-terms' file, which contains 11293 messages
# (one message per line: the topic, a tab, then the message text).
# Read the file as a data frame
ng.df <- read.table("20ng-train-all-terms.txt", header = FALSE, sep = "\t", quote = "",
                    stringsAsFactors = FALSE, col.names = c("topic", "text"))

# Preview the dataframe

# head(ng.df) # or use View(ng.df)
# How many messages does each of the 20 categories contain?
table(ng.df$topic)
# Convert the topic variable to a factor
ng.df$topic <- as.factor(ng.df$topic)

# Randomize: shuffle the rows so that the positional train/test split further down is random.

set.seed(2016)
ng.df <- ng.df[sample(nrow(ng.df)), ]
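# A small sketch of the reproducible shuffle on a made-up data frame ('demo_df' and
# the seed 42 are hypothetical, chosen only for illustration): sample(n) returns a
# random permutation of 1..n, and setting the same seed first reproduces the same
# row order.
demo_df <- data.frame(topic = c("a", "b", "c", "d"), text = c("w", "x", "y", "z"))
set.seed(42)
demo_shuffled_1 <- demo_df[sample(nrow(demo_df)), ]
set.seed(42)
demo_shuffled_2 <- demo_df[sample(nrow(demo_df)), ]
identical(demo_shuffled_1, demo_shuffled_2)   # TRUE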
# Create corpus of the entire text
corpus <- Corpus(VectorSource(ng.df$text))

# Total size of the corpus

length(corpus)

# Inspect the corpus

inspect(corpus[1:5])
# Tidy up the corpus using the 'tm_map' function. Make the following transformations on
# the corpus: change to lower case, remove common English stop words like
# "his", "our", "hadn't", "couldn't", etc. using the stopwords() function, then remove
# punctuation, numbers and extra white space.
# Stop words are removed before punctuation so that contractions such as "hadn't"
# still match the stop-word list.
# Use the pipe operator %>% (re-exported by 'dplyr' from 'magrittr') to do this neatly
corpus.clean <- corpus %>%
  tm_map(content_transformer(tolower)) %>%
  tm_map(removeWords, stopwords(kind = "en")) %>%
  tm_map(removePunctuation) %>%
  tm_map(removeNumbers) %>%
  tm_map(stripWhitespace)
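# A minimal base-R sketch of why the order of the cleaning steps matters. It is an
# illustration only: gsub() stands in for tm's removePunctuation()/removeNumbers(),
# and 'demo_stop' is a hypothetical three-word stop list, not tm's stopwords("en").
demo_text <- tolower("He HADN'T read our 3 messages.")
demo_stop <- c("he", "our", "hadn't")
demo_drop_stop <- function(x, stop) {
  words <- strsplit(x, " +")[[1]]
  paste(words[!words %in% stop], collapse = " ")
}
# Stop words first, then punctuation and digits: the contraction is removed.
demo_a <- gsub("[[:punct:][:digit:]]", "", demo_drop_stop(demo_text, demo_stop))
# Punctuation first: "hadn't" becomes "hadnt" and slips past the stop list.
demo_b <- demo_drop_stop(gsub("[[:punct:][:digit:]]", "", demo_text), demo_stop)
# Collapse any leftover runs of spaces.
demo_a <- gsub(" +", " ", trimws(demo_a))   # "read messages"
demo_b <- gsub(" +", " ", trimws(demo_b))   # "hadnt read messages"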
# Create document term matrix
dtm <- DocumentTermMatrix(corpus.clean)
dim(dtm)
# Create a 75:25 data partition: the first 8470 shuffled messages (75%) are used for
# training and the remaining 2823 (25%) for testing.

ng.df.train <- ng.df[1:8470,]

ng.df.test <- ng.df[8471:11293,]

dtm.train <- dtm[1:8470,]

dtm.test <- dtm[8471:11293,]
dim(dtm.test)
corpus.train <- corpus.clean[1:8470]
corpus.test <- corpus.clean[8471:11293]
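# A small sketch of where the hard-coded split indices come from, assuming the
# file has the 11293 rows implied by the indices above ('demo_n' is a hypothetical
# stand-in for nrow(ng.df)).
demo_n <- 11293
demo_n_train <- round(0.75 * demo_n)          # 8470
demo_train_idx <- seq_len(demo_n_train)       # rows 1..8470
demo_test_idx <- (demo_n_train + 1):demo_n    # rows 8471..11293
length(demo_test_idx)                         # 2823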
# Find frequent words which appear five times or more

fivefreq <- findFreqTerms(dtm.train, 5)

length(fivefreq)
dim(dtm.train)
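# A base-R sketch of what this frequency filter selects, on a made-up 3 x 3 count
# matrix (documents in rows, terms in columns; all names are hypothetical). It
# assumes, as in tm's findFreqTerms(), that a term's frequency is its total count
# over all documents.
demo_counts <- matrix(c(3, 0, 1,
                        2, 1, 0,
                        1, 0, 4),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(NULL, c("space", "hockey", "car")))
demo_totals <- colSums(demo_counts)                     # space = 6, hockey = 1, car = 5
demo_frequent <- names(demo_totals)[demo_totals >= 5]
demo_frequent                                           # "space" "car"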
# Build dtms using the fivefreq words only. This reduces the number of features to
# length(fivefreq).
system.time( dtm.train.five <- DocumentTermMatrix(corpus.train,
                                                  control = list(dictionary = fivefreq)) )
system.time( dtm.test.five <- DocumentTermMatrix(corpus.test,
                                                 control = list(dictionary = fivefreq)) )
# Convert word counts (0 or more) to presence or absence ("No" or "Yes") for each
# word
convert_count <- function(x) {
  y <- ifelse(x > 0, 1, 0)
  y <- factor(y, levels = c(0, 1), labels = c("No", "Yes"))
  y
}
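# A quick check of this recoding on a made-up count matrix. 'demo_to_presence' is a
# hypothetical copy of the function above so that the snippet runs on its own.
# Note that apply() coerces the factor results, so the result is a character matrix.
demo_to_presence <- function(x) {
  factor(ifelse(x > 0, 1, 0), levels = c(0, 1), labels = c("No", "Yes"))
}
demo_m <- matrix(c(0, 2,
                   5, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, c("space", "car")))
demo_yn <- apply(demo_m, 2, demo_to_presence)
demo_yn   # column "space": "No" "Yes"; column "car": "Yes" "No"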
# Apply the yes/no function to get the final training and testing feature matrices
system.time( ng.train <- apply(dtm.train.five, 2, convert_count) )
system.time( ng.test <- apply(dtm.test.five, 2, convert_count) )
# Build the NB classifier
system.time( ngclassifier <- naiveBayes(ng.train, ng.df.train$topic) )
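# A hand-computed sketch of the rule a Naive Bayes classifier applies, on made-up
# presence/absence data with two classes and two words. Everything here ('demo_x',
# 'demo_y', the word names) is hypothetical, and no Laplace smoothing is used:
# posterior(class) is proportional to prior(class) * product of P(word value | class).
demo_x <- data.frame(orbit = c("Yes", "Yes", "No", "No"),
                     goal  = c("No", "Yes", "Yes", "Yes"))
demo_y <- c("sci.space", "sci.space", "rec.sport.hockey", "rec.sport.hockey")
demo_new <- c(orbit = "Yes", goal = "No")   # the message to classify
demo_score <- sapply(unique(demo_y), function(cl) {
  rows <- demo_y == cl
  prior <- mean(rows)                       # share of training messages in this class
  lik <- sapply(names(demo_new),
                function(w) mean(demo_x[rows, w] == demo_new[[w]]))
  prior * prod(lik)
})
demo_posterior <- demo_score / sum(demo_score)
names(which.max(demo_posterior))            # "sci.space"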

# Make predictions on the test set

system.time( predictions <- predict(ngclassifier, newdata=ng.test) )
predictions
cm <- confusionMatrix(predictions, ng.df.test$topic )
cm
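# A base-R sketch of the overall accuracy that a confusion matrix summarises, on
# made-up labels ('demo_pred' and 'demo_truth' are hypothetical): the share of
# test messages whose predicted topic equals the true topic.
demo_levels <- c("a", "b", "c")
demo_truth <- factor(c("a", "a", "b", "b", "c", "c"), levels = demo_levels)
demo_pred  <- factor(c("a", "b", "b", "b", "c", "a"), levels = demo_levels)
demo_cm <- table(Prediction = demo_pred, Reference = demo_truth)
demo_accuracy <- sum(diag(demo_cm)) / sum(demo_cm)   # 4 correct out of 6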
