
STATISTICS WITH R PROGRAMMING

(R-16 regulation-JNTU Kakinada)

II YEAR – I SEMESTER

Unit-1 notes
Introduction to R language

Prepared by
S.S.R.K.M.GUPTA. M.Tech.,( Ph.D.), M.C.S.I.
Assistant Professor, CSE Department,
Aditya College of Engineering & Technology,
Surampalem.
OBJECTIVE:
After taking the course, students will be able to
• Use R for statistical programming, computation, graphics, and modeling,
• Write functions and use R in an efficient way,
• Fit some basic types of statistical models,
• Use R in their own research,
• Expand their knowledge of R on their own.

BRIEF SYLLABUS
• UNIT-I: Introduction
• UNIT-II: R Programming Structures
• UNIT-III: Doing Math and Simulation in R
• UNIT-IV: Graphics
• UNIT-V: Probability and Basic Statistics
• UNIT-VI: Advanced Statistical Tools

OUTCOMES:
At the end of this course, students will be able to:
• List motivation for learning a programming language
• Access online resources for R and import new function packages into the R workspace
• Import, review, manipulate and summarize data-sets in R
• Explore data-sets to create testable hypotheses and identify appropriate statistical tests
• Perform appropriate statistical tests using R, and create and edit visualizations with R

TEXT BOOKS:
1) The Art of R Programming, A K Verma, Cengage Learning.
2) R for Everyone, Lander, Pearson.
3) The Art of R Programming, Norman Matloff, No Starch Press.

REFERENCE BOOKS:
1) R Cookbook, Paul Teetor, O'Reilly.
2) R in Action, Rob Kabacoff, Manning

UNIT-1 - TOPICS

• Introduction
• How to run R
• R Sessions and Functions
• R basics: Basic Math, Variables, Data Types.
• Advanced Data Structures: Vectors, Data Frames, Lists, Matrices, Arrays, Classes.
Introduction
What is statistics?
• Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to
assist in making more effective decisions.
• Statistical analysis is used to manipulate, summarize, and investigate data, so that the results
are useful for decision-making.
• Types of statistics:
– Descriptive statistics – Methods of organizing, summarizing, and presenting data in an
informative way. These include measures of central tendency, such as a) mean b) median c) mode,
and measures of variability, such as a) range b) deviation c) variance d) standard deviation.
– Inferential statistics – The methods used to determine something about a population on the
basis of a sample. Inference is the process of drawing conclusions or making decisions
about a population based on sample results, e.g., estimation and hypothesis testing.
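The descriptive measures listed above can be computed directly in R. The sketch below uses a small made-up vector x. Base R has no function for the statistical mode: R's own mode() returns an object's storage mode instead. So the mode is found here by tabulating the values, which is one common approach rather than the only one:

```r
# Small illustrative data set (made-up values)
x <- c(4, 8, 8, 5, 10)

mean(x)           # arithmetic mean: 35 / 5 = 7
median(x)         # middle value of the sorted data: 8
range(x)          # smallest and largest values: 4 10
diff(range(x))    # range as a single number: 6
var(x)            # sample variance (divisor n - 1): 24 / 4 = 6
sd(x)             # sample standard deviation = sqrt(var(x))

# Statistical mode: tabulate the values and take the most frequent one
counts <- table(x)
mode_x <- as.numeric(names(counts)[which.max(counts)])
mode_x            # 8
```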

What is R ?
• R is a scripting language for statistical data manipulation and analysis.
• It supports statistical computing and graphics for analyzing data and making decisions.
• It also has a large and highly flexible collection of graphing facilities for data display.
• “S” is a language that was developed by John Chambers and colleagues at Bell Laboratories in 1976,
as an internal statistical analysis environment.
• “S” was later extended with a GUI and sold as the commercial product “S-PLUS”.
• “R” is often referred to as the “GNU implementation of S”.
• “R” was created by “Ross Ihaka” and “Robert Gentleman” at the University of Auckland, New Zealand, in
1993.
• The name “R” comes from the first letters of the first names of its two authors, and was also
influenced by the name of the “S” language.
• History and milestones of R:
– 1976 - “S” language was invented.
– 1983 – Version S3 is released with OOPs paradigm.
– 1988 - S-PLUS is first produced.
– 1993 - “R” was created by “Ross Ihaka” and “Robert Gentleman”.
– 1995 – GNU general public license is used
– 1997 – R core group is formed
– 2000 – version 1.0.0 is released
– 2014 – version 3.1.2 is released
– 2017 – version 3.4.0 is released

What are the programming Features of R?


• R is an interpreted language.
• It has syntax and semantics similar to those of the S language.
• We can run R on any platform, such as Windows, Unix, and Mac.
• All functionality is modularized in packages.
• It is case sensitive, and commands are separated by ‘;’ or a new line.
• It is an open-source language, supported by a huge community of scientists.
• R can serve as a glue language and is well suited to statistics, data analysis and machine learning.
• Like other languages, it includes features such as database input, data export, data viewing,
variable labels, and handling of missing data.
• It also supports matrix arithmetic like MATLAB, can communicate with C and C++, and includes
object-oriented (OOP) features.
• R is a powerful programming language for developing new tools.
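Two of these features, case sensitivity and the ‘;’ separator, are easy to see at the R prompt. The variable names below are arbitrary:

```r
# R is case sensitive: 'total' and 'Total' are two different variables
total <- 10
Total <- 20
total + Total    # 30

# Two commands on one line, separated by ';'
a <- 3; b <- 4
a * b            # 12
```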

What are the strengths and advantages of R language?


• “R” is a general-purpose programming language.
• It has a comprehensive set of statistical analysis techniques, including:
– Classical statistical tests
– Linear and non-linear modeling
– Time-series analysis
– Classification and cluster analysis
– Spatial and Bayesian statistics, etc.
• Most statistical techniques are either already built into R or available as free packages.
• Completely open-source and free.
• High-quality graphics can be generated.
• It is available for Windows, Mac and Linux operating systems.
• It incorporates features of object-oriented and functional programming languages.

What are the limitations of R language?


• It is based on 40-year-old technology (the S language), and it can suffer from low efficiency, low
speed, and poor memory management.
• Little built-in support for dynamic/3D graphics.
• Functionality is driven by user demand.
• Objects are stored in physical memory (RAM).
• It is not a database, but it can connect to a DBMS.
• The language interpreter can be slow, but R allows calling C or C++ code.
• No spreadsheet view of data, but it can connect to MS Excel / MS Office.

How to Run R
Installation:
How to install R and R studio in different environments?
• Open the URL: https://cran.r-project.org
• Download the precompiled binary distributions of the base system from the links:
– Download R for Linux
– Download R for (Mac) OSX
– Download R for Windows
• Linux:
a) Ubuntu:
– >sudo apt-get update
– >sudo apt-get install r-base
– >sudo apt-get install r-base-dev
b) Redhat fedora:
– >sudo yum install R
– For R packages
– > yum list R-*
– It lists all RPMs for additional packages
c) Debian:
– > apt-get update
– > apt-get install r-base r-base-dev
• (Mac)OS:
– Download the package file R-3.4.0.pkg
– Double-click on it to open the installer
• Windows:
– Select the subdirectory: base (click on it)
– Click on the link to download R-3.4.0-win.exe
– Install it as per the directions given by it.
• Some popular IDEs for R-Language:
– Rstudio
– Tinn-R
– Deducer
– Revolution R
– Text editors: Vim, Eclipse + StatET
• Installing RStudio: download the latest version of RStudio from the link:
https://www.rstudio.com/products/rstudio/download/
Running R :
Explain the two modes to run R from the R IDE.
We can run the R environment in two modes.
a) Running R in Interactive mode
• Open the shortcut R 64 3.4.0
• It opens the command window with the prompt ‘>’
• You can execute R commands
– e.g.
– >print("Welcome to R")
– [1] "Welcome to R"
• You can also run an R script (.r) file
– >source("sample.r") and press Enter
b) Running R in Batch mode
• Sometimes it is preferable to automate the process of running R.
• We can run an R script automatically by typing the following at the operating-system command line:
• R CMD BATCH --vanilla [input file] [output file]
• Ex: R CMD BATCH --vanilla sample.r result.txt
• The --vanilla option tells R not to read any startup files and not to save the workspace at the end.

R Sessions and Functions:


Write about the commands useful in an R session.
R Session:
• A session is a series of interactions between the user and the R environment that occur during
a continuous period of time.
• We can start a session by double-clicking the R icon, which opens the command window.
• Once the command window is open, R is ready to accept the user's instructions, which are
entered at the command prompt >
• We can directly execute R commands by typing them line by line.
• Pressing the Enter key ends the current command and starts its execution.
• After processing the command, R displays the output, if any, and shows the > prompt again.

Example Session:
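The example session itself is not reproduced in these notes. A short illustrative session might look like the following, with made-up values. Each line is typed at the > prompt and runs when Enter is pressed:

```r
x <- c(2, 4, 6)   # create a vector; an assignment prints nothing
x                 # typing a name prints its value: [1] 2 4 6
mean(x)           # [1] 4
y <- x * 10
y                 # [1] 20 40 60
```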

Exit the R Session with saving or without saving :


• To exit from the R session, type quit() or q() at the R prompt.

• When we work in R, the R objects that are created and loaded are stored in a memory area called
the workspace.
• If we choose not to save the workspace, all the objects in it are wiped out when R exits.
• If we choose ‘yes’, they are saved into a file called “.RData” in the current working
directory.
• When we start R in the same directory next time, the workspace and all the created objects are
restored automatically from the .RData file.
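The same save-and-restore cycle can also be triggered explicitly with save.image() and load(). The sketch below runs in a temporary directory so that no real .RData file is overwritten; that choice is an assumption made only for the illustration:

```r
old_wd <- setwd(tempdir())   # work in a temporary directory

a <- 1:5
save.image()                 # writes all workspace objects to ".RData" here
rm(a)                        # remove the object from memory
exists("a")                  # FALSE
load(".RData")               # restore the saved objects
exists("a")                  # TRUE

setwd(old_wd)                # return to the original working directory
```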
Listing the objects:
• ls() function is used to list objects in the workspace.

Removing the objects from the current Session:


• rm() function is used to remove data objects from the workspace.

Getting and setting current working directory:

• getwd() function is used to display the current working directory.


• setwd() function is used to change the current working directory.
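A short sketch of these four functions used together. The objects x and y are placeholders, and tempdir() is used only as a safe directory to switch into:

```r
x <- 10
y <- c(1, 2, 3)
ls()                       # lists the objects, e.g. "x" "y"
rm(x)                      # remove x from the workspace
ls()                       # "x" is no longer listed

getwd()                    # prints the current working directory
old <- setwd(tempdir())    # change directory; setwd() returns the previous one
getwd()                    # now the temporary directory
setwd(old)                 # change back
```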

Getting file information from R session:


• Inside the R prompt, operating-system commands are not recognized by R.
• To list the names of the files in the current working directory, use
• >list.files()
• To get information about a specific file, use
• >file.info("filename")
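Here is a sketch of list.files() and file.info() on a throw-away file. The file name notes.txt is hypothetical, and the file is created in R's temporary directory:

```r
# Create a small file to inspect
f <- file.path(tempdir(), "notes.txt")
writeLines(c("line 1", "line 2"), f)

list.files(tempdir())      # names of the files in that directory
info <- file.info(f)       # data frame with size, isdir, mtime, ...
info$size                  # file size in bytes
file.exists(f)             # TRUE
```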
Some Commands useful in R session:
• example("topic") – runs the examples for the topic
o e.g.: example("if")
• Ctrl+L – clears the command window screen
• dir() – displays all the files in the current directory
• library() – lists the packages installed in the system
• help.start() – starts the HTML version of help
• help.search("topic") – searches the help system
• ?topic – also gets help for the topic
• ls.str() – displays the structure (details) of all objects in the current session
• # – starts a comment

Functions:
• A function is a self-contained module of a program. It is called by its name, and when it is
called, the statements in its body are executed.
• We can pass input to a function through its argument list.
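The code segment that this passage refers to is not reproduced in these notes. The following minimal sketch is consistent with the description, a function named oddcount that counts the odd numbers in a vector. The loop body is an assumption:

```r
# oddcount: counts how many elements of the vector x are odd
oddcount <- function(x) {
  k <- 0                           # running count of odd numbers
  for (n in x) {
    if (n %% 2 == 1) k <- k + 1    # n %% 2 is the remainder after dividing by 2
  }
  return(k)                        # send the count back to the caller
}

oddcount(c(1, 3, 5))         # [1] 3
oddcount(c(1, 2, 3, 7, 9))   # [1] 4
```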

In the above code segment


• oddcount - is function name
• function - keyword to define function
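• return() - function that returns a value to the caller
• oddcount(c(1,3,5)) - function call
• Note that arguments passed to an R function are effectively read-only (call by value): changing
them inside the function does not change the caller's variables.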
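This call-by-value behaviour can be checked directly. Here, double_first is a hypothetical function written only for this illustration:

```r
double_first <- function(v) {   # hypothetical helper
  v[1] <- v[1] * 2              # modifies the function's local copy only
  v
}

x <- c(5, 6, 7)
y <- double_first(x)
y    # [1] 10  6  7   (the returned, modified copy)
x    # [1] 5 6 7      (the caller's vector is unchanged)
```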
Assigning values for global object from the function:
• <<- operator is used to assign value to the data object in the outer scope.
e.g.
> w <- 5 # creating a data object as a global variable
>addone<-function() # definition of the function
+ w <<- w+1 # <<- assigns to the global w from inside the function body
>addone() # calling the function
>w
[1] 6

Default arguments:
• R also makes frequent use of default arguments. Consider the (partial) function definition
• e.g. function(x,y=2)
• Here y is initialized to 2 if the programmer doesn't specify y in the function call.
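Here is a minimal sketch of a default argument. The function name f is arbitrary:

```r
f <- function(x, y = 2) {   # y defaults to 2 when it is not supplied
  x + y
}
f(5)          # y takes its default value: 7
f(5, 10)      # y supplied by position: 15
f(5, y = 1)   # y supplied by name: 6
```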

R Programming Tokens
i) Reserved Keywords:
if else repeat while function
for in next break TRUE
FALSE NULL Inf NaN NA
NA_integer_ NA_real_ NA_complex_ NA_character_
ii) Identifiers:
Names of variables, methods, classes, etc
Rules:
• Identifiers can be a combination of letters, digits, period(.) and underscore (_)
• It must start with a letter or a period
• If it starts with a period, it can’t be followed by a digit
• Reserved keywords in R can’t be used as identifiers
• Identifiers are case sensitive and should not contain spaces
• Valid identifiers: total, sum, fine.with.dot, this_is_acceptable, number5
• Invalid Identifiers: tot@l, 5sum, _fine, TRUE, /ne
iii) Literals: constant values, which are normally assigned to variables.
• double – 0.3, 1.257, 12.0, .765, 12.75e+4
• integer – 10L, 0xF2CL (the L suffix is needed; 10 and 0xF2C on their own are doubles)
• logical – TRUE, FALSE, T, F
• complex – 3.5+4.2i
• character – ‘a’, “a”, ‘hello’, “hello”
Special values:
• NA – missing elements, Not Available
• NaN – Not a number
• NULL – absence of object
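You can check the type of each kind of literal with typeof(). In particular, a plain number such as 10 is stored as a double, so an integer literal needs the L suffix:

```r
typeof(12.75e+4)   # "double"
typeof(10)         # "double"   (plain number literal)
typeof(10L)        # "integer"  (L suffix)
typeof(0xF2CL)     # "integer"  (hexadecimal literal with L suffix)
typeof(TRUE)       # "logical"
typeof(3.5+4.2i)   # "complex"
typeof('hello')    # "character"
typeof(NA)         # "logical"
typeof(NaN)        # "double"
typeof(NULL)       # "NULL"
```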
iv) Operators:
Assignment operators:
<- left assignment, binary
-> Right assignment, binary
= left assignment but not recommended
<<- Left assignment in outer Lexical scope
Special operators:
$ list subset, binary
+ plus, can be unary or binary
~ used in model formulae
: sequence, binary
:: refer to function in package
Arithmetic operators:
* multiplication, binary
- minus, can be unary or binary
/ division, binary
^ exponentiation, binary
%x% special binary operators, x can be replaced
%% modulus, binary
%/% integer division
%o% outer product, binary
%*% matrix product
%in% matching operator, binary
Logical operators:
!x logical negation
x&y logical AND, element-wise
x&&y logical AND of single values (short-circuit)
x|y logical OR, element-wise
x||y logical OR of single values (short-circuit)
xor(x,y) element-wise exclusive OR
Relational operators:
< Less than
<= Less than or equal to
> greater than
>= greater than or equal to
== equal to
!= not equal to
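Here are a few of these operators applied to small, made-up inputs:

```r
7 %/% 2            # integer division: 3
7 %% 2             # remainder (modulus): 1
2 ^ 3              # exponentiation: 8
3 %in% c(1, 3, 5)  # matching: TRUE
1:3 %o% 1:2        # outer product: a 3 x 2 matrix

x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)
x & y              # element-wise AND: TRUE FALSE FALSE
x | y              # element-wise OR:  TRUE TRUE TRUE
xor(x, y)          # element-wise exclusive OR: FALSE TRUE TRUE
!x                 # negation: FALSE TRUE FALSE
TRUE && FALSE      # AND of single values: FALSE
```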

Precedence of Operators in R language:


• Operator precedence describes the order in which R reads expressions.
• Operators higher in the chart have a higher precedence, meaning that the R interpreter
evaluates them first.
• Operators on the same line in the chart have the same precedence, and their “associativity” is
taken into consideration for the evaluation order.
• The table below lists the levels of operators from highest to lowest.
Operator Meaning
( { Function calls, grouping expressions
:: ::: Access variables in a namespace
$ @ Component extraction and slot extraction
[ [[ Indexing
^ Exponentiation (right to left)
- + Unary minus, unary plus
: Sequence creation
%any% Special operators (including %% and %/%)
* / Multiplication, division
+ - Addition, subtraction
< > <= >= == != Comparison
! Logical negation
& && Logical AND, short-circuit AND
| || Logical OR, short-circuit OR
~ Formula
-> ->> Rightward assignment
<- <<- Leftward assignment (right to left)
= Assignment (right to left)
? Help
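The following expressions have results that follow directly from this precedence table. The values are chosen only for illustration:

```r
-2^2         # ^ binds tighter than unary minus: -(2^2) = -4
(-2)^2       # parentheses force the other order: 4
2^3^2        # ^ groups right to left: 2^(3^2) = 512
n <- 5
1:n - 1      # ':' binds tighter than binary '-': (1:5) - 1 = 0 1 2 3 4
1:(n - 1)    # 1 2 3 4
2 + 3 * 4    # '*' before '+': 14
```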

Basic Math:
Once you have the R environment set up, it is easy to start the R command prompt by typing
R at the system prompt and pressing Enter
OR
by clicking the R-64 shortcut on the desktop
In the console, you can do some basic math operations in it as a calculator
>2+3
[1] 5
>3*6
[1] 18
>"Hello welcome to R"
[1] "Hello welcome to R"
Declaring Variables:
>age<-20
>print(age)
[1] 20
>age
[1] 20
>name <- "Hari Krishna"
>name
[1] "Hari Krishna"
Printing Output:
>age<- 25
>name<- "Ramesh"
>print(paste("My name is", name))
[1] "My name is Ramesh"
>print(paste("My age is", age))
[1] "My age is 25"
>cat("My name is", name, "and my age is", age, "\n")
My name is Ramesh and my age is 25

Reading input from command prompt:


>name<-readline("Enter your name: ")
Enter your name: Prasad
>age<- as.integer(readline("Enter your age: "))
Enter your age: 32
>print(paste("My name is", name))
[1] "My name is Prasad"
>print(paste("My age is", age))
[1] "My age is 32"

Creating .R script:
File -> New Script -> opens a new editor
• Enter the program below:
name<-readline("Enter your name: ")
age<-readline("Enter your age: ")
print(paste("My name is", name))
print(paste("My age is", age))
• Now save the file as first.r
Running the .R script:
File -> Source R code...
• Select the file first.r
Or
• Type at the R command prompt
>source("~/first.r")

R – Data Types:
• While programming in any language, you need to use various variables to
store various information.
• Variables are reserved memory locations to store values.
• You may want to store information of various data types, such as character, string, integer,
floating point, Boolean etc.
• In the C language, a variable is declared with a particular data type [like int, double, float, char], and
that variable can store only values of that type until the end of its scope.
• But in R a variable is not declared with any data type; rather, it gets the data type of the R object
or literal assigned to it.
• So R is called a dynamically typed language, which means we can change the data type of the
same variable again and again in a program.
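Here is a minimal sketch of dynamic typing. It reuses the name var1, which the next step removes with rm():

```r
var1 <- "Hello"
class(var1)    # "character"
var1 <- 34.5
class(var1)    # "numeric"
var1 <- 27L
class(var1)    # "integer"
```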
Removing the variable:


>rm(var1)
Variable Assignment:
<- left assignment operator
-> right assignment operator
= left assignment operator, alternative to <-
<<- assignment to a variable in the outer (e.g. global) scope
assign("var1", value)
e.g.
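Each assignment form is sketched below. The names var2 to var4 are illustrative:

```r
var1 <- 10          # leftward assignment
20 -> var2          # rightward assignment
var3 = 30           # also assigns, but <- is the usual style
assign("var4", 40)  # the variable name is given as a string
c(var1, var2, var3, var4)   # [1] 10 20 30 40
```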

Atomic(Basic) Data types in R:


• In R programming, even the very basic data types are objects.
• The basic (atomic) data types, and vectors holding elements of each of these classes, are shown below:

DATATYPE – EXAMPLE, followed by CODE and output

1. LOGICAL – TRUE, FALSE
v<-TRUE
print(class(v))
[1] "logical"

2. NUMERIC – 12.3, 5, 999
v<-23.5
print(class(v))
[1] "numeric"

3. INTEGER – 2L, 34L, 0L
v<-2L
print(class(v))
[1] "integer"

4. COMPLEX – 3+2i
v<-2+5i
print(class(v))
[1] "complex"

5. CHARACTER – "a", 'a', "hello", 'hello'
v<-"Welcome to R"
print(class(v))
[1] "character"

6. RAW – "Hello" is stored as 48 65 6c 6c 6f
v<-charToRaw("Hello")
print(class(v))
[1] "raw"
Type checking functions:
Checks the data type of variables and returns the TRUE/FALSE
• is.numeric(variable)
• is.double(variable)
• is.logical(variable)
• is.complex(variable)
• is.character(variable)
• is.raw(variable)
• is.integer(variable)

e.g.
>x<-10
>is.numeric(x)
[1] TRUE
>is.double(x)
[1] TRUE
>is.integer(x)
[1] FALSE
>y<-25L
>is.integer(y)
[1] TRUE

Type conversion functions:


• as.integer(var) -- converts to integer
• as.double(var) -- converts to double
• as.complex(var) -- converts to complex numbers
• as.character(var)-- converts to characters
• as.raw(var) -- converts to raw datatype
• as.logical(var) -- converts to logical datatype
>x<-17.5
>is.double(x)
[1] TRUE
>y<-as.integer(x)
>y
[1] 17
>is.integer(y)
[1] TRUE
>as.numeric(FALSE)
[1] 0
>as.numeric(TRUE)
[1] 1
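A few more conversions are shown below, with values chosen only for illustration. They show that as.integer() truncates rather than rounds, and that strings which cannot be converted become NA:

```r
as.integer(3.9)        # 3   (the fraction is dropped, not rounded)
as.integer(-3.9)       # -3
as.character(42)       # "42"
as.numeric("12.5")     # 12.5
as.logical(c("TRUE", "T", "false", "yes"))   # TRUE TRUE FALSE NA
as.numeric("abc")      # NA, with a warning about coercion
```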
DATA STRUCTURES:
• Data structure can be defined as the specific form of organizing and storing the data.
• R programming supports five basic types of data structure namely vector, matrix, list, data
frame and factor.
• We are going to discuss these data structures and the way to write these in R Programming.
• R’s base data structures can be organised by their dimensionality (1d, 2d, or nd) and whether
they’re homogeneous (all contents must be of the same type) or heterogeneous (the contents
can be of different types).
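• This gives rise to the five data types most often used in data analysis:
     Dimensions   Homogeneous      Heterogeneous
     1d           Atomic vector    List
     2d           Matrix           Data frame
     nd           Array
• Vectors (1-dimensional arrays) and matrices (2-dimensional arrays) are variations of the array data
structure.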
• Unlike lists, arrays can store elements of only one data type (homogeneous).
• We can store elements of different types in the data structure ‘list’.
• Data frames are 2D data structures that store different types of elements in a tabular format;
each column of a data frame holds the values of one variable, all of the same type.
• Each row of a data frame holds the related data of one entity, which can contain data items of
different types.
• In a data frame, every column has the same number of rows.
• Unlike data frames, lists have no such restriction: their elements can differ in type and length.

Vectors:
• This is the most basic data structure.
• A contiguous sequence of data objects with a specific indexed order.
• A vector is called “atomic” because all objects stored in it have the same type.
• We can create a new vector using the c() function, which is short for “combine” or “concatenate”.
• Using the assignment operator “<-” we can assign an object and its values to a named variable.
• e.g.
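Here is a minimal sketch of creating a vector with c(). The name marks and its values are made up:

```r
marks <- c(78, 85, 62, 91)   # combine four numbers into one vector
marks                        # [1] 78 85 62 91
length(marks)                # 4
class(marks)                 # "numeric"
marks[3]                     # third element: 62
```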
We can also store a sequence of numbers in a vector.
X <- 11: 15
X
[1] 11 12 13 14 15
seq( ) function:- generates a sequence of numbers
> X <- seq (-6,2)
>X
[1] -6 -5 -4 -3 -2 -1 0 1 2
From -6 to 7 , step=2:-
> X <- seq (-6 , 7 , by=2)
>X
[1] -6 -4 -2 0 2 4 6
With a smaller step by 0.3 :-
> X <- seq (-2 , 2 , by=0.3)
>X
[1] -2.0 -1.7 -1.4 -1.1 -0.8 -0.5 -0.2 0.1 0.4 0.7 1.0 1.3 1.6 1.9
> X <- seq (-2 , 2 , length.out=9) # specific number of elements
>X
[1] -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0
rep ( ) function :repeating the data
> X <- rep (1:5)
>X
[1] 1 2 3 4 5
> X <- rep (1:5 , 2)
>X
[1] 1 2 3 4 5 1 2 3 4 5
> X <- rep (1:5 , each=2)
>X
[1] 1 1 2 2 3 3 4 4 5 5
> X <- rep.int (1:5 , 2)
>X
[1] 1 2 3 4 5 1 2 3 4 5
sample( ) function :- draws a random sample from the given values
> X <- sample(1:8)
>X
[1] 8 4 7 2 3 6 5 1
> X <- sample(1:8 , replace = TRUE)
>X
[1] 7 6 2 4 1 3 1 1
> X <- sample (10:25 , size=5)
>X
[1] 22 14 16 19 15
> X <- sample.int(30, size = 4) # 4 random integers from 1 to 30
>X
[1] 25 22 29 20
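Because sample() is random, the outputs above will differ on each run. To get reproducible results, a standard approach is to fix the random seed with set.seed() before sampling. The seed value 42 is arbitrary:

```r
set.seed(42)          # fix the random number generator's starting point
s1 <- sample(1:8)
set.seed(42)          # the same seed again ...
s2 <- sample(1:8)
identical(s1, s2)     # TRUE: ... gives the same "random" permutation
sort(s1)              # always 1 2 3 4 5 6 7 8: sample() only reorders
```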

Creating Vectors of different types:-


> K <- c( 10L , 24L , 34L , -15L )
>K
[1] 10 24 34 -15
> is.double(K)
[1] FALSE
> is.integer (K)
[1] TRUE
> Student <- c ( "Ramesh" , "madhav" , "Laxmi" )
> Student
[1] "Ramesh" "madhav" "Laxmi"
> str (Student)
chr [1:3] "Ramesh" "madhav" "Laxmi"
> options <- c ( TRUE , FALSE ,TRUE ,FALSE )
> is.logical (options)
[1] TRUE
> Data <- c ( T , F ,T , F ,T )
> str (Data)
logi [1:5] TRUE FALSE TRUE FALSE TRUE

Scalar operations using vectors: ( one operand is scalar)


> X <- c (4 , 3 , 9 )
> X+3
[1] 7 6 12
>X-2
[1] 2 1 7
>X/2
[1] 2.0 1.5 4.5
>X %/% 2 # integer division
[1] 2 1 4
> X ^ 2 # or X ** 2
[1] 16 9 81
> sqrt ( X )
[1] 2.000000 1.732051 3.000000
>X*3
[1] 12 9 27
> X %% 2 # remainder or modulo operation
[1] 0 1 1

Vector operations : ( if both operands are vectors)


> a <- c( 2 , 7 , 12 )
> b <- c( 6 , 3 , -1 )
>a+b
[1] 8 10 11
>a - b
[1] -4 4 13
> a %% b
[1] 2 1 0
>a/b
[1] 0.3333333 2.3333333 -12.0000000
>a*b
[1] 12 21 -12
> a %*% b # matrix multiplication (here, the inner product of the two vectors)
[,1]
[1,] 21
> length ( a ) # number of elements present in vector
[1] 3

Creating naming vector:


> arr1 <- c("low"=10, "medium"=50, "high"=90)
> str(arr1)
Named num [1:3] 10 50 90
- attr(*, "names")= chr [1:3] "low" "medium" "high"
Accessing elements :- ( Vector sub-setting )
> arr1[1]
low
10
> arr1["low"]
low
10

Accessing elements from vectors ( Vector sub-setting )


> X <- c ( 3 , 7 , 9 , 4 , 2 , 8 , 5 , 6 , 1 )
> length ( X )
[1] 9
>X[2:5] # accessing 2, 3, 4, 5 elements
[1] 7 9 4 2
> X [ -1 ] # Eliminating the first element
[1] 7 9 4 2 8 5 6 1
>X[-c(2,3)] # Eliminating the second and third elements
[1] 3 4 2 8 5 6 1
> Y <- c ( 32 , 41 , 76 , 19 )
> S <- c ( F , T , F , T )
>Y[S]
[1] 41 19

A missing value in a vector:


• A missing value is one whose value is unknown.
• Missing values are represented by NA.
• NA is special value whose properties are different from other values
> var ( 8 )
[ 1 ] NA
> as.numeric ( c( "1" , "2" , "miree" , "3" ) )
[1] 1 2 NA 3
> X <- c ( 1 , 2 , 3 )
>X[4]
[1] NA
Operations on missing values:
> X <- c ( 1, 2 , NA , 4 )
>X
[1] 1 2 NA 4
> X+1 # NA+1=NA
[1] 2 3 NA 5
> sum ( X )
[1] NA
Detecting NAs :-
> is.na ( X )
[1] FALSE FALSE TRUE FALSE

Excluding NA values from a calculation:


>sum( X, na.rm= TRUE)
[1] 7

NULL - Absence of anything:


• It is not exactly missingness. It is nothingness.
• Functions can sometimes return NULL and their arguments can be NULL.
• An important difference between NA and NULL is that NULL is a single zero-length object and cannot
exist within a vector.
• If used inside a vector, it simply disappears.
> z <- c(1,NULL,3)
>z
[1] 1 3
> length(z)
[1] 2
Even though NULL was written inside c(), it is not stored in z. In fact, z is only two elements long.
>d<-10
>is.null(d)
[1] FALSE
>d<-NULL
>is.null(d)
[1] TRUE

Coercion of vector elements:

• A vector can hold elements of same type only, and we cannot store mixed data type elements.
• When we are going to store both Logical and Numerical data types, logical elements are
automatically converted into numerical.
• This automatic upgrading is known as coercion.
• The automatic type conversion is followed as below:
logical->integer->double->complex->character
>a<-c(10L,T)
>typeof(a)
[1] "integer"
>a<-c(10L,9.5,T)
>typeof(a)
[1] "double"
>a<-c(10L,9.5,T,3+4i)
>typeof (a)
[1] "complex"
>a<-c(10L,9.5,T,3+4i,"hello")
>typeof (a)
[1] "character"
MATRICES
• Matrices are the R objects in which elements of the same atomic type are arranged in a
two-dimensional rectangular layout.
• The basic syntax for creating a matrix is:
• matrix(data, nrow, ncol, byrow, dimnames)
– data is the input vector which becomes the data elements of the matrix
– nrow is the number of rows to be created
– ncol is the number of columns to be created
– byrow is a logical flag; if TRUE, the input vector elements are arranged by row (row-major
matrix)
– dimnames are the names assigned to the rows and columns

CREATING MATRIX:
>m1matrix(1:6,nrow=2)
> m1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
>m2matrix(1:6,nrow=2,byrow=TRUE)
>m2
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
>m3matrix(1:6,ncol=2)
>m3
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
>m4 <- matrix(1:3,nrow=2,ncol=3)
>m4
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 2 1 3
>rownames(m4) <- c("row1","row2")
>m4
[,1] [,2] [,3]
row1 1 3 2
row2 2 1 3
>colnames(m4) <- c("col1","col2","col3")
>m4
col1 col2 col3
row1 1 3 2
row2 2 1 3
>m5<-matrix(1:6,nrow=2,dimnames=list(c("row1","row2"),
c("col1","col2","col3")))
>m5
col1 col2 col3
row1 1 3 5
row2 2 4 6
>dimnames( m5 )
[[1]]
[1] "row1" "row2"
[[2]]
[1] "col1" "col2" "col3"

BINDING TWO MATRICES:


>rbind(m4,m5)
col1 col2 col3
row1 1 3 2
row2 2 1 3
row1 1 3 5
row2 2 4 6
>cbind(m4,m5)
col1 col2 col3 col1 col2 col3
row1 1 3 2 1 3 5
row2 2 1 3 2 4 6

COERCION OF MATRIX ELEMENTS:


R automatically converts between built-in object types when appropriate, from more
specific types to more general types.
Here is an overview of the coercion rules:
o Logical values are converted to numbers: TRUE is converted to 1 and FALSE to 0.
o Values are converted to the simplest type required to represent all information.
o The ordering is roughly logical < integer < numeric < complex < character < list.
o Objects of type raw are not converted to other types.

>mat1matrix(letters[1:6],nrow=2)
>mat1
[,1] [,2] [,3]
[1,] "A" "C" "E"
[2,] "B" "D" "F"
>mat2matrix(1:6,nrow=2)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
>typeof(mat1)
[1]"character"
>typeof(mat2)
[1]"integer"
>mat3rbind(mat1,mat2)
>mat3
[,1] [,2] [,3]
[1,] "A" "C" "E"
[2,] "B" "D" "F"
[3,] "1" "3" "5"
[4,] "2" "4" "6"
>typeof(mat3)
[1]"character"

Matrix Sub-setting:
> m1 <- matrix(21:32,nrow=3)
>m1
[,1] [,2] [,3] [,4]
[1,] 21 24 27 30
[2,] 22 25 28 31
[3,] 23 26 29 32
>m1[1:2, ] # accessing 1st row and 2nd row
[,1] [,2] [,3] [,4]
[1,] 21 24 27 30
[2,] 22 25 28 31
>m1[ ,2:3] # accessing 2nd column and 3rd column
[,1] [,2]
[1,] 24 27
[2,] 25 28
[3,] 26 29
>m1[1:2,3:4]
[,1] [,2]
[1,] 27 30
[2,] 28 31
>m1[c(F,F,T),c(F,F,T,T)]
[1] 29 32

OPERATIONS ON MATRICES:
# Creating a zero matrix:
>m1 <- matrix(0,nrow=2,ncol=3)
>m1
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
# Creating a matrix of ones:
>m2 <- matrix(1,nrow=2,ncol=3)
>m2
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 1
# Creating an identity matrix:
>m3 <- diag(3)
>m3
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1

#transpose of the matrix:


>m4matrix(c(3,1,4,6,9,5),nrow=2)
>m4
[,1] [,2] [,3]
[1,] 3 4 9
[2,] 1 6 5
>t(m4) # transpose of the matrix
[,1] [,2]
[1,] 3 1
[2,] 4 6
[3,] 9 5

Other arithmetic operations:

>a <- matrix(c(3,1,4,6,9,8,2,5,7), nrow=3, ncol=3)


>a
[,1] [,2] [,3]
[1,] 3 6 2
[2,] 1 9 5
[3,] 4 8 7
> b <- matrix(c(-2,2,6,4,5,-1,3,8,9), nrow=3, ncol=3)
> b
[,1] [,2] [,3]
[1,] -2 4 3
[2,] 2 5 8
[3,] 6 -1 9
> a+b # addition
[,1] [,2] [,3]
[1,] 1 10 5
[2,] 3 14 13
[3,] 10 7 16
>a - b # Subtraction
[,1] [,2] [,3]
[1,] 5 2 -1
[2,] -1 4 -3
[3,] -2 9 -2
>a * b # Element wise multiplication
[,1] [,2] [,3]
[1,] -6 24 6
[2,] 2 45 40
[3,] 24 -8 63
>a %*% b # Matrix multiplication
[,1] [,2] [,3]
[1,] 18 40 75
[2,] 46 44 120
[3,] 50 49 139

Solving the equations:


The linear equation set is : 3x + 2y = 7 and 2x – 3y = -4
a <- matrix(c(3,2,2,-3), nrow=2) # coefficient matrix
a
[,1] [,2]
[1,] 3 2
[2,] 2 -3
b <- c(7,-4) # constant vector
solve(a , b)
[1] 1 2
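You can check the result by multiplying back. Also, solve() with a single argument returns the inverse of the coefficient matrix. Here is a short sketch using the same a and b:

```r
a <- matrix(c(3, 2, 2, -3), nrow = 2)   # coefficient matrix
b <- c(7, -4)                           # constant vector
sol <- solve(a, b)     # solution vector: x = 1, y = 2
a %*% sol              # multiplying back reproduces b (as a 2 x 1 matrix)
a_inv <- solve(a)      # with one argument, solve() returns the inverse
a_inv %*% b            # gives the same solution as solve(a, b)
```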

OTHER FUNCTIONS TO WORK WITH MATRICES


• rowMeans(A)---returns a vector of row means
• rowSums(A)---returns a vector of row sums
• colMeans(A)---returns a vector of column means
• colSums(A)---returns a vector of column sums
• mean(A)---mean of the elements
• sd(A)---standard deviation of the elements
• var(as.vector(A))---variance of the elements (var(A) itself returns the covariance matrix of the columns)
• median(A)---median of the elements
• quantile(A)---minimum, quartiles (including the median) and maximum of the elements
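The sketch below applies these functions to a small example matrix A. The values are chosen for illustration, and the expected values in the comments follow from its two rows, 1 3 5 and 2 4 6:

```r
A <- matrix(1:6, nrow = 2)   # rows: 1 3 5 and 2 4 6
rowSums(A)     # 9 12
rowMeans(A)    # 3 4
colSums(A)     # 3 7 11
colMeans(A)    # 1.5 3.5 5.5
mean(A)        # 3.5 (mean of all six elements)
median(A)      # 3.5
quantile(A)    # 0%, 25%, 50%, 75%, 100% points: 1 2.25 3.5 4.75 6
```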
Data Frames:
• A data frame is a table or two-dimensional array like structure in which each column contains
values of one variable and each row contains one set of values from each column.
• Characteristics of data frames:
– The column names should be non-empty.
– The row names should be unique.
– The data stored in a data frame can be of numeric, factor or character type.
– Each column should contain the same number of data items.
Example of data table:

• Such a data set from the real world is in tabular form and can be stored using a data frame.
• This tabular data can be arranged in the data frame in rows and columns.
• data.frame() is the function used to create a data frame.
• data.frame(vector1, vector2, ………, stringsAsFactors=FALSE)
• All vectors should be of the same length.
• In R versions before 4.0.0, string vectors are stored as factors by default; from R 4.0.0 onward,
the default is stringsAsFactors=FALSE.
• The "stringsAsFactors=FALSE" option stores the string vector as it is.
Creating Data Frames:
>rno<-c(501,502,503,504)
>sname<-c("Prasad","Kiran","Lakshmi","mohan")
>age<-c(23,22,21,21)
>marks<-c(78.5,62.6,91.8,97.2)
>student_data<-data.frame(rno, sname, age,
marks, stringsAsFactors=FALSE)
>student_data
rno sname age marks
1 501 Prasad 23 78.5
2 502 Kiran 22 62.6
3 503 Lakshmi 21 91.8
4 504 mohan 21 97.2
>student_data$sname #accessing a column
[1] "Prasad" "Kiran" "Lakshmi" "mohan"
>str(student_data)
'data.frame': 4 obs. of 4 variables:
$ rno : num 501 502 503 504
$ sname: chr "Prasad" "Kiran" "Lakshmi" "mohan"
$ age : num 23 22 21 21
$ marks: num 78.5 62.6 91.8 97.2
>nrow(student_data)
[1] 4
>ncol(student_data)
[1] 4
>names(student_data)
[1] "rno" "sname" "age" "marks"
#changing column name
>colnames(student_data)[1]<-"rollno"

>colnames(student_data)
[1] "rollno" "sname" "age" "marks"

Some Functions working on Data Frames:


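• head() - returns the first 6 rows of the data frame
• tail() - returns the last 6 rows of the data frame
• summary() - generates summary statistics (minimum, quartiles, median, mean, maximum) for each column
• sort(dataframe$column) - sorts the values of that column; to reorder the whole data frame by a
column, use dataframe[order(dataframe$column), ]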
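The sketch below uses these functions on the student data from earlier, which is re-created here so the block runs on its own. The last step shows the usual way to reorder a whole data frame by a column, using order():

```r
student_data <- data.frame(
  rollno = c(501, 502, 503, 504),
  sname  = c("Prasad", "Kiran", "Lakshmi", "mohan"),
  age    = c(23, 22, 21, 21),
  marks  = c(78.5, 62.6, 91.8, 97.2),
  stringsAsFactors = FALSE
)
head(student_data, 2)          # first 2 rows (the default is 6)
tail(student_data, 2)          # last 2 rows
summary(student_data$marks)    # Min, 1st Qu., Median, Mean, 3rd Qu., Max

sort(student_data$marks)       # sorts only the values of that column

# Reorder the whole data frame by marks, lowest first
by_marks <- student_data[order(student_data$marks), ]
by_marks$sname                 # "Kiran" "Prasad" "Lakshmi" "mohan"
```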

Sub setting the Data Frames:


>student_data[ ,c(2,4)] #accessing particular columns
sname marks
1 Prasad 78.5
2 Kiran 62.6
3 Lakshmi 91.8
4 mohan 97.2
>student_data[2:3, ] #accessing two rows
rollno sname age marks
2 502 Kiran 22 62.6
3 503 Lakshmi 21 91.8
>student_data$marks #accessing a single column
[1] 78.5 62.6 91.8 97.2
>student_data[c(2:3),c("rollno","sname")]
rollno sname
2 502 Kiran
3 503 Lakshmi

Extending Data Frame:


# adding a column to the data frame
>student_data<-cbind(student_data, grade=c("B","C","A","A"), stringsAsFactors=FALSE)
#adding a row to the data frame
>student_data <- rbind(student_data, data.frame(rollno=505,
sname="Kishore", age=22, marks=75.6, grade="B", stringsAsFactors=FALSE))

Lists:
• In R language, vectors, matrices, and arrays are used to store homogenous elements only.
• To store heterogeneous elements, we can use two types of data structures – lists and data
frames.
• A list contains different types of objects, such as strings, numbers, vectors, matrices, and
functions.
• In other words, a list is a generic vector containing other objects.
• A list can be included as a sub-list into another list.
• A list is an ordered object, and we can access its elements by index.
• A list is created using the list() function.
Creating a list:
> emp1 <- list("Ravi", 50000, TRUE)
> emp1
[[1]]
[1] "Ravi"
[[2]]
[1] 50000
[[3]]
[1] TRUE
# [[i]] – operator is used to access an element of a list by index.
> emp1[[2]]
[1] 50000
> length(emp1)
[1] 3

Creating a named list:


> emp2 <- list(name="Ravi", salary=50000, married=TRUE)
> emp2
$name
[1] "Ravi"
$salary
[1] 50000
$married
[1] TRUE
> emp2[[2]] # accessing elements from a list [[ ]] is index operator in list
[1] 50000
> emp2$name # $ is membership operator in list
[1] "Ravi"
> emp2[["married"]]
[1] TRUE

Adding elements to the list:


> str(emp2)
List of 3
$ name : chr "Ravi"
$ salary : num 50000
$ married: logi TRUE
> emp2$address <- "Kakinada"
> emp2$dependents <- list("Madhavi", "Mohan")
> emp2[[6]] <- "9875654566"
> names(emp2)
[1] "name" "salary" "married" "address" "dependents"
[6] ""
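The 6th element above was added by position, so it has an empty name. A minimal sketch that names it afterwards and uses base R's append() to insert at a chosen position; the names "phone" and "dept" are hypothetical:

```r
emp2 <- list(name = "Ravi", salary = 50000, married = TRUE)
emp2$address    <- "Kakinada"
emp2$dependents <- list("Madhavi", "Mohan")
emp2[[6]]       <- "9875654566"

# Give the unnamed 6th element a name ("phone" is a hypothetical choice)
names(emp2)[6] <- "phone"

# append() inserts after a chosen position; "dept" is a hypothetical field
emp2 <- append(emp2, list(dept = "CSE"), after = 1)
names(emp2)
```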

Deletion of element from the list:


> emp2$salary <- NULL
> emp2
$name
[1] "Ravi"
$married
[1] TRUE
$address
[1] "Kakinada"
$dependents
$dependents[[1]]
[1] "Madhavi"
$dependents[[2]]
[1] "Mohan"
[[5]]
[1] "9875654566"
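Elements can also be removed by name with [[ ]] or by position with a negative index. Because assigning NULL deletes an element, storing an actual NULL value needs list(NULL). A minimal sketch:

```r
emp2 <- list(name = "Ravi", salary = 50000, married = TRUE, address = "Kakinada")

emp2[["married"]] <- NULL   # remove by name using [[ ]]
emp2 <- emp2[-1]            # remove by position with a negative index (drops name)

# Single brackets with list(NULL) keep the element but store NULL in it
emp2["address"] <- list(NULL)
str(emp2)
```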

Converting a list into a vector:


> # Converting a list into a vector:
> list1 <- list(a=10, b=13, c=19)
> v1 <- unlist(list1)
> v1
a b c
10 13 19
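After unlist(), ordinary vector functions work on the result. When the list mixes types, unlist() coerces everything to a single common type. A minimal sketch:

```r
v1 <- unlist(list(a = 10, b = 13, c = 19))
sum(v1)       # vector functions now work: 42
v1[["b"]]     # names are kept: 13

# Mixed types are coerced to a common type (here character)
v2 <- unlist(list(id = 7, name = "Ravi", ok = TRUE))
v2
```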
Merging two lists into one list:
> list1 <- list(1,3,2)
> list2 <- list("sun", "mon", "tue", "wed")
> mergedlist <- c(list1,list2)
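The merged result can be inspected to confirm that c() on two lists returns one list with all the elements in order. A minimal sketch:

```r
list1 <- list(1, 3, 2)
list2 <- list("sun", "mon", "tue", "wed")
mergedlist <- c(list1, list2)

length(mergedlist)    # 3 + 4 = 7 elements
is.list(mergedlist)   # still a list, so numbers and strings keep their types
mergedlist[[4]]       # first element of list2: "sun"
```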

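Recursive Lists:
• A list that contains other lists is called a recursive (nested) list. Calling c() with recursive=TRUE flattens it into a named vector:
> list3 <- c(list(a=10, b=12, c=list(d=34, e=21)), recursive=TRUE)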
> list3
a b c.d c.e
10 12 34 21
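Elements of a nested list can be reached by chaining $ or [[ ]], and unlist() flattens it the same way as c(..., recursive=TRUE). A minimal sketch:

```r
nested <- list(a = 10, b = 12, c = list(d = 34, e = 21))

nested$c$d             # chain $ to reach into the sub-list: 34
nested[["c"]][["e"]]   # the same idea with [[ ]]: 21

# unlist() gives the same flattened, named vector
flat <- unlist(nested)
names(flat)            # "a" "b" "c.d" "c.e"
```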
How to create arrays of n dimensions in R?
Arrays:
• An array is a collection of elements of the same type arranged in N dimensions.
• The function used to create an array is:
• array(data, dim, dimnames)
• data - input vector
• dim - vector giving the size of each dimension
• dimnames - (optional) list of names for the dimensions
• Ex:
• > arr1 <- array(11:18, dim = c(3,3,2))   # the 8 values are recycled to fill all 3*3*2 = 18 cells
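The recycling in this example can be checked directly. Values fill the array column by column, matrix by matrix, and restart from 11 once the 8 supplied values run out. A minimal sketch:

```r
arr1 <- array(11:18, dim = c(3, 3, 2))

dim(arr1)       # 3 3 2
length(arr1)    # 18 cells, although only 8 values were supplied
arr1[, , 1]     # first 3x3 matrix, filled column by column
arr1[3, 3, 1]   # 9th cell overall: recycling has restarted, so 11
arr1[1, 1, 2]   # 10th cell overall: 12
```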

Creating Arrays with Named Dimensions:


> marks <- array(sample(35:90,18), dim=c(2,3,3),
dimnames = list(c("sem1","sem2"), c("sub1","sub2","sub3"), c("1st year","2nd year","3rd year")))
> marks
, , 1st year
sub1 sub2 sub3
sem1 74 55 80
sem2 90 39 89
, , 2nd year
sub1 sub2 sub3
sem1 75 57 71
sem2 46 42 66
, , 3rd year
sub1 sub2 sub3
sem1 86 43 79
sem2 65 44 63
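Because sample() draws random marks, the printed values change on every run (calling set.seed() first makes them repeatable). The sketch below substitutes the fixed values 1:18 for the random marks so the results are predictable. It uses apply() to total the marks of each year (MARGIN = 3, the third dimension) and to average each semester of each year (MARGIN = c(1, 3)).

```r
# 1:18 replaces the random marks so that results are predictable
marks <- array(1:18, dim = c(2, 3, 3),
               dimnames = list(c("sem1", "sem2"),
                               c("sub1", "sub2", "sub3"),
                               c("1st year", "2nd year", "3rd year")))

# MARGIN = 3: one total per year (over both semesters and all subjects)
year_totals <- apply(marks, 3, sum)
year_totals   # 1+...+6, 7+...+12, 13+...+18

# MARGIN = c(1, 3): mean over subjects for each semester of each year
sem_means <- apply(marks, c(1, 3), mean)
sem_means
```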
Array subsetting:
> marks[, , 1] # accessing first matrix
sub1 sub2 sub3
sem1 74 55 80
sem2 90 39 89
> marks[ , 2, 2] # 2nd matrix, 2nd column
sem1 sem2
57 42
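Once the dimensions have names, the names can be used in place of index numbers. A minimal sketch, again using 1:18 in place of the random marks:

```r
marks <- array(1:18, dim = c(2, 3, 3),
               dimnames = list(c("sem1", "sem2"),
                               c("sub1", "sub2", "sub3"),
                               c("1st year", "2nd year", "3rd year")))

marks["sem2", "sub3", "3rd year"]   # one cell, selected entirely by names
marks[, "sub2", "2nd year"]         # same as marks[, 2, 2]
marks["sem1", , ]                   # all subjects and years for semester 1 (a 3x3 matrix)
```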
Names of dimensions:
> rownames(marks)
[1] "sem1" "sem2"
> colnames(marks)
[1] "sub1" "sub2" "sub3"
> dimnames(marks)
[[1]]
[1] "sem1" "sem2"
[[2]]
[1] "sub1" "sub2" "sub3"
[[3]]
[1] "1st year" "2nd year" "3rd year"
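The same functions can also be used on the left of an assignment to rename dimensions without changing the data. A minimal sketch; the names "Y1"-"Y3" and the subject names are hypothetical:

```r
marks <- array(1:18, dim = c(2, 3, 3),
               dimnames = list(c("sem1", "sem2"),
                               c("sub1", "sub2", "sub3"),
                               c("1st year", "2nd year", "3rd year")))

# Replace the names of the third dimension only
dimnames(marks)[[3]] <- c("Y1", "Y2", "Y3")

# colnames<- replaces the names of the second dimension (hypothetical subjects)
colnames(marks) <- c("Maths", "Physics", "Chemistry")

marks[, "Physics", "Y2"]   # same cells as marks[, 2, 2]
```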

Explain different types of classes in R.


Classes:
• An object is a data structure having some attributes and methods which act on its attributes.
• A class is a blueprint for a collection of similar objects.
• It defines the properties (attributes) and methods of those objects.
• While most programming languages have a single class system, R has three class systems, namely
S3, S4, and Reference Classes.
• Each has its own features, and choosing one over the other is largely a matter of preference.

S3 class:
• S3 is the most popular and prevalent class system in R.
• Most of the classes that come predefined in R are of this type.
• S3 classes have no formal, predefined definition.
• Basically, a list with its class attribute set to some class name is an S3 object.
• The components of the list become the member variables of the object.
> # creating a list with the required elements
> s <- list(name="Ravi", age=21, gpa=3.5)
> # name the class appropriately
> class(s) <- "student"
> s
$name
[1] "Ravi"
$age
[1] 21
$gpa
[1] 3.5
attr(,"class")
[1] "student"
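An S3 method is an ordinary function named generic.class, and the generic dispatches on the class attribute. A minimal sketch of a print method for the "student" object above; the output format is an illustrative choice:

```r
s <- list(name = "Ravi", age = 21, gpa = 3.5)
class(s) <- "student"

# print() dispatches to print.student for objects of class "student"
print.student <- function(x, ...) {
  cat(x$name, " (age ", x$age, "), GPA ", x$gpa, "\n", sep = "")
  invisible(x)   # print methods conventionally return their input invisibly
}

print(s)                 # uses print.student
inherits(s, "student")   # TRUE
unclass(s)$gpa           # the underlying list is still there: 3.5
```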

S4 class:
S4 classes have a formal class definition (setClass()) and a uniform way to create objects (new()).
> setClass("student",
slots = list(name="character", age="numeric", gpa="numeric"))
> # Creating an object of the class "student"
> s <- new("student", name="Ravi", age=21, gpa=3.5)
> s
An object of class "student"
Slot "name":
[1] "Ravi"
Slot "age":
[1] 21
Slot "gpa":
[1] 3.5
Accessing and modifying a slot of an object (@ operator):
> s@name
[1] "Ravi"
> s@gpa <- 5.2
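The formal definition lets R check objects. This sketch adds a validity function and a show() method (the S4 generic used for auto-printing). The 0-10 GPA range is a hypothetical rule chosen for illustration. Note that assigning a slot with @ checks only the slot's type; validObject() re-runs the validity function.

```r
library(methods)   # attached by default in interactive R; explicit for Rscript

# Hypothetical rule for illustration: gpa must lie between 0 and 10
setClass("student",
         slots = list(name = "character", age = "numeric", gpa = "numeric"),
         validity = function(object) {
           if (object@gpa < 0 || object@gpa > 10) "gpa must be between 0 and 10" else TRUE
         })

# show() is called when an S4 object is auto-printed
setMethod("show", "student", function(object) {
  cat(object@name, " has GPA ", object@gpa, "\n", sep = "")
})

s <- new("student", name = "Ravi", age = 21, gpa = 3.5)
s   # printed by the show() method

# new() checks slot types: age must be numeric, not character
wrong_type <- tryCatch(new("student", name = "Ravi", age = "21", gpa = 3.5),
                       error = function(e) "rejected")

# @<- checks the type only; validObject() applies the validity rule
s2 <- s
s2@gpa <- 12
out_of_range <- tryCatch({ validObject(s2); "accepted" },
                         error = function(e) "rejected")
```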

Reference Classes:
• Unlike S3 and S4 classes, methods belong to the class rather than to generic functions, and objects are mutable (they are modified in place).
Defining Reference Classes:
> student <- setRefClass("student", fields=list(name="character", age="numeric", gpa="numeric"))
> s <- student(name="Ravi", age=22, gpa=3.5)
> s
Reference class object of class "student"
Field "name":
[1] "Ravi"
Field "age":
[1] 22
Field "gpa":
[1] 3.5
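Because Reference Class objects are mutable, assigning one to a new name does not copy it: both names refer to the same object. The copy() method creates an independent object. A minimal sketch:

```r
student <- setRefClass("student",
                       fields = list(name = "character", age = "numeric", gpa = "numeric"))
s <- student(name = "Ravi", age = 22, gpa = 3.5)

alias <- s                # no copy: both names refer to the same object
alias$age <- 30
s$age                     # 30, changed through alias

independent <- s$copy()   # copy() creates a separate object
independent$age <- 40
s$age                     # still 30
```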

Reference class with a method:


> student <- setRefClass("student", fields=list(name="character", age="numeric", gpa="numeric"),
methods = list(
increase_age = function(x) { age <<- age + x }   # <<- modifies the object's field
)
)
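Methods are called on the object with $, and they change the object in place, so no reassignment is needed. A minimal sketch using the class defined above:

```r
student <- setRefClass("student",
  fields  = list(name = "character", age = "numeric", gpa = "numeric"),
  methods = list(
    increase_age = function(x) {
      age <<- age + x   # <<- updates the field; a plain <- would only create a local variable
    }
  )
)

s <- student(name = "Ravi", age = 22, gpa = 3.5)
s$increase_age(2)
s$age   # 24: the object itself was modified
```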
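S3 vs S4 vs Reference Classes:
• S3: informal; an object is created by setting its class attribute (e.g., class(s) <- "student"); methods are ordinary functions dispatched by generic functions (e.g., print.student).
• S4: formal definition with setClass(); objects are created with new(); slots are accessed with @; methods are defined for generic functions with setMethod().
• Reference Classes: defined with setRefClass(); fields are accessed with $; methods belong to the class; objects are mutable and are modified in place.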
Questions given in previous question papers:

Part A
a) What is variable scope?
b) List the differences between vector and list.
c) What are the different modes of working with R?
d) List the data structures in R.
e) Create a 3-dimensional array in R.
f) Create a simple 3x3 matrix in R.
g) Write about vectors in R.
h) Write about type conversions in R.
i) Explain the importance of data frames.
j) What are the data structures in R that are used to perform statistical analyses and create graphs?
k) Write about linear vector algebra operations.
l) Explain the different matrix operation functions in R.

Part B
1. a) How can basic arithmetic be carried out in R? Explain each operation with an example.
b) What is a vector? How is it created? Create a vector X with elements 5, 2, -1, 7, 4, 8, 12 and from it
create a vector Y containing the elements of X that are greater than 4.
2. a) How is a data frame different from a list? Create a data frame of the seven days of a week showing
the minimum temperature on each day.
b) What is the difference between NA and NULL values? How can they be handled?
3. a) Define a data frame and distinguish it from a matrix object in R.
b) Explain vectors in R in detail.
4. a) Discuss matrices in R.
b) Explain data types in R.
5. a) Explain data frames and arrays in detail with example R code.
b) Explain the list data structure and its operations with examples.
6. a) What is a vector in R? Explain operations on vectors.
b) Explain the different data structures in R.
7. a) Write about data frames and the operations on data frames.
b) Explain variables, constants and data types in R programming.
8. a) How do you create, name, access, merge and manipulate list elements? Explain with examples.
b) Explain the different types of classes in R with examples.
Assignment:
PART A:

1. List the differences between vector and list.


2. Create a 3-dimensional array in R.
3. What are the different modes of working with R?
4. Explain coercion of matrix elements.
5. Explain the seq() and rep() functions for creating vectors.
6. Explain subsetting and extending of data frames.

PART B:

1. a) Define a data frame and distinguish it from a matrix object in R.

b) What is a vector? How is it created? Create a vector X with elements 5, 2, -1, 7, 4, 8, 12 and from it
create a vector Y containing the elements of X that are greater than 4.
2. a) How is a data frame different from a list? Create a data frame of the seven days of a week showing
the minimum temperature on each day.
b) Briefly explain R sessions and the commands that support them.
3. a) What is the difference between NA and NULL values? How can they be handled?
b) What is a vector in R? Explain operations on vectors.
4. a) Discuss matrices and their operations in R.
b) Explain variables, constants and data types in R programming.
5. a) How do you create, name, access, merge and manipulate list elements? Explain with examples.
b) Explain the different types of classes in R with examples.
