
End-term Examination

BBA-MBA 2015 & MBA 2017


Statistical Programming with R [BS-BA-703]

MAXIMUM MARKS: 30 DURATION: 5 DAYS

Instructions to students:

• After finishing the take-home exam, send your MS Word document and R script to the
examination department at submission@jgu.edu.in on or before the deadline. The subject of the
mail and the titles of the MS Word document and R script should be your name_SPR.
• Students undertaking the examination are requested to adhere to the university norms
related to examinations.
• Write commands in the white space after each question. You can add extra space if required.
• Write your answers in red font so that they are clearly visible.
• This question paper comprises FOUR (4) printed pages (including this page).

Q1. Import the file "Listed companies_BSE_2017", by assigning it to name "x". This file
shows the companies listed on Bombay Stock Exchange (BSE), along with their
Security.Id, listing status (Active or Delisted), group, ISIN no., and Industry. Based on
the data in this file “x”, answer the following: [10*1=10 Marks]

i) Give the number of companies whose Security.Id starts with letter “B”. (Write
command and output both).

x=read.csv(file.choose())
View(x)
p=x[substr(x$Security.Id,1,1)=="B",]
nrow(p)
415

ii) Give the number of companies whose Security.Id ends with letter “T”. (Write
command and output both).
length(grep("T$",x$Security.Id))
519

iii) Give the number of companies whose Security.Id starts with letter “M” and ends with
letter “T”. (Write command and output both).

length(grep("^M.*T$",x$Security.Id))
32
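
The pattern matching used in parts (i) to (iii) can be checked on a small vector before running it on the full file. The following is a minimal sketch; the IDs in `ids` are hypothetical and are not taken from the BSE file.

```r
# Hypothetical security IDs, for illustration only.
ids <- c("BAT", "MAT", "MINT", "BOLT", "TIM", "MOB")

# Starts with "B": startsWith() avoids regular expressions altogether.
starts_B <- sum(startsWith(ids, "B"))

# Ends with "T": "$" anchors the match at the end of the string.
ends_T <- sum(grepl("T$", ids))

# Starts with "M" and ends with "T": "^" anchors the start, ".*" spans the middle.
M_and_T <- sum(grepl("^M.*T$", ids))

print(c(starts_B = starts_B, ends_T = ends_T, M_and_T = M_and_T))
```

Using `sum(grepl(...))` and `length(grep(...))` gives the same count; `grepl` returns a logical vector, which is also convenient for subsetting rows.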

End-term Examination – Spring Semester [2018 – 2019], JGBS Page 1


iv) How many companies are either Delisted or Suspended? (Write command and output
both).
[Hint: Status column shows the status of the company such as active or delisted]

h=x[x$Status=="Delisted"| x$Status=="Suspended",]
nrow(h)
3772

v) Out of these Delisted or Suspended companies (as we have found in part (iv)), how
many companies belong to either group A or Group B? (Write command and output
both).
[Hint: Group column shows the group of the company such as A, B, XT, etc.]

gr=subset(h,(h$Group=="A ")|(h$Group=="B "))


nrow(gr)
592

vi) How many companies are active and belong to group B (in full dataset “x”)? (Write
command and output both).
act=subset(x,(x$Status=="Active")& (x$Group=="B "))
nrow(act)
1375

vii) Out of total 7854 companies in “x” dataset, how many companies are active belonging
to group A with Face value of 2? (Write command and output both).
pct=subset(x,(x$Status=="Active")&(x$Group=="A ")&(x$Face.Value=="2"))
nrow(pct)

84
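
The filters in parts (iv) to (vii) compare against values such as "A " with a trailing space, which is easy to mistype. A possible alternative, sketched below on a hypothetical data frame `df` (not the exam data), is to trim the whitespace once and then filter on the clean values.

```r
# Hypothetical data; column names mirror the exam file, values are invented.
df <- data.frame(
  Status     = c("Active", "Delisted", "Active", "Suspended", "Active"),
  Group      = c("A ", "B ", "B ", "A ", "XT"),
  Face.Value = c(2, 10, 10, 2, 2),
  stringsAsFactors = FALSE
)

# Remove leading/trailing whitespace so comparisons do not depend on it.
df$Group_clean <- trimws(df$Group)

# %in% is a compact way to write "equals either of these values".
n_inactive <- sum(df$Status %in% c("Delisted", "Suspended"))

# Combine conditions with & (and); each condition is a logical vector.
n_active_B   <- sum(df$Status == "Active" & df$Group_clean == "B")
n_active_A_2 <- sum(df$Status == "Active" & df$Group_clean == "A" & df$Face.Value == 2)

print(c(n_inactive, n_active_B, n_active_A_2))
```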

viii) Out of these companies (as we have found in part (vii)), give the number of companies
whose Security.Id starts with letter “L” and ends with letter “N”. (Write command
and output both).
length(grep("^L.*N$",pct$Security.Id))

ix) Please add a new column “Name_Status_Group” in “x” dataset, which is concatenation
of three columns “Security.Id”, “Status” and “Group”, such as “ABB_Active_A ” for
the first company ABB. (Write command).

attach(x)
x$Name_Status_Group=paste(Security.Id,Status,Group,sep = "_")



x) Add one more column “Characters” in “x” dataset. This column must show the number
of characters in the column “Name_Status_Group”. (Write command).
x$Characters=nchar(x$Name_Status_Group)
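
Parts (ix) and (x) can be tried on a tiny example first. The sketch below uses a hypothetical two-row data frame `toy`; it refers to columns through `toy$...` because a column added to a data frame after `attach()` is not visible by its bare name.

```r
# Hypothetical rows; the trailing space in Group mimics the exam file.
toy <- data.frame(
  Security.Id = c("ABB", "XYZ"),
  Status      = c("Active", "Delisted"),
  Group       = c("A ", "B "),
  stringsAsFactors = FALSE
)

# Concatenate the three columns row by row, separated by "_".
toy$Name_Status_Group <- paste(toy$Security.Id, toy$Status, toy$Group, sep = "_")

# nchar() counts every character, including the trailing space.
toy$Characters <- nchar(toy$Name_Status_Group)

print(toy)
```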

Q2. Import the file "Names_Companies", by assigning it to name "y". This file shows the
company name, along with its group, and Industry. Using for loop, you have to write a sentence
for each company like, “X company (Company Name column) belongs to Group Y (Group
column) and Z Industry (Industry Column)”, and save this output to a vector “y1”. (Write
command only, not output). [5 Marks]

library(openxlsx)
y=read.xlsx(file.choose())
y1=character(nrow(y))  # pre-allocate the result vector
for(i in seq_len(nrow(y))){
# build one sentence per company and store it in y1
y1[i]=paste(y$Company.Name[i],"belongs to Group",y$Group[i],"and",y$Industry[i],"Industry")
}

[For example: for the first company ABB India Limited, the output will be as follows: ABB
India Limited belongs to Group A and Heavy Electrical Equipment Industry.]
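
The question asks for a for loop, but `paste()` is vectorised, so the loop's result can be cross-checked in one call. This is a minimal sketch on a hypothetical data frame `toy_y`; the second company is invented.

```r
# Hypothetical data; only the first row echoes the example in the question.
toy_y <- data.frame(
  Company.Name = c("ABB India Limited", "Hypothetical Co"),
  Group        = c("A", "B"),
  Industry     = c("Heavy Electrical Equipment", "Software"),
  stringsAsFactors = FALSE
)

# paste() works element-wise on whole columns, so no explicit loop is needed.
y1_toy <- paste(toy_y$Company.Name, "belongs to Group", toy_y$Group,
                "and", toy_y$Industry, "Industry")

print(y1_toy)
```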

Q3. Import the file "Property_Delhi", by assigning it to name "z". This file shows property
rates, per square feet (in Rs.), in different localities of Delhi. A real estate agent makes a claim
that the average property rates in Delhi is Rs. 8,000/sq. feet. You are required to check the
authenticity of this claim using t-test (at 95% level of confidence). Please answer the following:
[5*1=5 Marks]

i) Write the null hypothesis and alternate hypothesis of t-test in this context.

H0 - Null Hypothesis - The average property rate in Delhi is Rs. 8000/Sq. Feet
H1 - Alternate Hypothesis - The average property rate in Delhi is not Rs. 8000/Sq. Feet

ii) Write the command to conduct t-test.

t.test(`Rate.per.sq.ft.(Rs.)`,alternative = "two.sided", conf.level = 0.95, mu=8000)

iii) What is the output for t-test statistics and p-value? Is the claim of real estate agent true?

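t = -2.931, p-value = 0.004377
The p-value 0.004377 (about 0.44%) is less than the significance level 0.05 (5%), so we
REJECT the null hypothesis. NO, the data do not support the real estate agent's claim; the
negative t statistic indicates that the sample mean is below Rs. 8,000/sq. feet.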
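
The decision rule for a two-sided t-test can be written out explicitly. The sketch below uses simulated rates (an assumption, not the Property_Delhi data) and shows that "p-value below 0.05" and "8000 outside the 95% confidence interval" always lead to the same decision.

```r
# Simulated property rates, for illustration only.
set.seed(1)
rates <- rnorm(50, mean = 7500, sd = 1000)

res <- t.test(rates, mu = 8000, alternative = "two.sided", conf.level = 0.95)

# Reject H0 when the p-value is below the significance level.
decision <- if (res$p.value < 0.05) "reject H0" else "do not reject H0"

# Equivalent check: is the hypothesised mean outside the confidence interval?
outside_ci <- 8000 < res$conf.int[1] || 8000 > res$conf.int[2]

print(decision)
```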

iv) Write the command to conduct right-tail t-test in this scenario, along with the null and
alternate hypothesis.



H0 – Null Hypothesis – The Average Property Rates in Delhi is Rs. 8000/Sq. Feet

H1 – Alternate Hypothesis – The Average Property Rates in Delhi is more than Rs.
8000/Sq. Feet

t.test(`Rate.per.sq.ft.(Rs.)`,alternative = "greater", mu=8000)

v) Write the command to conduct left-tail t-test in this scenario, along with the null and
alternate hypothesis.

H0 – Null Hypothesis – The Average Property Rates in Delhi is Rs. 8000/Sq. Feet
H1 – Alternate Hypothesis – The Average Property Rates in Delhi is less than Rs. 8000/Sq.
Feet

t.test(`Rate.per.sq.ft.(Rs.)`,alternative = "less",mu=8000)
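
In `t.test()`, a right-tail test uses `alternative = "greater"` and a left-tail test uses `alternative = "less"`. The sketch below, on simulated data (an assumption, not the exam file), shows how the three p-values relate to each other.

```r
# Simulated property rates, for illustration only.
set.seed(2)
rates <- rnorm(30, mean = 7800, sd = 900)

p_two     <- t.test(rates, mu = 8000, alternative = "two.sided")$p.value
p_greater <- t.test(rates, mu = 8000, alternative = "greater")$p.value  # right tail
p_less    <- t.test(rates, mu = 8000, alternative = "less")$p.value     # left tail

# The two one-sided p-values are complementary, and the two-sided
# p-value is twice the smaller of them.
print(c(two.sided = p_two, greater = p_greater, less = p_less))
```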

Q4. Import the file "Property_Delhi_20", by assigning it to name "z". This file shows
property rates, per square feet (in Rs.), of 20 localities of Delhi. A real estate agent has hired
you to make a claim about the average property rate in Delhi. You are required to use
bootstrapping due to small sample size of 20 localities only, and make a claim about the average
property rate in Delhi. Use set.seed(1000) before you start bootstrapping. Do sampling
(replicate) for 1 million times, and give the mean of these 1 million sample means. (Write all
the commands, and output for mean of sample means only). [Hint: Bootstrapping is taking a
sample out of sample with replacement.] [5 Marks]
z=read.xlsx(file.choose())
attach(z)
set.seed(1000)
sample(z$`Rate.per.sq.ft.(Rs.)`,20, replace = T)
mean(sample(z$`Rate.per.sq.ft.(Rs.)`,20, replace = T))
replicate(1000000, mean(sample(z$`Rate.per.sq.ft.(Rs.)`,20, replace = T)))
VectorAssignedQ4=replicate(1000000, mean(sample(z$`Rate.per.sq.ft.(Rs.)`,20, replace =
T)))
mean(VectorAssignedQ4)
Mean of sample means: 7884.424
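
The bootstrap idea can be tried on a smaller scale first. The sketch below uses 20 hypothetical rates and 10,000 resamples rather than 1 million, so its numbers are not comparable with the answer above; it also adds a percentile interval, which the question does not ask for.

```r
# 20 hypothetical rates (Rs. per sq. ft.), evenly spaced for illustration.
toy_rates <- seq(5000, 10700, by = 300)

set.seed(1000)
# Each replicate resamples 20 values with replacement and records the mean.
boot_means <- replicate(10000, mean(sample(toy_rates, replace = TRUE)))

# The mean of the bootstrap means should sit close to the sample mean.
mean(boot_means)

# A 95% percentile interval for the mean (extra, not required by the question).
quantile(boot_means, c(0.025, 0.975))
```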

Q5. Import the file "Crime_Rate", by assigning it to name "x". This file shows the crime rates
in 50 cities of a country. The descriptions of all variables (X1, X2, X3, X4, X5, X6, X7) are as
follows:

X1 = total overall reported crime rate per 1 million residents


X2 = reported violent crime rate per 100,000 residents
X3 = annual police funding in $/resident
X4 = % of people 25 years+ with 4 yrs. of high school



X5 = % of 16 to 19 year-olds not in high school and not high school graduates
X6 = % of 18 to 24 year-olds in college
X7 = % of people 25 years+ with at least 4 years of college

You are required to run a simple bivariate linear regression with X2 as the dependent variable
and X3 as the independent variable. Answer the following: [5*1=5 Marks]

i) Run a bivariate regression such that X2 is the dependent variable and X3 is the independent
variable, and assign it to Model_x. (Write command only).
x=read.csv(file.choose())
View(x)
attach(x)
Model_x=lm(X2~X3,data=x)
summary(Model_x)
Model_x_Summary=summary(Model_x)

ii) What is the value of intercept and slope (coefficients)? (Write output only)

Intercept Value = -182.24539 & Slope = 21.14474

iii) What is the value of Adjusted R-Square? Write your interpretation about adj. R-Square?
(Write output only)

Adjusted R-squared: 0.2439981


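Data Interpretation: Adjusted R-squared measures the goodness-of-fit of the regression model
(Model_x), adjusted for the number of predictors. A value of about 0.244 means that roughly
24.4% of the variation in X2 (violent crime rate) is explained by X3 (police funding), so the
fit is weak.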
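
Adjusted R-squared can be recomputed by hand from R-squared as 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below checks this on R's built-in `cars` dataset, used here only as a stand-in for the Crime_Rate file.

```r
# Built-in dataset as a stand-in; dist plays the role of X2, speed of X3.
fit <- lm(dist ~ speed, data = cars)
s   <- summary(fit)

n <- nrow(cars)  # number of observations
p <- 1           # number of predictors in a bivariate regression

# Adjust R-squared for the degrees of freedom used by the model.
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

print(c(manual = adj_manual, from_summary = s$adj.r.squared))
```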

iv) Are the residuals normal? Comment on the normality of the residuals based on Jarque-
Bera test.

JB = 35.43 | p-value = 5e-04 (0.0005)


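The JB test statistic is always non-negative, and a value far from zero signals that the data
are not normally distributed. Here the p-value (0.0005) is below 0.05, so we reject the null
hypothesis of normality: the residuals are not normal.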
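
The Jarque-Bera statistic combines sample skewness S and kurtosis K as JB = n/6 * (S^2 + (K - 3)^2 / 4). The sketch below computes it in base R on residuals from the built-in `cars` dataset (a stand-in, not the exam data). The helper `jb_manual` is hypothetical and uses the asymptotic chi-square p-value with 2 degrees of freedom, which may differ from simulation-based p-values reported by some packages.

```r
# Hypothetical helper: Jarque-Bera statistic and asymptotic p-value.
jb_manual <- function(r) {
  n  <- length(r)
  d  <- r - mean(r)
  m2 <- mean(d^2)              # second central moment
  S  <- mean(d^3) / m2^1.5     # sample skewness
  K  <- mean(d^4) / m2^2       # sample kurtosis (3 for a normal distribution)
  stat <- n / 6 * (S^2 + (K - 3)^2 / 4)
  c(statistic = stat, p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

fit <- lm(dist ~ speed, data = cars)
jb  <- jb_manual(residuals(fit))

# A small p-value (below 0.05) is evidence against normal residuals.
print(jb)
```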

v) How can we get the fitted values of the model?(Write command only)

Model_x$fitted.values
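
Fitted values are the intercept plus the slope times each observed value of the predictor. The sketch below confirms this on the built-in `cars` dataset (a stand-in for the exam data), using the extractor `fitted()`, which returns the same values as the `$fitted.values` component.

```r
fit <- lm(dist ~ speed, data = cars)

# Extractor function; equivalent to fit$fitted.values.
fv <- fitted(fit)

# Recompute by hand from the coefficients: intercept + slope * x.
b <- coef(fit)
fv_manual <- b[1] + b[2] * cars$speed

head(fv)
```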

