Instructions to students:
After finishing the take-home exam, send your MS Word document and R script to the
examination department at submission@jgu.edu.in on or before the deadline. The subject of the
mail and the titles of the MS Word document and R script should be your name_SPR.
Students undertaking the examination are requested to adhere to the university norms
related to examinations.
Write commands in the white space after questions. You can add extra space if required.
Write your answers in red font so that they are clearly visible.
This question paper comprises FOUR (4) printed pages (including this page).
Q1. Import the file "Listed companies_BSE_2017", by assigning it to name "x". This file
shows the companies listed on Bombay Stock Exchange (BSE), along with their
Security.Id, listing status (Active or Delisted), group, ISIN no., and Industry. Based on
the data in this file “x”, answer the following: [10*1=10 Marks]
i) Give the number of companies whose Security.Id starts with letter “B”. (Write
command and output both).
x=read.csv(file.choose())
View(x)
p=x[substr(x$Security.Id,1,1)=="B",]
nrow(p)
415
ii) Give the number of companies whose Security.Id ends with letter “T”. (Write
command and output both).
length(grep("T$",x$Security.Id))
519
iii) Give the number of companies whose Security.Id starts with letter “M” and ends with
letter “T”. (Write command and output both).
length(grep("^M.*T$",x$Security.Id))
32
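The anchored patterns used in parts (i) to (iii) can be checked on a small vector before running them on the full file. The sketch below is only an illustration: the `ids` vector is made up and is not taken from the BSE file, while `grepl`, `startsWith`, and `endsWith` are base R functions.

```r
# Hypothetical Security.Id values, made up for illustration only
ids <- c("BAJAJ", "MRFT", "TATAT", "MINDT", "BHEL", "LTN")

# "^B" anchors at the start of the string, "T$" anchors at the end
starts_b <- sum(grepl("^B", ids))
ends_t   <- sum(grepl("T$", ids))
m_to_t   <- sum(grepl("^M.*T$", ids))

# The same counts without regular expressions
starts_b2 <- sum(startsWith(ids, "B"))
m_to_t2   <- sum(startsWith(ids, "M") & endsWith(ids, "T"))

print(c(starts_b = starts_b, ends_t = ends_t, m_to_t = m_to_t))
```

Summing the logical vector from `grepl` gives the same count as `length(grep(...))`, and it also works when the result is combined with other conditions.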
h=x[x$Status=="Delisted"| x$Status=="Suspended",]
nrow(h)
3772
v) Out of these Delisted or Suspended companies (as we have found in part (iv)), how
many companies belong to either group A or Group B? (Write command and output
both).
[Hint: Group column shows the group of the company such as A, B, XT, etc.]
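No command is recorded for part (v). One possible approach, shown here on a made-up data frame, filters the Delisted or Suspended rows first and then keeps groups A and B. The `toy` data frame is hypothetical. `trimws` is used on the assumption that the Group column carries trailing spaces, as the "B " and "A " comparisons in parts (vi) and (vii) suggest.

```r
# Hypothetical rows standing in for the BSE file (not real data)
toy <- data.frame(
  Security.Id = c("AAA", "BBB", "CCC", "DDD", "EEE"),
  Status      = c("Delisted", "Active", "Suspended", "Delisted", "Suspended"),
  Group       = c("A ", "B ", "B ", "XT", "A "),
  stringsAsFactors = FALSE
)

# Step 1: Delisted or Suspended companies, as in part (iv)
h <- toy[toy$Status %in% c("Delisted", "Suspended"), ]

# Step 2: keep group A or B; trimws() drops the trailing space before comparing
ab <- h[trimws(h$Group) %in% c("A", "B"), ]
print(nrow(ab))
```

On the real data the same two steps would be applied to `x` instead of `toy`.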
vi) How many companies are active and belong to group B (in full dataset “x”)? (Write
command and output both).
act=subset(x,(x$Status=="Active")& (x$Group=="B "))
nrow(act)
1375
vii) Out of total 7854 companies in “x” dataset, how many companies are active belonging
to group A with Face value of 2? (Write command and output both).
pct=subset(x,(x$Status=="Active")&(x$Group=="A ")&(x$Face.Value=="2"))
nrow(pct)
84
viii) Out of these companies (as we have found in part (vii)), give the number of companies
whose Security.Id starts with letter “L” and ends with letter “N”. (Write command
and output both).
length(grep("^L.*N$",pct$Security.Id))
ix) Please add a new column “Name_Status_Group” in “x” dataset, which is concatenation
of three columns “Security.Id”, “Status” and “Group”, such as “ABB_Active_A ” for
the first company ABB. (Write command).
x$Name_Status_Group=paste(x$Security.Id,x$Status,x$Group,sep = "_")
Q2. Import the file "Names_Companies", by assigning it to name "y". This file shows the
company name, along with its group and industry. Using a for loop, you have to write a sentence
for each company like, “X company (Company Name column) belongs to Group Y (Group
column) and Z Industry (Industry column)”, and save this output to a vector “y1”. (Write
command only, not output). [5 Marks]
library(openxlsx)
y=read.xlsx(file.choose())
# pre-allocate the result vector, then fill one sentence per company
y1=character(nrow(y))
for(i in seq_len(nrow(y))){
y1[i]=paste(y$Company.Name[i],"belongs to Group",y$Group[i],"and",y$Industry[i],"Industry")
print(y1[i])
}
[For example: for the first company ABB India Limited, the output will be as follows: ABB
India Limited belongs to Group A and Heavy Electrical Equipment Industry.]
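Because `paste` is vectorized, the same sentences can also be built without an explicit loop. The sketch below places the loop and the vectorized call side by side on a made-up data frame. The second row of `toy` is hypothetical; the first row follows the ABB example quoted above.

```r
# Hypothetical data; the first row follows the ABB example in the question
toy <- data.frame(
  Company.Name = c("ABB India Limited", "Example Foods Limited"),
  Group        = c("A", "B"),
  Industry     = c("Heavy Electrical Equipment", "Packaged Foods"),
  stringsAsFactors = FALSE
)

# Loop version: fill a pre-allocated character vector one row at a time
y1 <- character(nrow(toy))
for (i in seq_len(nrow(toy))) {
  y1[i] <- paste(toy$Company.Name[i], "belongs to Group", toy$Group[i],
                 "and", toy$Industry[i], "Industry")
}

# Vectorized version: paste() works element-wise over whole columns
y1_vec <- paste(toy$Company.Name, "belongs to Group", toy$Group,
                "and", toy$Industry, "Industry")

print(identical(y1, y1_vec))
```

The question explicitly asks for a for loop, so the vectorized form is only a cross-check here.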
Q3. Import the file "Property_Delhi", by assigning it to name "z". This file shows property
rates, per square foot (in Rs.), in different localities of Delhi. A real estate agent makes a claim
that the average property rate in Delhi is Rs. 8,000/sq. foot. You are required to check the
authenticity of this claim using a t-test (at 95% level of confidence). Please answer the following:
[5*1=5 Marks]
i) Write the null hypothesis and alternate hypothesis of t-test in this context.
H0 - Null Hypothesis - The Average Property Rate in Delhi is Rs. 8000/Sq. Feet
H1 - Alternate Hypothesis - The Average Property Rate in Delhi is not Rs. 8000/Sq. Feet
iii) What are the t-statistic and p-value from the t-test? Is the real estate agent's claim true?
t = -2.931,
p-value 0.004377 (about 0.44%) is less than the significance level 0.05 (5%), so we REJECT
the null hypothesis. NO, the data do not support the real estate agent's claim.
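The decision rule for a two-sided t-test compares the p-value with the significance level: H0 is rejected when the p-value is below 0.05. A minimal sketch on simulated rates follows; the `rates` vector is made up and is not the Property_Delhi data.

```r
set.seed(1)
# Simulated, hypothetical rates (Rs. per sq. ft.); not the exam data
rates <- rnorm(100, mean = 7000, sd = 2000)

# Two-sided one-sample t-test of H0: mean = 8000
res <- t.test(rates, mu = 8000)

# Reject H0 when the p-value falls below the significance level
alpha <- 0.05
decision <- if (res$p.value < alpha) "reject H0" else "do not reject H0"
print(decision)
```

A negative t statistic, as in the answer above, indicates that the sample mean lies below the hypothesized Rs. 8000.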
iv) Write the command to conduct right-tail t-test in this scenario, along with the null and
alternate hypothesis.
H1 – Alternate Hypothesis – The Average Property Rates in Delhi is more than Rs.
8000/Sq. Feet
v) Write the command to conduct left-tail t-test in this scenario, along with the null and
alternate hypothesis.
H0 – Null Hypothesis – The Average Property Rates in Delhi is Rs. 8000/Sq. Feet
H1 – Alternate Hypothesis – The Average Property Rates in Delhi is less than Rs. 8000/Sq.
Feet
t.test(z$`Rate.per.sq.ft.(Rs.)`,alternative = "less",mu=8000)
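The two one-sided tests differ only in the `alternative` argument of `t.test`: "greater" for the right tail and "less" for the left tail. The sketch below runs both on simulated data; the `rates` vector is hypothetical and stands in for the rate column of "z".

```r
set.seed(42)
# Simulated, hypothetical rates (Rs. per sq. ft.); not the exam data
rates <- rnorm(60, mean = 7500, sd = 1500)

# Right-tail test: H1 says the mean is greater than 8000
right <- t.test(rates, alternative = "greater", mu = 8000)

# Left-tail test: H1 says the mean is less than 8000
left <- t.test(rates, alternative = "less", mu = 8000)

# Both tests share the same t statistic; the one-sided p-values sum to 1
print(c(right = right$p.value, left = left$p.value))
```

Because the two p-values are complementary, at most one of the two one-sided tests can reject H0 at the 5% level for a given sample.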
Q4. Import the file "Property_Delhi_20", by assigning it to name "z". This file shows
property rates, per square feet (in Rs.), of 20 localities of Delhi. A real estate agent has hired
you to make a claim about the average property rate in Delhi. You are required to use
bootstrapping due to small sample size of 20 localities only, and make a claim about the average
property rate in Delhi. Use set.seed(1000) before you start bootstrapping. Do sampling
(replicate) for 1 million times, and give the mean of these 1 million sample means. (Write all
the commands, and output for mean of sample means only). [Hint: Bootstrapping is taking a
sample out of sample with replacement.] [5 Marks]
z=read.xlsx(file.choose())
attach(z)
set.seed(1000)
sample(z$`Rate.per.sq.ft.(Rs.)`,20, replace = T)
mean(sample(z$`Rate.per.sq.ft.(Rs.)`,20, replace = T))
replicate(1000000, mean(sample(z$`Rate.per.sq.ft.(Rs.)`,20, replace = T)))
VectorAssignedQ4=replicate(1000000, mean(sample(z$`Rate.per.sq.ft.(Rs.)`,20, replace = T)))
mean(VectorAssignedQ4)
Mean of sample means: 7884.424
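The bootstrap steps can be sketched compactly on simulated data. The `rates` vector and the `boot_mean` helper below are hypothetical, and only 10,000 replicates are used to keep the run short. Every call to `sample` after `set.seed` advances the random number generator, so the reported mean depends on any sampling calls made before the final `replicate`.

```r
set.seed(1000)
# Hypothetical sample of 20 rates (Rs. per sq. ft.), simulated for illustration
rates <- round(rnorm(20, mean = 8000, sd = 1500))

# One bootstrap replicate: resample 20 values with replacement, take the mean
boot_mean <- function(v) mean(sample(v, length(v), replace = TRUE))

# 10,000 replicates keep the sketch fast; the question asks for 1 million
boot_means <- replicate(10000, boot_mean(rates))

print(mean(boot_means))                       # close to mean(rates)
print(quantile(boot_means, c(0.025, 0.975)))  # percentile interval
```

The mean of the bootstrap means settles near the original sample mean; the spread of `boot_means` is what indicates how uncertain that estimate is.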
Q5. Import the file "Crime_Rate", by assigning it to name "x". This file shows the crime rates
in 50 cities of a country. The description of all variables (X1, X2, X3, X4, X5, X6, X7) is as
follows:
You are required to run a simple bivariate linear regression in which X2 is the dependent variable
and X3 is the independent variable. Answer the following: [5*1=5 Marks]
ii) What is the value of intercept and slope (coefficients)? (Write output only)
iii) What is the value of Adjusted R-Square? Write your interpretation about adj. R-Square?
(Write output only)
iv) Are the residuals normal? Comment on the normality of the residuals based on Jarque-
Bera test.
v) How can we get the fitted values of the model? (Write command only)
Model_x$fitted.values
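The quantities asked for in parts (ii) to (v) can all be read from a fitted `lm` object. The sketch below uses simulated data, so its numbers say nothing about the Crime_Rate file; the `toy` data frame is hypothetical and only the name `Model_x` follows the answer above. The Jarque-Bera statistic is computed by hand from its textbook formula; packages such as tseries offer `jarque.bera.test` as a ready-made alternative.

```r
set.seed(123)
# Simulated stand-in for Crime_Rate; X2 and X3 here are made-up numbers
n  <- 50
X3 <- rnorm(n, mean = 20, sd = 5)
X2 <- 100 + 3 * X3 + rnorm(n, sd = 10)
toy <- data.frame(X2 = X2, X3 = X3)

# Fit X2 (dependent) on X3 (independent)
Model_x <- lm(X2 ~ X3, data = toy)

coefs       <- coef(Model_x)                   # intercept and slope
adj_r2      <- summary(Model_x)$adj.r.squared  # adjusted R-squared
fitted_vals <- fitted(Model_x)                 # same as Model_x$fitted.values

# Jarque-Bera statistic from residual skewness (S) and kurtosis (K)
res <- residuals(Model_x)
m2  <- mean((res - mean(res))^2)
S   <- mean((res - mean(res))^3) / m2^1.5
K   <- mean((res - mean(res))^4) / m2^2
JB  <- n / 6 * (S^2 + (K - 3)^2 / 4)

# Under H0 (normal residuals) JB is approximately chi-squared with 2 df
jb_p <- pchisq(JB, df = 2, lower.tail = FALSE)

print(coefs)
print(adj_r2)
print(c(JB = JB, p.value = jb_p))
```

A Jarque-Bera p-value above 0.05 means normality of the residuals is not rejected at the 5% level; a value below 0.05 indicates a departure from normality.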