
Assignment-1 Decision Sciences-I

Shrey Agarwal (1811079, Section-B)


Q1. Select a context and data-set of your choice and interest, containing few (at least one, ideally more)
variable and (at least) hundred data points. You may use the company databases subscribed by the
institute library or any data from the internet, for this purpose. Identify the type for each of the variable.
List down (describe) a few interesting parameters based on the variable(s).
Ans.
I have chosen a data set of country-wise demographic data. The data is in the public domain and comes
from the US government. The report was compiled by combining information from files at:
http://gsociology.icaap.org/dataupload.html
The report contains demographic data for 227 nations. The variables recorded for each nation in this
data set are: Region, Population, Area (sq. mi.), Infant Mortality (per 1000 births), GDP ($ per capita),
Literacy %, Phones (per 1000), and Arable %.
A snapshot of the data is shown below:

a) Type of Variables:

Variable                              Type
Region                                Nominal Categorical Variable
Population                            Ratio Quantitative Variable
Area (sq. mi.)                        Ratio Quantitative Variable
Infant Mortality (per 1000 births)    Ratio Quantitative Variable
GDP ($ per capita)                    Ratio Quantitative Variable
Literacy %                            Ratio Quantitative Variable
Phones (per 1000)                     Ratio Quantitative Variable
Arable %                              Ratio Quantitative Variable

All the quantitative variables have a true zero point, so they are measured on a ratio scale rather
than an interval scale.

b) Interesting Parameters based on the variables:


• The average Infant Mortality Rate for 224 countries comes out to be 35.5070 deaths per 1000 births,
with a Standard Deviation of 35.3899. Singapore has the lowest infant mortality rate (2.29), while
Angola has the highest (191.19).
• The phone usage data for 223 countries shows that the average country has 236.0577 phones per
1000 people. Monaco (Western Europe) has 1035.6 phones per 1000 people, while Congo
(Sub-Saharan Africa) has only 0.17 phones per 1000 people.
• Combining the Population and Area variables for 227 countries gives an average country-level
population density of 379.0425 people per sq. mi. The density varies from very sparse in Greenland
(0.026) to very high in Monaco (16271.5). The region with the highest population density is
Asia (Ex. Near East).
Q2. Now prepare a suitable focused report (limited to 3 pages), that includes (i) a few key summary
statistics (ii) A few (2 to 3) pictorial representation that best summarizes and communicates the
characteristics of the chosen data set. Your report should communicate the significance of your reported
summary statistics and graphical representations, with specific reference to your data. [Make use of this
part of the exercise to practice various graphical and basic data summarization tools of Excel]
Ans.
a) Key Summary Statistics on the variables:
Statistics          Population          Area (sq. mi.)    Infant mortality   GDP ($ per   Literacy %  Phones      Arable %
                                                          (per 1000 births)  capita)                  (per 1000)
Mean                2,87,40,284.37      5,98,226.96       35.51              9,689.82     82.84       236.06      13.80
Median              47,86,994.00        86,600.00         21.00              5,550.00     92.50       176.15      10.42
Mode                #N/A                102.00            9.95               800.00       99.00       26.80       -
Standard Deviation  11,78,91,326.54     17,90,282.24      35.39              10,049.14    19.72       227.99      13.04
Range               1,31,39,66,687.00   1,70,75,198.00    188.90             54,600.00    82.40       1,035.38    62.11
Minimum             7,026.00            2.00              2.29               500.00       17.60       0.17        -
Maximum             1,31,39,73,713.00   1,70,75,200.00    191.19             55,100.00    100.00      1,035.55    62.11
Sum                 6,52,40,44,551.00   13,57,97,519.00   7,953.56           21,89,900.00 17,313.20   52,640.86   3,104.35
Count               227.00              227.00            224.00             226.00       209.00      223.00      225.00
Largest(57)         1,76,54,843.00      4,46,550.00       55.51              15,700.00    97.80       384.88      20.00
Smallest(57)        4,36,131.00         4,167.00          8.19               1,900.00     76.20       38.42       3.22
IQR                 1,72,18,712.00      4,42,383.00       47.32              13,800.00    21.60       346.46      16.78
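
The rows of the summary table are linked by simple arithmetic, which can be checked with a short sketch. The Python below is only an illustration (the report itself was prepared in Excel); the variable names are made up, and the reading of Largest(57) and Smallest(57) as approximate quartiles is an assumption based on 57 being about a quarter of 227.

```python
# Values copied from the summary table (infant mortality and GDP columns).
# Variable names are illustrative and do not come from the original workbook.
infant_sum, infant_count = 7953.56, 224
gdp_sum, gdp_count = 2189900.00, 226

# Mean = Sum / Count
infant_mean = infant_sum / infant_count
gdp_mean = gdp_sum / gdp_count

# Assumption: the 57th largest and 57th smallest values stand in for the
# third and first quartiles, so IQR = Largest(57) - Smallest(57).
infant_q3, infant_q1 = 55.51, 8.19
infant_iqr = infant_q3 - infant_q1

print(round(infant_mean, 2), round(gdp_mean, 2), round(infant_iqr, 2))
```

The three printed values should agree with the Mean and IQR rows of the table after rounding to two decimals.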

b) Pictorial representations to summarize the data characteristics:


I. Region vs. Population Density (per sq. mi.)

[Figure: bar chart "Population Density" showing population density (per sq. mi.) on the vertical axis
(0 to 180) for each Region. Data labels on the bars: 159.68, 106.82, 104.07, 44.79, 41.05, 30.79, 27.35,
26.82, 15.23, 12.67, 3.89.]

The above representation of Population Density, created using a Pivot Table, shows that the
most densely populated region in the world is Asia (Ex. Near East) with 159.7 people
per sq. mi., while the least densely populated region is Oceania with only 3.89 people per sq. mi.
II. Region wise Population proportion (%)
[Figure: pie chart "Population %" showing each region's share of world population.
Legend: Baltics, Oceania, Eastern Europe, Northern Africa, Near East, C.W. of Ind. States,
Northern America, Western Europe, Latin Amer. & Carib., Sub-Saharan Africa, Asia (Ex. Near East).
Slice labels: 1.84%, 0.11%, 0.51%, 2.47%, 2.99%, 4.29%, 5.08%, 6.08%, 8.61%, 56.53%, 11.49%.]

The above pie chart shows the distribution of the world population across different regions.
56.53% of the world’s population lives in Asian countries (excluding the Near East) and
11.49% lives in Sub-Saharan African countries.

III. Region-wise Average per capita GDP ($)


[Figure: bar chart "Average of GDP ($ per capita)" showing Average per Capita GDP ($) on the vertical
axis (0 to 30,000) for each Region. Data labels visible: 27,046.43, 26,100.00 and 2,323.53.]

The demographic data shows that the lowest average per capita GDP is for Sub-Saharan African
countries, where 11.49% of the world's population lives, while the highest average per capita GDP is
in the North American and Western European countries, where a similar share, 11.06% of the
population, resides.

IV. Correlation between Infant Mortality and Literacy %


[Figure: scatter plot "Infant Mortality vs Literacy % Correlation" with Literacy % on the horizontal
axis (0 to 100) and Infant Mortality (per 1000 births) on the vertical axis (0 to 200).
Linear trend line: y = -1.407x + 153.69]

The chart above shows the relationship (assumed to be linear for simplicity) between Infant
Mortality (per 1000 births) and Literacy % across all the countries. It shows that as the literacy %
increases, the infant mortality rate decreases, and vice versa.
The fitted linear trend line has a slope of -1.407, i.e. each additional percentage point of literacy
is associated with roughly 1.4 fewer infant deaths per 1000 births.
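
To make the trend line concrete, the sketch below evaluates y = -1.407x + 153.69 at a few literacy levels. It is a minimal illustration only: the function name is hypothetical, and the line describes an association in this data set rather than a causal effect.

```python
# Coefficients taken from the trend line reported on the chart.
SLOPE = -1.407       # change in infant mortality per literacy percentage point
INTERCEPT = 153.69   # fitted infant mortality at 0% literacy


def predicted_infant_mortality(literacy_pct: float) -> float:
    """Return the trend-line value (deaths per 1000 births) for a literacy %."""
    return SLOPE * literacy_pct + INTERCEPT


# Evaluate the line at a few illustrative literacy levels.
for literacy in (50.0, 80.0, 100.0):
    print(literacy, round(predicted_infant_mortality(literacy), 2))
```

Because the fit is linear, the line is best read within the observed literacy range (17.6% to 100%) and not extrapolated beyond it.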

V. Region wise Arable Land % and Population %


[Figure: chart "Regionwise Arable Land % and Population %" with a vertical axis from 0 to 60 and two
series per region: Average of Arable % and % Population.]

The above chart compares the average arable land % of each region with the % of the world's
population that resides there, and shows how uneven the two distributions are. Asia (Ex. Near East)
sustains 56.53% of the world’s population on an average arable land share of only 15.87%. On the
other hand, the Baltic and Eastern European countries each have approx. 30% arable land but sustain
only 0.11% and 1.84% of the world’s population respectively.
Q3. Solve any two problems from # 48 -- # 58 in page 208 –211 of the textbook.
Ans.
# 56 from Chapter-4 Pg.: 211

                        Days Listed Until Sold
Initial Asking Price    Under 30    31-90    Over 90    Total
Under $150,000          50          40       10         100
$150,000 - $199,999     20          150      80         250
$200,000 - $250,000     20          280      100        400
Over $250,000           10          30       10         50
Total                   100         500      200        800

a) If Event A: A home is listed for more than 90 days before being sold, then find P(A).
# Over 90 days to sell = 200
# total homes in sample = 800
Hence, P(A) = 200/800 = 0.25

b) If Event B: Initial asking price is under $150,000, then find P(B).
# Initial Asking Price Under $150,000 = 100
# total homes in sample = 800
Hence, P(B) = 100/800 = 0.125

c) Find P(A ∩ B)
# Sold in > 90 days and Initial Asking Price Under $150,000 = 10
# total homes in sample = 800
Hence, P(A ∩ B) = 10/800 = 0.0125

d) If initial asking price < $150,000, then find P(Will take > 90 days to sell).
# Over 90 days to sell given under $150,000 = 10
# total homes with under $150,000 asking price = 100
Hence, P(> 90 | Under $150,000) = P(A|B) = 10/100 = 0.10

e) Are events A and B independent?

No.
P(A) = 0.25 while P(A|B) = 0.10. As P(A) ≠ P(A|B), A and B are not independent.
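
The calculations in parts a) to e) can be reproduced directly from the table of counts. The following Python sketch is illustrative only; the dictionary layout and variable names are assumptions made for the example, while the counts are those given in the problem.

```python
# Counts from the table: rows are initial asking price, columns are days listed.
counts = {
    "Under $150,000":      {"Under 30": 50, "31-90": 40,  "Over 90": 10},
    "$150,000 - $199,999": {"Under 30": 20, "31-90": 150, "Over 90": 80},
    "$200,000 - $250,000": {"Under 30": 20, "31-90": 280, "Over 90": 100},
    "Over $250,000":       {"Under 30": 10, "31-90": 30,  "Over 90": 10},
}

total = sum(sum(row.values()) for row in counts.values())

# A: listed for more than 90 days; B: asking price under $150,000.
p_a = sum(row["Over 90"] for row in counts.values()) / total
p_b = sum(counts["Under $150,000"].values()) / total
p_a_and_b = counts["Under $150,000"]["Over 90"] / total

# Conditional probability P(A|B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

# A and B are independent only if P(A and B) equals P(A) * P(B).
independent = abs(p_a_and_b - p_a * p_b) < 1e-12

print(p_a, p_b, p_a_and_b, round(p_a_given_b, 4), independent)
```

Checking P(A ∩ B) against P(A) × P(B) is equivalent to comparing P(A) with P(A|B); here 0.0125 differs from 0.25 × 0.125 = 0.03125, so the events are dependent.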
# 52 from Chapter-4 Pg.: 209-210

             Applied to more than 1 school
Age Group    Yes    No
<= 23        207    201
24 – 26      299    379
27 – 30      185    268
31 – 35      66     193
>= 36        51     169

a) Prepare a joint probability table of observing student’s age and no. of schools applied to

Age Group    Applied to > 1 school    Not applied to > 1 school    Total
<= 23        0.1026                   0.0996                       0.2022
24 – 26      0.1482                   0.1878                       0.3360
27 – 30      0.0917                   0.1328                       0.2245
31 – 35      0.0327                   0.0956                       0.1283
>= 36        0.0253                   0.0837                       0.1090
Total        0.4004                   0.5996                       1.0000

b) For a randomly selected applicant, find P(Age ≤ 23)
P(Age ≤ 23) = 0.2022

c) For a randomly selected applicant, find P(Age > 26)
P(Age > 26) = P(Age 27 – 30) + P(Age 31 – 35) + P(Age ≥ 36) = 0.2245 + 0.1283 + 0.1090 = 0.4618
Equivalently, P(Age > 26) = 1 − (P(Age ≤ 23) + P(Age 24 – 26)) = 1 − (0.2022 + 0.3360) = 0.4618

d) For a randomly selected applicant, find P(Applied to > 1 school)
P(Applied to > 1 school) = 0.4004
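
The joint probability table is obtained by dividing every count by the grand total of applicants (2018). A minimal Python sketch of that step follows; the data structure and names are illustrative assumptions, and the counts are those given in the problem.

```python
# Counts per age group: (applied to more than 1 school, did not).
counts = {
    "<= 23":   (207, 201),
    "24 - 26": (299, 379),
    "27 - 30": (185, 268),
    "31 - 35": (66, 193),
    ">= 36":   (51, 169),
}

total = sum(yes + no for yes, no in counts.values())

# Joint probabilities: each cell divided by the grand total.
joint = {age: (yes / total, no / total) for age, (yes, no) in counts.items()}

# Marginal probabilities: row totals and the "Yes" column total.
p_age = {age: (yes + no) / total for age, (yes, no) in counts.items()}
p_yes = sum(yes for yes, _ in counts.values()) / total

# P(Age > 26) is the sum of the marginals of the three oldest groups.
p_over_26 = p_age["27 - 30"] + p_age["31 - 35"] + p_age[">= 36"]

for age, (p_y, p_n) in joint.items():
    print(age, round(p_y, 4), round(p_n, 4), round(p_age[age], 4))
print(round(p_over_26, 4), round(p_yes, 4))
```

Summing unrounded marginals avoids the small discrepancies that can appear when already-rounded table entries are added together.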
Q4. Open the excel template Bayesian Revision, uploaded in the Moodle. This macro exhibits how prior
probabilities get updated with additional information using the Bayes’ theorem.
a) Select a range for 𝒑 in (0,1) consisting of up to 10 values (you need not choose 10).
b) Put a valid prior distribution on this.
c) Now keep on changing the 𝒏 (sample size) and 𝒙 (no. of success) pairs keeping the ratio 𝒙/𝒏 to be
roughly the same. You need to select at least 5 such combinations (e.g. 𝒏 = 5, 10, 15, 20, 50 with
suitable 𝒙’s). [You should not change your prior distribution]. Observe the corresponding
posterior distributions.
d) Your submitted output should consist of
I. Prior distribution selected,
II. Pairs of (𝒙, 𝒏), and
III. Graphs representing all posterior (and prior) probability distributions.
Ans.
I. Prior Distribution Selected:

𝑝        0.12    0.24    0.36    0.48    0.60    0.72    0.84    0.96    Total
Prior    0.02    0.13    0.18    0.37    0.21    0.06    0.01    0.02    1.00

II. Pairs of (𝑥, 𝑛)

Sr. No. 𝒙 𝒏
1 2 12
2 3 18
3 5 30
4 6 36
5 8 48

III. Graphs representing posterior and prior probability distribution for each pair of (𝑥, 𝑛)
a. 𝑛 = 12, 𝑥 = 2
b. 𝑛 = 18, 𝑥 = 3

c. 𝑛 = 30, 𝑥 = 5

d. 𝑛 = 36, 𝑥 = 6
e. 𝑛 = 48, 𝑥 = 8
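
The posterior distributions plotted above can be approximated outside Excel with a short sketch. The Python below assumes that the template updates the prior with a binomial likelihood, i.e. posterior(p) ∝ prior(p) × p^x × (1 − p)^(n − x); this is an assumption about the template, not something confirmed by it, and the function name is hypothetical.

```python
from math import comb

# Prior distribution and (x, n) pairs as selected in parts I and II.
p_values = [0.12, 0.24, 0.36, 0.48, 0.60, 0.72, 0.84, 0.96]
prior = [0.02, 0.13, 0.18, 0.37, 0.21, 0.06, 0.01, 0.02]
pairs = [(2, 12), (3, 18), (5, 30), (6, 36), (8, 48)]


def posterior(x: int, n: int) -> list[float]:
    """Bayes update of the discrete prior, assuming a binomial likelihood."""
    likelihood = [comb(n, x) * p**x * (1 - p) ** (n - x) for p in p_values]
    unnormalised = [pr * lk for pr, lk in zip(prior, likelihood)]
    evidence = sum(unnormalised)  # P(data), the normalising constant
    return [u / evidence for u in unnormalised]


for x, n in pairs:
    post = posterior(x, n)
    print(f"n={n}, x={x}:", [round(q, 4) for q in post])
```

With x/n held at 1/6, the posterior mass should concentrate on the values of p nearest to 1/6 as n grows, which is the behaviour the exercise asks to observe.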
