Prof. Rita.S
Assistant Professor (Sr), SAS,
VIT University, Vellore.
SCHOOL OF ELECTRONICS ENGINEERING
Abstract
In this project we calculate the measures of central tendency
(mean, median and mode) together with the correlation between
temperature and humidity for each individual month over the
period 1901 to 2015.
Introduction
Statistics is the study of the collection, analysis, interpretation,
presentation, and organization of data. In applying statistics to, e.g., a
scientific, industrial, or social problem, it is conventional to begin with a
statistical population or a statistical model process to be studied.
Populations can be diverse topics such as "all people living in a country"
or "every atom composing a crystal". Statistics deals with all aspects of
data including the planning of data collection in terms of the design of
surveys and experiments.
There are two broad categories of statistics: descriptive and
inferential.
1. Descriptive statistics summarize population data numerically or
graphically by deriving:
   - statistics pertaining to central tendency, such as the mean,
     median, or mode;
   - statistics pertaining to dispersion around the central tendency,
     such as the range or standard deviation;
   - statistics or graphs depicting the shape of a distribution.
2. Inferential statistics allow one to infer population parameters based
upon sample statistics and to model relationships within the data.
The categories of inferential statistics are:
   - Estimation, the group of statistics which allow for the
     estimation of population values based upon sample data.
   - Modelling, which allows us to develop mathematical equations that
     describe the interrelationships between two or more variables.
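The two inferential categories can be illustrated with a minimal R sketch. The vectors `temp` and `humidity` below are hypothetical values invented for illustration, not the project's data; `t.test()` and `lm()` are standard functions of R's stats package.

```r
# Hypothetical paired readings (made-up values, NOT the project's data).
temp     <- c(22.0, 23.1, 23.6, 24.2, 24.9, 25.7)
humidity <- c(70.3, 67.4, 66.2, 65.0, 63.5, 60.3)

# Estimation: 95% confidence interval for the population mean temperature,
# inferred from the sample.
ci <- t.test(temp)$conf.int

# Modelling: least-squares line humidity = a + b * temp.
fit <- lm(humidity ~ temp)

print(ci)
print(coef(fit))
```

The confidence interval is an estimate of a population value from a sample, and the fitted coefficients are a simple mathematical description of how the two variables are related.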
Methodology
Null hypothesis:
H0: There is no correlation between temperature and humidity.
Alternate hypothesis:
H1: There is a correlation between temperature and humidity.
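A hypothesis test of this kind might be run in R roughly as follows. This is only a sketch: the data frame `month_data` is a hypothetical stand-in for one month's data with made-up values, and the significance level `alpha` of 0.05 is an assumed conventional choice.

```r
# Hypothetical stand-in for one month's data frame (made-up values).
month_data <- data.frame(
  TEMP     = c(22.0, 23.1, 23.6, 24.2, 24.9, 25.7, 23.3, 24.0),
  HUMIDITY = c(69.8, 66.0, 67.4, 63.9, 65.2, 61.1, 68.5, 64.7)
)

# Pearson correlation test of H0: true correlation is 0.
result <- cor.test(month_data$TEMP, month_data$HUMIDITY, method = "pearson")

alpha <- 0.05                          # assumed significance level
reject_h0 <- result$p.value < alpha    # TRUE means H0 is rejected in favour of H1

cat("r =", result$estimate, " p =", result$p.value,
    " reject H0:", reject_h0, "\n")
```

A P value below `alpha` is evidence against H0; a P value above it means H0 cannot be rejected.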
Calculations
Month - January
Calculation of Mode:
> table_jan=table(jan_max$TEMP)
> Mode=which(table_jan==max(table_jan))
> Mode
23.57 23.61 23.91
   43    46    59
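Note that `which()` applied to a table returns a named integer vector: the names (first row of the output) are the modal temperatures, while the values (second row) are their positions in the table, not their frequencies. A minimal sketch with hypothetical temperatures (not the project's data) that returns the modal values and their counts directly:

```r
# Hypothetical temperatures (made-up values, NOT the project's data).
temps <- c(23.57, 23.61, 23.91, 23.61, 23.57, 24.10, 23.91, 22.80)

counts <- table(temps)

# Positions of the tied modes within the table (what which() reports).
positions <- which(counts == max(counts))

# Frequencies of the tied modes, and the modal temperatures as numbers.
modal <- counts[counts == max(counts)]
modal_values <- as.numeric(names(modal))

print(positions)
print(modal)
print(modal_values)
```

When several values tie for the highest frequency, as in the output above, the data are multimodal and all tied values are reported.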
Calculation of Median:
> Median=median(jan_max$TEMP)
> Median
[1] 23.61
Standard Deviation of Temperatures:
> SD=sd(jan_max$TEMP)
> SD
[1] 0.7390883
Variance in Temperatures:
> Variance=SD*SD
> Variance
[1] 0.5462515
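Squaring the standard deviation gives the same result as R's `var()`, because both use the sample formula with an n - 1 denominator. A short check on hypothetical values (not the project's data):

```r
# Hypothetical temperatures (made-up values, NOT the project's data).
temps <- c(22.0, 23.1, 23.6, 24.2, 24.9, 25.7)

# Sample variance by hand, with the n - 1 denominator.
manual_var <- sum((temps - mean(temps))^2) / (length(temps) - 1)

# The three routes agree up to floating-point rounding.
print(all.equal(manual_var, var(temps)))
print(all.equal(var(temps), sd(temps)^2))
```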
Summary Statistics:
> summary(jan_max$TEMP)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  22.00   23.08   23.61   23.64   24.16   25.66
> summary(jan_max$HUMIDITY)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  60.25   63.50   66.21   65.66   67.41   70.32
Correlation:
sample estimates:
       cor
-0.2758513
Month - February
Calculation of Mode:
> table_feb=table(feb_max$TEMP)
> Mode=which(table_feb==max(table_feb))
> Mode
25.12 25.35 26.07
40 47 66
Calculation of Median:
> Median=median(feb_max$TEMP)
> Median
[1] 25.39
Standard Deviation of the Temperatures:
> SD=sd(feb_max$TEMP)
> SD
[1] 1.030881
Variance:
> Variance=var(feb_max$TEMP)
> Variance
[1] 1.062715
Summary Statistics:
> summary(feb_max$TEMP)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  22.83   24.78   25.39   25.53   26.28   29.33
> summary(feb_max$HUMIDITY)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  60.25   63.50   66.21   65.66   67.41   70.32
Correlation:
sample estimates:
      cor
-0.235742
Month - March
Calculation of Mean (humidity):
> Mean=mean(mar_max$HUMIDITY)
> Mean
[1] 65.66303
Calculation of Mode:
> table_mar=table(mar_max$TEMP)
> View(table_mar)
> Mode=which(table_mar==max(table_mar))
> Mode
27.04 27.31 27.62 27.78 28.9
2 13 20 27 31 33 34 40 41 45 47 87 94
Calculation of Median:
> Median=median(mar_max$TEMP)
> Median
[1] 29.02
Standard Deviation:
> SD=sd(mar_max$TEMP)
> SD
[1] 0.99691
Variance:
> Variance=var(mar_max$TEMP)
> Variance
[1] 0.9938296
Summary Statistics:
> summary(mar_max$TEMP)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> summary(mar_max$HUMIDITY)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
Correlation:
> cor.test(mar_max$TEMP,mar_max$HUMIDITY,method="pearson");
Month - April
Calculation of Mean:
> Mean
[1] 76.01823
Calculation of Mode:
> table_apr=table(april_max$TEMP)
> Mode=which(table_apr==max(table_apr))
> Mode
31.7
36
Calculation of Median:
> Median=median(april_max$TEMP)
> Median
[1] 31.95
Standard Deviation:
> SD=sd(april_max$TEMP)
> SD
[1] 0.7891133
Variance:
> Variance=var(april_max$TEMP)
> Variance
[1] 0.6226999
Summary Statistics:
> summary(april_max$TEMP)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> summary(april_max$HUMIDITY)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
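Since the same steps are repeated for every month, they could be wrapped in one helper function. The sketch below is not the code used in this project: the function name `summarise_month` and the data frame `example_month` are hypothetical, and the only assumption about the data is that each month's data frame has `TEMP` and `HUMIDITY` columns, as in the transcripts above.

```r
# Hypothetical helper: computes the per-month statistics in one call.
# Assumes df has numeric columns TEMP and HUMIDITY.
summarise_month <- function(df) {
  counts <- table(df$TEMP)
  list(
    mean_temp   = mean(df$TEMP),
    median_temp = median(df$TEMP),
    # All values tied for the highest frequency, as numbers.
    mode_temp   = as.numeric(names(counts)[counts == max(counts)]),
    sd_temp     = sd(df$TEMP),
    var_temp    = var(df$TEMP),
    correlation = cor(df$TEMP, df$HUMIDITY, method = "pearson")
  )
}

# Made-up example data, NOT the project's data.
example_month <- data.frame(
  TEMP     = c(31.7, 31.7, 32.0, 31.2, 32.4),
  HUMIDITY = c(76, 75, 74, 78, 73)
)

month_stats <- summarise_month(example_month)
str(month_stats)
```

Calling the helper once per month, for example `summarise_month(jan_max)`, would give the statistics in the same form for every month.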
Inference
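We compare the P value obtained for each of the above months with the
0.05 significance level. Since the P values are below this level, the
null hypothesis is rejected, i.e. there is a statistically significant
correlation between temperature and humidity. The correlation is weak
and negative, however: the sample estimates are -0.2758513 for January
and -0.235742 for February.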
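The decision rule for a Pearson correlation can be sketched from a sample estimate alone. The sample size `n` below is an assumption (one observation per year from 1901 to 2015, i.e. 115 values); the report does not state it, so the resulting P value is illustrative only. The significance level of 0.05 is likewise an assumed conventional choice.

```r
r <- -0.2758513   # sample estimate reported for January
n <- 115          # ASSUMPTION: one observation per year, 1901 to 2015

# Test statistic for H0: correlation = 0, with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided P value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

alpha <- 0.05     # assumed significance level
cat("t =", t_stat, " p =", p_value, " reject H0:", p_value < alpha, "\n")
```

A P value below `alpha` leads to rejecting H0; the size of `r` then tells how strong the correlation is.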