Professional Documents
Culture Documents
CSSSPEC4
(Modeling and Simulation Using MATLAB)
EXERCISE
5
ELEMENTARY STATISTICS WITH R/MATLAB:
NUMERICAL MEASURES
<Name of Student>
<Data Performed>
<Name of Professor>
<Date Submitted>
I.
Objectives:
BACKGROUND INFORMATION
Mean
The mean of an observation variable is a numerical measure of the central location of the
data values. It is the sum of its data values divided by data count.
Hence, for a data sample of size n, its sample mean is defined as follows:
To find the mean eruption duration in the data set faithful, we apply the mean function to
compute the mean value of eruptions.
Median
The median of an observation variable is the value at the middle when the data is sorted
in ascending order. It is an ordinal measure of the central location of the data values.
To find the median of the eruption duration in the data set faithful, we apply the median
function to compute the median value of eruptions.
Quartile
There are several quartiles of an observation variable. The first quartile, or lower quartile,
is the value that cuts off the first 25% of the data when it is sorted in ascending order. The
second quartile, or median, is the value that cuts off the first 50%. The third quartile, or
upper quartile, is the value that cuts off the first 75%.
To find the quartiles of the eruption durations in the data set faithful, we apply the
quantile function to compute the quartiles of eruptions.
The first, second and third quartiles of the eruption duration are 2.1627, 4.0000 and
4.4543 minutes respectively.
Percentile
The nth percentile of an observation variable is the value that cuts off the first n percent of
the data values when it is sorted in ascending order.
To find the 32nd, 57th and 98th percentiles of the eruption durations in the data set
faithful, we apply the quantile function to compute the percentiles of eruptions with the
desired percentage ratios.
The 32nd, 57th and 98th percentiles of the eruption duration are 2.3952, 4.1330 and 4.9330
minutes respectively.
Range
The range of an observation variable is the difference of its largest and smallest data
values. It is a measure of how far apart the entire data spreads in value.
To find the range of the eruption duration in the data set faithful, we apply the max and
min function to compute the largest and smallest values of eruptions, then take the
difference.
To find the interquartile range of eruption duration in the data set faithful, we apply the
IQR function to compute the interquartile range of eruptions.
Variance
The variance is a numerical measure of how the data values is dispersed around the mean.
In particular, the sample variance is defined as:
Similarly, the population variance is defined in terms of the population mean and
population size N:
To find the variance of the eruption duration in the data set faithful, we apply the var
function to compute the variance of eruptions.
To find the covariance of the eruption duration and waiting time in the data set faithful,
we observe if there is any linear relationship between the two variables and we apply the
cov function to compute the covariance of eruptions and waiting.
The covariance of the eruption duration and waiting time is 13.978. It indicates a positive
linear relationship between the two variables.
Central Moment
The kth central moment (or moment about the mean) of a data population is:
Similarly, the kth central moment of a data sample is:
The skewness of eruption duration is -0.41355. It indicates that the eruption duration
distribution is skewed towards the left.
Kurtosis
The kurtosis of a univariate population is defined by the following formula, where 2 and
4 are the second and fourth central moments.
Intuitively, the kurtosis is a measure of the peakedness of the data distribution. Negative
kurtosis would indicates a flat data distribution, which is said to be platykurtic. Positive
kurtosis would indicates a peaked distribution, which is said to be leptokurtic.
Incidentally, the normal distribution has zero kurtosis, and is said to be mesokurtic.
To find the kurtosis of eruption duration in the data set faithful, we apply the function
kurtosis from the e1071 package to compute the kurtosis of eruptions. As the package is
not in the core R library, it has to be installed and loaded into the R workspace.
The kurtosis of eruption duration is -1.5116, which indicates that eruption duration
distribution is platykurtic. This is consistent with the fact that its histogram is not bellshaped.
III.
EXPERIMENTAL PROCEDURE:
Use the faithful dataset to answer for the following:
1.
Find the mean eruption waiting periods in faithful.
2.
Findthemedianoftheeruptionwaitingperiodsinfaithful.
3.
Find the quartiles of the eruption waiting periods in faithful.
4.
Find the 17th, 43rd, 67th and 85th percentiles of the eruption waiting periods in faithful.
5.
Find the range of the eruption waiting periods in faithful.
6.
Find the interquartile range of eruption waiting periods in faithful.
7.
Find the box plot of the eruption waiting periods in faithful.
8.
Findthevarianceoftheeruptionwaitingperiodsinfaithful.
9.
Find the standard deviation of the eruption waiting periods in faithful.
10.
Find the third central moment of eruption waiting period in faithful.
11.
Find the skewness of eruption waiting period in faithful.
12.
Find the kurtosis of eruption waiting period in faithful.
2. Why is kurtosis a measured value in statistics? What does it imply and how does its value
contribute to the analysis of a given data set?