
Guide for Windows Excel 2003 Analysis Toolpak Add-in and the Normal Distribution Functions

James W. Taylor

This guide introduces the Analysis Toolpak add-in for Excel and describes how to use it to produce descriptive statistics, correlation analysis and histograms for a dataset. The guide then introduces Excel's normal distribution functions. The guide consists of the following sections:

1. Analysis Toolpak - Descriptive Statistics
2. Analysis Toolpak - Correlation Analysis
3. Analysis Toolpak - Histogram
4. The Normal Distribution Excel Functions

Before starting, we must attach Excel's statistical add-in options:

- From the Tools menu, select Add-Ins...
- In the Add-Ins dialog box, select both Analysis ToolPak and Analysis ToolPak - VBA.

1. ANALYSIS TOOLPAK - DESCRIPTIVE STATISTICS

The Excel file ElectricityConsumption.xls contains monthly observations from January 2004 to July 2012 for the following variables:

ELEC   Residential electricity sales (kWh) per customer in a mid-Atlantic U.S. city
C66    Cooling degree hours at base temperature 66 degrees (a measure of summer heat) [1]
C76    Cooling degree hours at base temperature 76 degrees (a measure of summer heat)
H55    Heating degree hours at base temperature 55 degrees (a measure of winter cold) [2]
DINC   Disposable income per household ($)
AIRC   Proportion of households with air conditioning

The ultimate aim would be to build a forecasting model for residential electricity consumption. But, in this guide, we perform only preliminary descriptive analysis.
      A        B        C      D      E       F       G
 1    MONTH    ELEC     C66    C76    H55     DINC    AIRC
 2    Jan-04   681.7    20     0      10148   34825   0.698
 3    Feb-04   620.3    0      0      12504   34934   0.701
 4    Mar-04   590.8    20     0      9300    35050   0.705
 5    Apr-04   538.0    14     0      5333    35172   0.708
 6    May-04   513.4    559    3      2846    35302   0.712
 7    Jun-04   575.5    1601   83     282     35438   0.716
 8    Jul-04   1019.3   5348   833    1       35583   0.72
 9    Aug-04   1203.9   7416   1547   0       35734   0.724
10    Sep-04   1176.7   6887   1287   0       35892   0.728
11    Oct-04   723.0    2975   398    155     36056   0.731
12    Nov-04   519.0    427    5      1812    36222   0.735
13    Dec-04   604.9    9      0      5779    36391   0.739

Use the Analysis Toolpak Descriptive Statistics tool to get summary statistics for all six variables in one sequence of operations. From the main Excel menu, click on Tools > Data Analysis... In the resulting dialog box, select Descriptive Statistics.

[1] The cooling degree hours at base temperature T is

$$\sum_{i \ge 1} i \, n_i$$

where $n_i$ is the number of hours in the month at temperature T+i.

[2] The heating degree hours at base temperature T is

$$\sum_{i \ge 1} i \, n_i$$

where $n_i$ is the number of hours in the month at temperature T-i.
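As an illustration of the two degree-hour definitions, here is a minimal Python sketch. It assumes whole-degree hourly temperature readings; the `temps` list is made up for the example and does not come from ElectricityConsumption.xls, and the function names are hypothetical.

```python
from collections import Counter


def cooling_degree_hours(hourly_temps, base):
    """Sum of i * n_i, where n_i is the number of hours at temperature base + i."""
    counts = Counter(t - base for t in hourly_temps if t > base)
    return sum(i * n for i, n in counts.items())


def heating_degree_hours(hourly_temps, base):
    """Sum of i * n_i, where n_i is the number of hours at temperature base - i."""
    counts = Counter(base - t for t in hourly_temps if t < base)
    return sum(i * n for i, n in counts.items())


# Hypothetical hourly readings in whole degrees (illustration only)
temps = [64, 66, 67, 67, 70, 78, 80, 54, 50]

c66 = cooling_degree_hours(temps, 66)
c76 = cooling_degree_hours(temps, 76)
h55 = heating_degree_hours(temps, 55)
print(c66, c76, h55)
```

Hours at or below the base contribute nothing to the cooling measure, and hours at or above the base contribute nothing to the heating measure.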

In the Descriptive Statistics dialog box, specify:

- Input Range as the range containing values and variable names: B1:G104
- Click the Labels in First Row checkbox
- Output options as New Worksheet Ply with the name Descriptive Statistics
- Click the Summary Statistics checkbox.

The new worksheet should contain descriptive statistics for each of the six variables. The results for the Electricity consumption variable are shown below left. Below right are shown the Excel functions that can be used separately to deliver the same values. Experiment with several of these functions (e.g. =average, =stdev, =count), and confirm that you get the same values as in the Analysis Toolpak descriptive statistics output below left.

ELEC                                Corresponding Excel functions:
Mean                  767.2767      =average
Standard Error        23.50939      =stdev(...)/sqrt(count(...))
Median                685.8901      =median
Mode                  #N/A          =mode
Standard Deviation    238.5942      =stdev
Sample Variance       56927.2       =var
Kurtosis              0.667518      =kurt
Skewness              1.251095      =skew
Range                 1017.264      =max(...)-min(...)
Minimum               504.4549      =min
Maximum               1521.719      =max
Sum                   79029.5       =sum
Count                 103           =count
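For readers who want to cross-check a few of these statistics outside Excel, the following is a rough Python sketch using only the standard library. It uses just the twelve 2004 ELEC values shown in the data listing earlier, so its results will differ from the 103-month output above; the variable names are chosen for this example.

```python
import math
import statistics

# ELEC values for Jan-04 to Dec-04, copied from the data listing in this guide
elec = [681.7, 620.3, 590.8, 538.0, 513.4, 575.5,
        1019.3, 1203.9, 1176.7, 723.0, 519.0, 604.9]

summary = {
    "Mean": statistics.mean(elec),
    # Standard error of the mean: sample standard deviation / sqrt(n)
    "Standard Error": statistics.stdev(elec) / math.sqrt(len(elec)),
    "Median": statistics.median(elec),
    "Standard Deviation": statistics.stdev(elec),  # n-1 divisor, like =stdev
    "Sample Variance": statistics.variance(elec),  # n-1 divisor, like =var
    "Range": max(elec) - min(elec),
    "Minimum": min(elec),
    "Maximum": max(elec),
    "Sum": sum(elec),
    "Count": len(elec),
}

for name, value in summary.items():
    print(f"{name:20s}{value:12.4f}")
```

Skewness and kurtosis are left out because Excel applies its own small-sample adjustments to both, which this sketch does not attempt to reproduce.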

2. ANALYSIS TOOLPAK - CORRELATION ANALYSIS

Return to the Data worksheet. From the main Excel menu, click on Tools > Data Analysis... In the resulting dialog box, select Correlation.

In the Correlation dialog box, specify:

- Input Range: as B1:G104
- Grouped By: as Columns, so that Excel knows that each column is a variable
- The Labels in First Row checkbox should be ticked
- Output options: as New Worksheet Ply with the name Correlations
- Click OK.

The correlation matrix below should result. A correlation coefficient measures the strength of linear association between a pair of variables, e.g. ELEC and C76 have a correlation of 0.94, so ELEC tends to rise as C76 rises.
        ELEC    C66     C76     H55     DINC    AIRC
ELEC    1.00    0.92    0.94    -0.36   0.14    0.14
C66     0.92    1.00    0.95    -0.65   0.02    0.02
C76     0.94    0.95    1.00    -0.52   0.01    0.01
H55     -0.36   -0.65   -0.52   1.00    -0.04   -0.05
DINC    0.14    0.02    0.01    -0.04   1.00    0.94
AIRC    0.14    0.02    0.01    -0.05   0.94    1.00

You should get the same values using the Excel function =CORREL to calculate correlations for any chosen pair of variables.
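The calculation behind a single correlation coefficient can be sketched in Python as below. This is an illustrative implementation of the Pearson correlation, not Excel's own code; the helper name `correl` is borrowed from the Excel function for readability. It uses only the twelve 2004 observations listed earlier, so the values will not match the 103-month matrix.

```python
import math


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Jan-04 to Dec-04 values copied from the data listing in this guide
elec = [681.7, 620.3, 590.8, 538.0, 513.4, 575.5,
        1019.3, 1203.9, 1176.7, 723.0, 519.0, 604.9]
c76 = [0, 0, 0, 0, 3, 83, 833, 1547, 1287, 398, 5, 0]
h55 = [10148, 12504, 9300, 5333, 2846, 282, 1, 0, 0, 155, 1812, 5779]

print(f"ELEC vs C76: {correl(elec, c76):.2f}")
print(f"ELEC vs H55: {correl(elec, h55):.2f}")
```

Even on this one-year subset the signs agree with the full matrix: ELEC moves with C76 and against H55.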

3. ANALYSIS TOOLPAK - HISTOGRAM

We now use the Analysis Toolpak Histogram tool. From the main Excel menu, click on Tools > Data Analysis... In the resulting dialog box, select Histogram.

In the Histogram dialog box, specify:

- Input Range as the range containing values and variable name for the ELEC variable: B1:B104
- There is no need to enter anything for the Bin Range
- Click the Labels checkbox
- Output options as New Worksheet Ply with the name Histogram
- Click the Chart Output checkbox
- Click OK.

[Figure: Histogram of ELEC. Horizontal axis: Bin, with values 504.5, 606.2, 707.9, 809.6, 911.4, 1013.1, 1114.8, 1216.5, 1318.3, 1420.0 and More. Vertical axis: Frequency, from 0 to 35.]
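When the Bin Range is left blank, Excel chooses the bins itself. The bin values in the chart are consistent with evenly spaced limits starting at the minimum, with the number of limits equal to the square root of the count rounded down, plus a final More bin. The Python sketch below reproduces that rule; the rule is an inference from the output shown in this guide rather than a statement of Excel's documented behaviour, and the function names are hypothetical.

```python
import math


def bin_limits(lo, hi, n):
    """Evenly spaced bin upper limits starting at lo, floor(sqrt(n)) of them."""
    k = math.floor(math.sqrt(n))
    width = (hi - lo) / k
    return [lo + j * width for j in range(k)]


def bin_counts(values, limits):
    """Count values in (previous limit, limit]; the last count is the 'More' bin."""
    counts = [0] * (len(limits) + 1)
    for v in values:
        for j, limit in enumerate(limits):
            if v <= limit:
                counts[j] += 1
                break
        else:
            counts[-1] += 1  # larger than every limit
    return counts


# Minimum, maximum and count of ELEC from the descriptive statistics output
full_limits = bin_limits(504.4549, 1521.719, 103)
print([round(b, 1) for b in full_limits])

# The twelve 2004 ELEC values from the data listing, for a small worked count
elec = [681.7, 620.3, 590.8, 538.0, 513.4, 575.5,
        1019.3, 1203.9, 1176.7, 723.0, 519.0, 604.9]
sample_limits = bin_limits(min(elec), max(elec), len(elec))
sample_counts = bin_counts(elec, sample_limits)
print(sample_counts)
```

Because the first limit equals the minimum, the first bin holds only the observations equal to the smallest value.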

4. THE NORMAL DISTRIBUTION EXCEL FUNCTIONS

In this section, we introduce two useful normal distribution functions.

4.1. The Excel Function =NORMDIST

The waiting time for a particular hospital procedure is normally distributed with a mean of 30 days and a standard deviation of 9 days. What is the probability that a randomly chosen patient will have to wait more than 42 days?

Consider the following Excel function:

=NORMDIST(42,30,9,1)

Note the various entries in this function, and the order in which they feature:

42 = Value for which you want the probability
30 = Mean
9 = Standard deviation
1 = If the fourth entry of the function is 1, a cumulative probability is returned. If the fourth entry is 0, the height of the probability density function is returned instead, which is not the value we need here.

With these values, the function =NORMDIST(42,30,9,1) returns a probability of 0.909. This is not the required upper-tail probability of waiting more than 42 days. Instead, it is the probability of getting a value less than 42. To get the required tail probability, we calculate:

=1-NORMDIST(42,30,9,1)

This delivers the required probability of 0.091 (i.e. 9.1%).
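The same calculation can be cross-checked outside Excel. The sketch below uses Python's standard-library `statistics.NormalDist` (available from Python 3.8); `cdf` plays the role of NORMDIST with a fourth entry of 1, and `pdf` the role of a fourth entry of 0. The variable names are chosen for this example.

```python
from statistics import NormalDist

# Waiting time ~ normal with mean 30 days and standard deviation 9 days
waiting_time = NormalDist(mu=30, sigma=9)

p_less = waiting_time.cdf(42)   # like =NORMDIST(42,30,9,1)
p_more = 1 - p_less             # like =1-NORMDIST(42,30,9,1)
density = waiting_time.pdf(42)  # like =NORMDIST(42,30,9,0), a height, not a probability

print(f"P(wait < 42) = {p_less:.3f}")
print(f"P(wait > 42) = {p_more:.3f}")
```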

4.2. The Excel Function =NORMINV

The waiting time for a particular hospital procedure is normally distributed with a mean of 30 days and a standard deviation of 9 days. There is a 5% probability of the waiting time being less than what number of days?

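[Figure: normal curve N(30,9) with a 5% tail area marked below the value X, and the mean, 30, marked on the horizontal axis.]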

We can calculate the value X using the following Excel function:

=NORMINV(0.05,30,9)

Note the various entries in this function, and the order in which they feature:

0.05 = Probability of a value less than X
30 = Mean
9 = Standard deviation

With these values, the function =NORMINV(0.05,30,9) returns a value of 15.2 (i.e. 15.2 days). This is the value X in the diagram above.
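As a cross-check outside Excel, the inverse calculation can be sketched with Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method plays the role of NORMINV. The variable names are chosen for this example.

```python
from statistics import NormalDist

# Waiting time ~ normal with mean 30 days and standard deviation 9 days
waiting_time = NormalDist(mu=30, sigma=9)

# Value X with a 5% probability of a smaller waiting time, like =NORMINV(0.05,30,9)
x = waiting_time.inv_cdf(0.05)
print(f"X = {x:.1f} days")

# Feeding X back into the cumulative function recovers the 5% probability
check = waiting_time.cdf(x)
```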

Exercises

For the large number of candidates taking a particular examination for a professional accounting qualification, the final examination grades have a mean of 67.4 and a standard deviation of 12. Assuming that the distribution of these grades is normal, find:

(a) the percentage of grades that exceed 85;
(b) the percentage less than 45;
(c) the number of passes (pass mark is 50) in a class of 180;
(d) the lowest distinction mark if the highest 8% of grades are to be regarded as distinctions.

Solution to Exercise

Throughout this exercise, Grade ~ N(67.4, 12).

(a) =1-NORMDIST(85,67.4,12,1) = 1-0.929 = 0.071, i.e. 7.1%

[Figure: normal curve with 67.4 and 85 marked on the horizontal axis.]

(b) =NORMDIST(45,67.4,12,1) = 0.031, i.e. 3.1%

[Figure: normal curve with 45 and 67.4 marked on the horizontal axis.]

(c) Let us first calculate the proportion of passes in the class. This is equivalent to the probability that an individual passes, i.e. P(Grade>50):

=1-NORMDIST(50,67.4,12,1) = 1-0.0735 = 0.9265

In a class of 180, this implies that 0.9265 × 180 = 166.8 ≈ 167 students pass the exam.

[Figure: normal curve with 50 and 67.4 marked on the horizontal axis.]

(d) If 8% of students are awarded distinctions, then the lowest distinction mark is the value X that has 8% of the distribution above it. The probability of a value less than X is therefore 92%. Using this in the NORMINV Excel function, we obtain X:

=NORMINV(0.92,67.4,12) = 84.3
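The four answers can be cross-checked with a short Python sketch using the standard-library `statistics.NormalDist`. It follows the same steps as the Excel formulas in the solution; the variable names are chosen for this example.

```python
from statistics import NormalDist

# Grade ~ normal with mean 67.4 and standard deviation 12
grade = NormalDist(mu=67.4, sigma=12)

pct_above_85 = 100 * (1 - grade.cdf(85))  # (a) like =1-NORMDIST(85,67.4,12,1)
pct_below_45 = 100 * grade.cdf(45)        # (b) like =NORMDIST(45,67.4,12,1)
pass_rate = 1 - grade.cdf(50)             # (c) like =1-NORMDIST(50,67.4,12,1)
passes = round(pass_rate * 180)           #     expected passes in a class of 180
distinction = grade.inv_cdf(0.92)         # (d) like =NORMINV(0.92,67.4,12)

print(f"(a) {pct_above_85:.1f}% of grades exceed 85")
print(f"(b) {pct_below_45:.1f}% of grades are below 45")
print(f"(c) about {passes} passes in a class of 180")
print(f"(d) lowest distinction mark is {distinction:.1f}")
```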
