
Data Analysis Using StatsDirect

A. Investigating probability

1. Open a new StatsDirect workbook

• To start StatsDirect go to Start > All Programs > Core > Statistics > StatsDirect2006
• When StatsDirect opens, click Cancel.

2. Use the z score to calculate a probability

The mean FEV1 in 57 male medical students is 4.06 litres with a standard deviation of 0.67 litres.

What is the probability of one of the students having an FEV1 of 3.57 litres or less?

You need to calculate:

$$z = \frac{x - \mu}{\sigma} = \frac{3.57 - 4.06}{0.67}$$

• In StatsDirect, click Tools > Calculator
• Type (3.57-4.06)/0.67 into the Expression to evaluate box. Click Calculate
• Then click Save > Close. Click Yes to copy saved results to report.

Constructing a formula in StatsDirect: The formula will use one or more of these mathematical operators:

^ exponentiation (raise to the power of)
* multiplication
/ division
+ addition
- subtraction

You need to convert the answer for z into a percentage to find the probability:

• Click Analysis > Distributions > Normal
• Type the z value you’ve calculated into the Normal deviate (z) box
• Click Calculate. The lower tail P value gives the probability of a medical student having an FEV1 of 3.57 litres or less.
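
The same two steps (z score, then lower tail probability) can be cross-checked outside StatsDirect. The following is a minimal Python sketch, not part of the StatsDirect workflow; the function name `normal_lower_tail` is an illustrative assumption, and it relies on the standard identity that links the normal cumulative distribution to the error function.

```python
import math

def normal_lower_tail(z: float) -> float:
    """Lower tail probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Values from the worked example above (litres)
mean_fev1 = 4.06
sd_fev1 = 0.67
x = 3.57

# Step 1: z score, the same expression typed into the StatsDirect calculator
z = (x - mean_fev1) / sd_fev1

# Step 2: convert z into a lower tail probability
p_lower = normal_lower_tail(z)

print(f"z = {z:.4f}")
print(f"lower tail P = {p_lower:.4f} ({p_lower:.1%})")
```

The printed lower tail P should agree with the value StatsDirect reports for the same z.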

Question: What is the probability (as a percentage) that the student will have an FEV1 of
4.55 litres or less?

• Click Save > Close > Select to send the result of the analysis to the report.
3. Use the z score to calculate the interquartile range of the FEV1 values (the middle
half of the values)

First calculate the z value at the lower end of the interquartile range (the lower 25%):

• Click Analysis > Distributions > Normal

• Type 0.25 in the lower tail P value box and click Calculate
• Click Save > Close > Select

Now convert the z value into the FEV1 value:

• Click Tools > Calculator and type the values for µ + z*σ into the Expression to
evaluate box, using the z value for the lower 25% and the mean and standard deviation in
the equation above.
• Remember to save your result to the report.
• The result is the FEV1 in litres at the lower end of the interquartile range (ie 25% of the
medical students will have this FEV1 or lower).

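Now calculate the FEV1 value at the upper end of the interquartile range (the 75% point). Because the normal distribution is symmetrical about the mean, this value lies as far above µ as the lower 25% value lies below it:

• Click Tools > Calculator and type the values for µ - (lower 25% value) + µ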
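
As a cross-check on the calculator steps in this section, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. It is an illustration only, not a StatsDirect feature; the variable names are assumptions. It finds the z value for a lower tail P of 0.25, converts it with µ + z*σ, and then obtains the upper value by symmetry.

```python
from statistics import NormalDist

# Values from section 2 (litres)
mean_fev1 = 4.06
sd_fev1 = 0.67

# z value with 25% of the standard normal distribution below it
z_lower = NormalDist().inv_cdf(0.25)

# Convert z into an FEV1 value: mu + z * sigma
lower = mean_fev1 + z_lower * sd_fev1

# Upper end by symmetry about the mean: mu - lower + mu
upper = mean_fev1 - lower + mean_fev1

# Independent check: ask directly for the 75% point of N(mu, sigma)
upper_check = NormalDist(mean_fev1, sd_fev1).inv_cdf(0.75)

print(f"z for lower 25% = {z_lower:.4f}")
print(f"lower = {lower:.3f} litres, upper = {upper:.3f} litres")
print(f"upper (direct check) = {upper_check:.3f} litres")
```

The symmetry shortcut and the direct 75% calculation should give the same upper value.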

Question: What is the interquartile range for the FEV1 values?

4. Use the z score to find the number of asthmatics with obstructive airway disease

The lung function in a sample of 34 similarly aged male asthmatics was investigated. The mean
FEV1 was 3.78 litres with a standard deviation of 0.74 litres. The obstructive airway FEV1 is set at
2.85 litres or less.

a) Use the equation above to work out the proportion of asthmatics with an FEV1
that indicates an obstructive airway in this group.

b) What number of the asthmatics in the group would this represent?

B. Investigating normal distribution and confidence intervals

Here you are going to work with data from your lung function practical class, to investigate
height and peak expiratory flow rate. The sex has been coded: male = M; female = F.

Go to Blackboard > Discussions > Lung function practical and download the file Lung
function 2009.sdw to My Documents.

1. Plotting the data

• In StatsDirect, click on File > Open File > Look in > My Documents > Lung function
2009.sdw. Save the workbook to My Documents.
• Plot a histogram for height: click on B at the top of the height column then click on Graphics
> Histogram. If you approve the title, click OK or change the title. Click OK on Histogram Bin
Setup then Yes to overlay a normal curve, OK on Histogram unless you want to change the
axis title. Select output destination.
• The data should now look something like this:

• Warning!!! If you try to print the data file when any part of the workbook is highlighted this will
jam up the printer.

• Split the data for height and PEFR by sex: Click Data > Grouping > Split. Select column
A (sex) and click OK. To select the data you want to split click on column B (Height). Click
OK. You will now be asked if it is 2 groups. Click OK. Put the output in a new sheet. Split
the data for PEFR and put it on the same new sheet.

• Plot separate histograms for male and female height.

- Do you think the histograms show that the data for height are normally distributed?
- Are there any outliers? Could these prevent the data showing a normal distribution?

2. Is there a difference in male and female PEFR?

What test should you use to compare the means of male and female PEFR?
Clues: Is the data parametric or non-parametric; which Student’s t test should you use? (Look
at the audio lecture in Case 3 for help here.)

• Click Analysis and select the correct test then highlight the column containing the PEFR
values for females, hold down the control key and highlight the column for male PEFR.
Click OK and the test will be carried out. Click Select to send the results to the report.

The report should look like this:

Mean of PEFR~Sex=F =
Mean of PEFR~Sex=M =

Assuming equal variances
Combined standard error =
df =
t =
One sided P
Two sided P
95% confidence interval for difference between means =
Power (for 5% significance) >

Assuming unequal variances
Combined standard error =
df =
t(d) =
One sided P
Two sided P
95% confidence interval for difference between means =
Power (for 5% significance) >

Comparison of variances

How can you tell if there is a significant difference in male and female PEFR?

The first two lines of the report show the means, followed by 2 alternative outputs of the data:

o Assuming equal variance
o Assuming unequal variance

Choose the output after looking at bottom of the report, Comparison of variances which will show one of those below:

What does the 95% confidence interval for the difference between means tell you?
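
To see where the combined standard error, df and t in the two outputs come from, here is a minimal Python sketch. It is a sketch under stated assumptions: the data are hypothetical (not the class data), the function names are illustrative, and the unequal-variance version uses the Welch-Satterthwaite form, which may not match StatsDirect's own method exactly.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Unpaired t assuming equal variances: returns (standard error, df, t)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return se, df, (mean(a) - mean(b)) / se

def welch_t(a, b):
    """Unpaired t assuming unequal variances (Welch-Satterthwaite, assumed)."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    se = math.sqrt(va + vb)
    # Approximate degrees of freedom, usually not a whole number
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return se, df, (mean(a) - mean(b)) / se

# Hypothetical PEFR values, NOT the lung function practical data
female = [380, 410, 395, 430, 405, 420]
male = [520, 560, 610, 545, 590, 575]

se_p, df_p, t_p = pooled_t(female, male)
se_w, df_w, t_w = welch_t(female, male)

print(f"equal variances:   SE = {se_p:.3f}, df = {df_p}, t = {t_p:.3f}")
print(f"unequal variances: SE = {se_w:.3f}, df = {df_w:.2f}, t = {t_w:.3f}")
```

With equal group sizes the two standard errors coincide and only the degrees of freedom differ, which is why the choice between the two outputs matters most when the group sizes and variances are both unequal.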
3. Are height and PEFR

• Click on Graphics > Scatter. Click OK to the default value 1.

• Click on the male PEFR column for the y axis, click OK

• Click on the male height column for the x axis, click OK

• In Chart settings, change the title to Male height v PEFR, click OK, and send it to the report.

• Your scatter plot should look like this:

• Do the same for the female data

Question: Are there any outliers in either male or female samples? What effect could they
have on the result?

4. Correlation and regression

Correlation is a method to establish whether there is a relationship between the two variables (eg
height and PEFR).

Regression determines the nature of that relationship.

• Click Analysis > Regression and correlation > Simple Linear & Correlation.
• Input:
º Outcome variable = PEFR column
º Predictor variable = height column
• Click Plot, click Regression, Plot, to produce the regression line on a scatter plot.

Simple linear regression (example)

Equation: PEFR~Sex=M = 4.255232 Height~Sex=M
Standard Error of slope = 1.071842
95% CI for population value of slope = 2.133586 to 6.376879
Correlation coefficient (r) = 0.337022 (r² = 0.113584)
95% CI for r (Fisher's z transformed) = 0.171569 to 0.483986
t with 123 DF = 3.970017
Two sided P = 0.0001
Power (for 5% significance) = 97.12%
Correlation coefficient is significantly different from zero

Analysing the results:

• Standard Error of slope measures the slope’s variability
• The slope measures the relationship between the variables
• 95% CI shows a linear relationship between the variables unless 95% CI includes a zero (0) = no relationship.
• Correlation coefficient (r) always lies between –1 and +1. Zero = no linear relationship, minus = negative
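
The figures in the example report are linked to one another, and a short calculation makes the links visible. The following Python sketch is an illustration only: it assumes the usual simple linear regression relationships (t = slope / standard error, df = n - 2) and the standard Fisher z method for the confidence interval of r, and it uses only the numbers printed in the example above.

```python
import math
from statistics import NormalDist

# Values copied from the example report
slope = 4.255232
se_slope = 1.071842
r = 0.337022
df = 123
n = df + 2  # assumption: df = n - 2 for simple linear regression

# The t statistic can be reached from the slope or from r
t_from_slope = slope / se_slope
t_from_r = r * math.sqrt(df / (1 - r ** 2))

# 95% CI for r via Fisher's z transformation (atanh), then back with tanh
z_crit = NormalDist().inv_cdf(0.975)
half_width = z_crit / math.sqrt(n - 3)
ci_low = math.tanh(math.atanh(r) - half_width)
ci_high = math.tanh(math.atanh(r) + half_width)

print(f"t from slope = {t_from_slope:.4f}, t from r = {t_from_r:.4f}")
print(f"r squared = {r ** 2:.6f}")
print(f"95% CI for r = {ci_low:.4f} to {ci_high:.4f}")
```

Both routes to t should agree with the reported "t with 123 DF", and the interval for r excludes zero, in line with the report's final statement.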
Question: Why does the analysis calculate a two-sided probability (P) rather than a one-sided P?

• Plot the regression for both the male and female data of height v PEFR


Do the 95% confidence intervals indicate that there is a linear relationship between height
and PEFR for the males and females?

What does the correlation coefficient tell us about the type of relationship? Is it a positive or a
negative correlation?

5. Calculating the PEFR of a given height on the regression line

• In the simple linear regression report for females, click on the button Interpolate x to y,
type in the height 162 and click Calculate.
• The predicted PEFR will appear in the output.
• Repeat this to calculate the predicted PEFR for males of the same height.

Do the 162 cm tall males and females in the lung function practical have identical
predicted PEFR?

6. Are height and PEFR normally distributed?
• Use the Shapiro-Wilk test to check for normal distribution✸
• Click Analysis > Parametric > Shapiro-Wilk
• Highlight the column, click OK, and click OK to send output to the report.
• Check all your samples (male, female, height, PEFR) for evidence of non-normality

Question: Did any of the samples show evidence of non-normality?

Student’s t-test can be used for non-normal distributions

The Student’s t-test is very robust and can cope with non-normal distributions for independent, unpaired samples unless there is a significant difference in the variance of the two samples. StatsDirect tests for this and gives an appropriate warning.

✸ Shapiro-Wilk cannot tell you if a distribution is normal; it can only indicate that a sample does not have a normal distribution.

Questions: Look at your analysis of the difference in male and female PEFR.
1. Was there a significant difference in the variance?
2. If there was a difference, which test would StatsDirect suggest that you should
try instead of the t-test?
3. What is the type of test that you would use if you could not use the t-test?

C. Comparing lung function before and after treatment

In your Asthma Treatment practical, the maximum PEFR values were recorded before and after the
use of a salbutamol inhaler with or without a spacer, or a placebo.

The three groups have been coded:

- P: Placebo inhaler
- I: Salbutamol inhaler
- S: Salbutamol inhaler with spacer

1. Does Salbutamol improve the peak expiratory flow rate?

• Download and open the Salbutamol 2009.sdw file from Blackboard > Discussions
> Lung function practical.

• Split the PEFR before data into the three groups. Do the same for the PEFR after data
and put them into the same worksheet.

• Choose an appropriate t-test to compare the difference before and after treatment,
and use it for each of the three groups. (After highlighting the first column, hold down the
control key and click on the second column.)

Your results should look something like this:

For differences between PEFR Before~Type=S and PEFR After~Type=S:
Mean of differences = -22.777778 (n = 36)
Standard deviation = 41.238755
Standard error = 6.873126
95% CI = -36.730965 to -8.824591
df = 35
t = -3.314035
One sided P = 0.0011
Two sided P = 0.0021
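
The standard error, df and t in this output follow directly from the mean of the differences, their standard deviation and n. A minimal Python sketch of that arithmetic, using only the numbers printed above and assuming the usual paired t-test formulas:

```python
import math

# Values copied from the example output above
mean_diff = -22.777778   # mean of (before - after) differences
sd_diff = 41.238755      # standard deviation of the differences
n = 36                   # number of pairs

# Standard error of the mean difference
se = sd_diff / math.sqrt(n)

# Degrees of freedom for a paired test: number of pairs minus one
df = n - 1

# t statistic: mean difference divided by its standard error
t = mean_diff / se

print(f"standard error = {se:.6f}")
print(f"df = {df}")
print(f"t = {t:.6f}")
```

The mean is negative because the differences are taken as before minus after, so a rise in PEFR after treatment appears as a negative value.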

Is there any evidence that the salbutamol
had an effect on PEFR?
What is the effect of using the placebo?

2. Is there any difference in using salbutamol with or without the spacer?

• First, you must subtract the before results from the after results:
o Click Data > Apply Function and highlight the after column and then the before column.
o Type v1-v2 into the Apply function to data box.
o Do this for all three groups, P, S and I.
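
The v1-v2 function works row by row: v1 is the first column highlighted (after) and v2 the second (before). A minimal Python sketch of the same operation, using hypothetical readings rather than the practical data:

```python
# Hypothetical PEFR readings for four subjects, NOT the class data
pefr_after = [450, 480, 510, 430]
pefr_before = [430, 470, 480, 435]

# v1 - v2 for each row, with v1 = after and v2 = before
differences = [v1 - v2 for v1, v2 in zip(pefr_after, pefr_before)]

print(differences)
```

A positive difference means PEFR rose after treatment; a negative one means it fell.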

• Choose an appropriate t-test to compare the difference between salbutamol with spacer
and the salbutamol inhaler alone.

Is there any evidence for a difference between salbutamol with spacer
and the salbutamol inhaler alone?