An Investigation into the Relationship Between Height, Weight, BMI and Obesity in Children

Hypothesis 1: The BMI will be less spread for Year 11 females than Year 11
males but will have a similar spread for Year 7 males and females.
Hypothesis 2: The BMI will be normally distributed
Hypothesis 3: The weights and heights of Years 7-11 males will be more strongly
correlated than that of Years 7-11 females.
Author: Joseph Cryan
Course: GCSE Statistics
Date: 9 April 2009
Abstract
This report investigates the relationship between weight, height, body mass index and obesity in
children. It does this by collecting primary data from Heckmondwike Grammar School students
to test the following three hypotheses:
Hypothesis 1: The BMI will be less spread for Year 11 females than Year 11 males but will
have a similar spread for Year 7 males and females.
This hypothesis was proposed because girls are under more media pressure than boys to stay thin,
since the models and actresses they see are chosen for their looks. The hypothesis was tested by
using box plots to look at the spread and by working out the relative spread. The relative spread
for Year 11 females was greater than for Year 11 males and so the hypothesis is not supported.
Hypothesis 2: The BMI will be normally distributed.
The BMI depends upon weight and height and it is well known that these tend to have a normal
distribution. As such, it is expected that the BMI will also have a normal distribution. The
hypothesis was tested by drawing the histograms for Year 7 females and males and Year 11
females and males and then working out the percentage of the area within ±σ and ±2σ of the mean.
For a normal distribution these should be 68% and 95%. All were close to this except Year 11 males.
This is only a basic check and so the more rigorous χ² test was also used. This showed that Year 7
females and males and Year 11 females followed a normal distribution but Year 11 males did not.
Since Year 11 males do not have a normal distribution the hypothesis is not supported.
Hypothesis 3: The weights and heights of Years 7-11 males will be more strongly correlated
than that of Years 7-11 females.
Because of the pressure on females to have a smaller clothes size, it is expected that there will
be less correlation between weight and height for females than for males. To test this either the
Pearson or the Spearman coefficient can be used. Pearson requires the data to follow a normal
distribution, so the weights and heights of both males and females were checked for this first.
The Pearson coefficient was r = 0.81 for males and r = 0.665 for females. As the coefficient for
males was larger, their correlation is stronger. This means the hypothesis is supported.
As for the question, “Is obesity a problem at HGS?”, the percentage of overweight and
obese Year 7 and Year 11 males and females has been worked out and it is a lot less than
the national figures and so obesity is not a problem at HGS.
Table of contents
1 Aims, Design and Strategy
1.1 Aims
1.2 Design and strategy
1.2.1 Collecting data
1.2.2 Hypothesis 1: BMI spread
1.2.3 Hypothesis 2: Normal distributions
1.2.4 Hypothesis 3: Correlation
2 Collecting the data
2.1 Types of data
2.2 Questionnaire design
2.3 Potential problems
2.4 Sampling options and avoiding bias
2.5 Preliminary enquiry
3 Deciding on how many samples
3.1 Generating random samples
3.2 Checking for outliers
4 Hypothesis 1: BMI spread
4.1 Justification for approach
4.2 Testing the hypothesis
4.3 Interpretation of results
5 Hypothesis 2: BMI normally distributed
5.3.1 test statistic
6 Hypothesis 3: Correlation
6.3 Should Pearson or Spearman be used?
6.4 Calculating the Pearson coefficient
7 Is obesity a problem at HGS?
8 Conclusions
8.1 How the project could be improved
8.2 Limitations of the project
9 Bibliography
Appendix 1 – Huddersfield Examiner Article
Appendix 2 – The sampled data
Appendix 3 – Chi-squared test
Appendix 4 – Pearson coefficient
1 Aims, Design and Strategy
1.1 Aims
This work aims to investigate the relationship between weight, height, Body Mass Index (BMI) and
obesity in children by using primary data from Heckmondwike Grammar School.
Obesity is when someone is dangerously overweight and it can lead to ill health. The Body Mass
Index is used to determine if someone is obese and this is worked out from the height and weight
values and is defined as
$$\text{Body Mass Index} = \frac{\text{Weight (kg)}}{[\text{Height (m)}]^2}$$
BMI charts for boys and girls are shown in Figures 1 and 2. Obese children are defined as those with
a BMI ≥ 95th centile and overweight children as those with a BMI ≥ 91st centile.
Obesity amongst children is becoming a major health problem. For example, an article (Appendix
1) published in the Huddersfield Daily Examiner indicates that obesity in children has tripled in
Kirklees.
Secondary statistics are available from the “Health Survey for England 2004: Updating of trend
tables to include childhood obesity data” and these are shown in Table 1.
              Boys (11-15)   Girls (11-15)
Overweight    12.8%          19.3%
Obese         24.2%          26.7%
Table 1 – Children’s (11-15) overweight and obesity prevalence 2004
The National Child Measurement Programme 2006/7 shows that, for Kirklees, 14.5% of Year 6
children are overweight and 16.8% are obese.
Heckmondwike Grammar School is within Kirklees. Primary data will be collected from the School and
used to test three hypotheses relating to weight, height and BMI. These are summarised below along
with the reasons for selecting them for investigation.
Figure 1 – Male BMI 5-18 years
Figure 2 – Female BMI 5-18 years

The three hypotheses to be tested are:
Hypothesis 1: The BMI will be less spread for Year 11 females than Year 11 males but will have a
similar spread for Year 7 males and females.
This hypothesis was proposed because, as they get older, girls are under more media pressure than
boys to stay thin, because of the thin models used in advertising and because being thin is
considered attractive. This would make females try to stay thinner, whereas boys do not have this
pressure to the same extent. Because of this pressure, the spread for females should be less as
they get older.
Hypothesis 2: The BMI will be normally distributed.
Weight and height are known to tend towards a normal distribution. As BMI is weight divided by the
square of the height, this suggests that the BMI will also be normally distributed.
Hypothesis 3: The weights and heights of Years 7-11 males will be more strongly correlated than
that of Years 7-11 females.
Because of the pressure on females to look like models and have a smaller clothes size, it is
expected that there will be less correlation between weight and height for females than for males.
1.2 Design and strategy
The software package Mindjet Mindmanager Pro was used to create a Mind Map to help in the
design and strategy for this project. The Mind Map is shown in Figure 3 and the key stages are
explained below.
1.2.1 Collecting data
Since the Body Mass Index is calculated from weights and heights, this project relies on
quantitative data. Both primary data and secondary data (from the internet) will be used. A
questionnaire will be designed for the primary data collection which will include the gender,
weight (kg), height (m) and Year group. This will be anonymous to encourage people to tell the
truth.
The different types of sampling techniques will be considered and the most appropriate for the
proposed hypothesis selected.
A preliminary enquiry will be carried out with a small sample to confirm the appropriateness of the
questionnaire and sampling approach. The primary data will be compared with secondary data to
check that the primary data and sample size are representative when set against known national
statistics.
Figure 3 – Mind map of the statistics project
1.2.2 Hypothesis 1: BMI spread
Hypothesis 1 requires an investigation of the spread of BMI for Years 7 and 11 males and females.
This could be done by drawing the histograms and working out the standard deviations or by drawing
the box and whisker diagrams and working out the interquartile range. The box and whisker
diagrams will allow the four sets of data to be easily compared on a single graph and will
immediately give a visual representation of the spread and so these will be used rather than the
histograms. Since the box and whisker diagrams require the calculation of the quartiles, the
interquartile range will be used as a measure of spread and these will be compared for Year 7 and
then Year 11 to test the hypothesis.
1.2.3 Hypothesis 2: Normal distributions
There are many choices for presenting data including pictograms, bar charts, choropleth maps, line
graphs, stem and leaf diagrams, frequency polygons, pie charts, box and whisker diagrams and
histograms. However, Hypothesis 2 requires an investigation into the shape of the distribution to
see if it has the symmetrical bell-shaped curve that is associated with the normal distribution. A
further test is to work out the percentage of values that lie within ±1 standard deviation of the
mean and within ±2 standard deviations of the mean. For a normal distribution these should be
about 68% and 95% of the values. Since both the shape and the area are required, the best option
for representing the data is the histogram, since this gives a visual representation and the area
under the histogram can be worked out fairly easily, unlike with a stem and leaf diagram. The
standard deviation will be worked out for each set of data and then the area under the histogram
calculated to see if it follows a normal distribution. If necessary, the more complicated
chi-squared test will be carried out to test if the distributions are normal.
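The 68% and 95% check described above can be sketched in Python. This is only an illustration: the report works from histogram areas, whereas the sketch counts raw values, and the data are simulated (the function name `fraction_within` and the simulated values are assumptions, not part of the project data).

```python
import random
import statistics

def fraction_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    inside = [v for v in values if abs(v - mean) <= k * sd]
    return len(inside) / len(values)

# Hypothetical data: simulated values, NOT the HGS sample
random.seed(1)
simulated = [random.gauss(20, 3) for _ in range(10_000)]

print(fraction_within(simulated, 1))  # close to 0.68 for normal data
print(fraction_within(simulated, 2))  # close to 0.95 for normal data
```

Fractions that are far from 0.68 and 0.95 would suggest that the data are not normally distributed, which is when a more formal test is worth carrying out.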
1.2.4 Hypothesis 3: Correlation
This requires an analysis of the correlation that exists between weight and height for Years 7-11
males and Years 7-11 females. A scatter diagram will be used to see if there is a connection
between the weights and heights and a line of best fit will be drawn. If there is positive
correlation the scatter graph will show that as the height increases so does the weight. If there is
strong correlation the points will lie close to the line of best fit. The degree of correlation will
be determined by working out the correlation coefficient. A correlation coefficient of zero would
mean no correlation and a correlation coefficient of one would mean perfect positive correlation. A
decision will have to be made as to which correlation coefficient is going to be calculated. If the
distributions are normal then Pearson’s product-moment coefficient should be used. However, for
non-normal distributions Pearson’s coefficient can give misleading results, and it is also
sensitive to outliers. For non-normal distributions, Spearman’s rank correlation coefficient should
be used and so the shape of the distributions will need to be checked.
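The two coefficients can be compared with a short Python sketch. It is a minimal illustration rather than the calculation used in the project: the function names are assumptions, and the five data pairs are the first five females from the preliminary enquiry (Table 3), chosen only to keep the example small.

```python
import math

def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Heights (m) and weights (kg) of the first five females in Table 3
heights = [1.64, 1.61, 1.67, 1.65, 1.69]
weights = [61, 56.5, 57, 60, 66]

print(round(pearson(heights, weights), 2))
print(round(spearman(heights, weights), 2))
```

Because Spearman works on ranks rather than the values themselves, a single extreme value has less effect on it than on Pearson.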
2 Collecting the data
2.1 Types of data
Qualitative data is data that does not have numerical values, for example colours or months of the
year. Quantitative data is given in numerical form, for example the price of things or
measurements such as temperature and speed. In this project, quantitative data will be used as the
values of height and weight will be used to calculate the Body Mass Index.
Data can also be either primary or secondary.
Primary data is data collected by the user, and secondary data is information that has already been
collected by someone else and has simply been accessed for use. If every student collected their
own primary data, each student would be asked their height and weight at least 150 times. Instead,
HGS has collected one set of ‘primary’ data to be used specifically for the statistics project and
so this can be considered as ‘primary data for the project team’. The secondary data that will be
used is data that has been compiled nationally and is available over the internet, that is, data
that has not been collected by the ‘project team’. This will be used to see if the primary data is
representative of the national data.
2.2 Questionnaire design
Questions should never be biased. A leading question can persuade people that there is a right
answer and a wrong answer, which can give biased results. An example is ‘Do you agree that red is a
nice colour?’. Questionnaires are often multiple choice and, if this is the case, the options should
leave no gaps between the possible answers, since gaps often cause mistakes. Questions should never
embarrass or upset people. If there is a need to ask a sensitive question then either a promise of
confidentiality should be given or a pair of questions should be used in which one answer can mean
two different things. Questions should always be easy to understand, because if someone
misunderstands the question then the data received can be different to that required. All questions
should have some relevance to the survey.
The questionnaire that was used to produce the data for this project is shown in Figure 4.
GCSE Statistics Data Collection
This data collection sheet is designed to be anonymous. Please fill in the details as
accurately as possible. You do not have to fill in your weight if you do not want to.
Please remove your shoes before being measured and weighed. Fold the completed
slip and pass to your teacher. Thank you.
Year 10
Please circle: Male Female
Height: m
Weight: kg
Figure 4 – Questionnaire used to collect the height and weight data
2.3 Potential problems
The types of problems that might happen in the data collection are:
(a) Not enough data collected because people refuse to fill in the questionnaire.
(b) Missing data where people either accidentally or deliberately do not fill in the questionnaire
correctly.
(c) Incorrect data where people accidentally or deliberately enter the wrong information.
(d) Data entry errors where the data has been incorrectly entered into Excel.
As the questionnaire is anonymous it is hoped that people will fill it in, and a decision will be
made on whether or not there are enough returns once the data has been collected. If there are not,
thought will be given as to how to collect more data, for example by catching people at lunchtime
or in the playground. The data will also be checked for missing information and outliers, both of
which will be deleted to give a clean data set. A preliminary enquiry will be carried out on a small
sample and comparisons of mean heights, weights and BMI will be made against national
secondary statistics to check the validity of the approach and to ensure the type of data collected is
realistic and worth using.
2.4 Sampling options and avoiding bias
There are a number of sampling options available and it is important to select an approach that
avoids bias. The options are considered below:
Quota sampling is where an interviewer questions people who fit certain categories, such as a
particular social class, gender or age, but chooses for themselves who to ask. This is cheap but
can be biased, because some people may choose to avoid the interviewer. For this reason it has not
been used here.
Cluster sampling divides the population into groups and then uses random sampling to choose the
groups. The groups may each contain a specific type of person, which can bias the survey, and so
this method has not been used.
Opinion sampling uses open ended questions to find out how people feel about something. This is
not relevant here, since all that is of interest is actual height and weight, and so it will not
be used.
In systematic sampling, members of the sample are chosen at regular intervals from a list. This can
be biased if low or high values occur in a regular pattern, and for this reason this method has not
been used.
Convenience sampling is where someone stands in one place and asks a certain number of people
certain questions. This is flawed because the people found at that place are likely to be from
similar social groups and to give similar answers, which biases the sample. For example, standing
outside a sweet shop or standing outside a gym would give completely different answers for weight
values. This method of sampling will therefore not be used.
Stratified sampling ensures that when a population contains separate groups or strata, each group is
fairly represented in the sample. In this case, there are 5 year groups and samples are being taken
from both males and females and so there are 10 strata. In stratified sampling, the number taken
from each stratum is proportional to the size of the stratum and this ensures that all strata are
fairly represented.
Random sampling ensures that every member of the population has an equal chance of being
selected, which removes bias, and so random sampling will be used.
Having considered the different sampling options, stratified sampling combined with random sampling
has been selected as the most suitable for this project. This ensures that each stratum is fairly
represented and that each member of a stratum has an equal chance of being selected, and so there
is no bias within the strata.
2.5 Preliminary enquiry
Secondary data on the weight, height and BMI for 2 to 15 year olds is available in the publication
‘Health Survey for England 2007 Latest Trends’ available from the NHS Information Centre for
Health and Social Care. Table 2 summarises this secondary data for 15 year old males and females
for survey year 2007. By comparing a preliminary enquiry sample to this the appropriateness of
both the questionnaire and the samples can be checked.
               15 year old Males   15 year old Females
Mean Height    1.728 m             1.623 m
Mean Weight    63.6 kg             59.1 kg
Mean BMI       21.3                22.4
Table 2 – National survey statistics 2007
To check the appropriateness of the questionnaire and the available data a preliminary enquiry was
carried out on a sample of 10 Year 10 males and 10 Year 10 females and the results are shown in
Table 3. The BMI was calculated using
$$\text{Body Mass Index} = \frac{\text{Weight (kg)}}{[\text{Height (m)}]^2}$$
For example, for the first female the height is 1.64 m and the weight is 61 kg and so the BMI is:
$$\text{Body Mass Index} = \frac{\text{Weight (kg)}}{[\text{Height (m)}]^2} = \frac{61}{1.64^2} = 22.7$$
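The same calculation can be written as a small Python function. This is a minimal sketch; the function name `body_mass_index` is an assumption made for illustration and the project itself used a spreadsheet.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI: weight in kg divided by the square of height in m."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2

# First female in the preliminary enquiry: height 1.64 m, weight 61 kg
print(round(body_mass_index(61, 1.64), 1))  # 22.7
```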
Table 4 compares the preliminary enquiry results with the secondary data of Table 2. All of the
preliminary enquiry results are slightly less than the secondary data with the maximum difference
being -2.7%. Since Year 10 is a mix of 14 and 15 year olds and the secondary data is for 15 year
olds only, it would be expected that the preliminary enquiry results are lower. The preliminary
enquiry results show that the questionnaire is appropriate for gathering the information required
on height and weight such that the BMI can be calculated. The preliminary enquiry samples compare
well with the secondary data, which gives some confidence in the validity of the information
collected, but this is not conclusive because it is just a small sample and the agreement could be
a result of fortunate sampling.
Year Group   Gender   Height (m)   Weight (kg)   BMI
10           f        1.64         61            22.7
10           f        1.61         56.5          21.8
10           f        1.67         57            20.4
10           f        1.65         60            22.0
10           f        1.69         66            23.1
10           f        1.59         58            22.9
10           f        1.65         56            20.6
10           f        1.62         50            19.1
10           f        1.46         56            26.3
10           f        1.55         55            22.9
Mean =                1.61         57.6          22.2
10           m        1.77         73            23.3
10           m        1.66         64            23.2
10           m        1.69         68            23.8
10           m        1.73         53            17.7
10           m        1.65         63            23.1
10           m        1.67         54            19.4
10           m        1.66         57.5          20.9
10           m        1.73         54            18.0
10           m        1.8          65            20.1
10           m        1.79         67            20.9
Mean =                1.72         61.9          21.1
Table 3 – Preliminary enquiry BMI calculations for Year 10 Males and Females
                    Males                                     Females
                    Secondary   Preliminary    %              Secondary   Preliminary    %
                    data        enquiry data   difference     data        enquiry data   difference
Mean Height (m)     1.728       1.72           -0.5%          1.623       1.61           -0.8%
Mean Weight (kg)    63.6        61.9           -2.7%          59.1        57.6           -2.5%
Mean BMI            21.3        21.1           -1%            22.4        22.2           -0.9%
Table 4 – Comparison of secondary and primary preliminary enquiry data
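The percentage differences in Table 4 can be reproduced with the sketch below. The report does not state the formula, so it is assumed here that the difference is taken relative to the secondary (national) value; the function name is also an assumption.

```python
def percent_difference(secondary: float, preliminary: float) -> float:
    """Difference of the preliminary value from the secondary value,
    expressed as a percentage of the secondary value (assumed formula)."""
    return (preliminary - secondary) / secondary * 100

# Mean weight of males: 63.6 kg (secondary) against 61.9 kg (preliminary)
print(round(percent_difference(63.6, 61.9), 1))  # -2.7
```

A negative result means the preliminary enquiry value is lower than the national value.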
3 Deciding on how many samples
The size of the sample is important. If too few values are used, this could lead to inaccurate
results because they will not be representative of the population. Using too many means
unnecessary calculations.
The breakdown of the overall population data that is available is shown in Table 5.
           Female   Male   Total
Year 7     69       79     148
Year 8     63       85     148
Year 9     66       77     143
Year 10    72       71     143
Year 11    66       82     148
Total:     336      394    730
Table 5 – The whole population
The overall population is 730 and so a stratified sample of 300 will be taken. This represents
approximately 40% and is a good balance between taking too few and taking too many and should
be representative of the whole population.
The stratified sample is shown in Table 6.
           Female   Male   Total
Year 7     28       32     60
Year 8     26       35     61
Year 9     27       32     59
Year 10    30       29     59
Year 11    27       34     61
Total:     138      162    300
Table 6 – Stratified sample
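The stratified sample sizes in Table 6 follow from sharing the 300 samples in proportion to the size of each stratum in Table 5. A minimal Python sketch of that calculation is given below; the variable and function names are assumptions, and rounding each stratum separately is assumed to be how the table was produced.

```python
# Population counts from Table 5: (females, males) for each year group
population = {
    "Year 7": (69, 79),
    "Year 8": (63, 85),
    "Year 9": (66, 77),
    "Year 10": (72, 71),
    "Year 11": (66, 82),
}
SAMPLE_SIZE = 300

total = sum(f + m for f, m in population.values())  # 730

def allocate(stratum_size: int) -> int:
    """Share of the sample proportional to the stratum size, rounded."""
    return round(stratum_size * SAMPLE_SIZE / total)

allocation = {
    year: (allocate(f), allocate(m)) for year, (f, m) in population.items()
}

for year, (f, m) in allocation.items():
    print(year, f, m, f + m)
```

With these figures the rounded values happen to add up to exactly 300. In general, rounding each stratum separately can leave the total one or two out, and a small adjustment is then needed.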
3.1 Generating random samples
Excel was used to generate the random samples using the following approach:
1. Firstly a random number was generated using the RAND() function to give a random
number greater than or equal to 0 and less than 1 (e.g. 0.5910).
2. This was then multiplied by the size of the population for the stratum (e.g. for Year 7
females this would be 69 × 0.5910 = 40.78).
3. The ROUNDUP(number,0) function was used to round up to the next integer so, for
example, ROUNDUP(40.78,0) would give 41.
4. This was repeated until the desired number of samples was achieved (e.g. for Year 7
females this would be 28).
5. The corresponding values were then taken from the population to give the sampled subset.
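The steps above can be sketched in Python, with `random.random()` standing in for RAND() and `math.ceil` standing in for ROUNDUP(number,0). The function name, the seed and the decision to skip repeated positions are assumptions made for this illustration; the report does not say how repeats were handled in Excel.

```python
import math
import random

def sampling_numbers(population_size: int, sample_size: int,
                     rng: random.Random) -> list[int]:
    """Draw distinct positions in 1..population_size, in the order drawn."""
    chosen: list[int] = []
    while len(chosen) < sample_size:
        r = rng.random()                           # like RAND(): 0 <= r < 1
        position = math.ceil(r * population_size)  # like ROUNDUP(..., 0)
        # Assumption: skip 0 (only possible when r is exactly 0) and repeats
        if position >= 1 and position not in chosen:
            chosen.append(position)
    return chosen

# Year 7 females: 28 sampled from a population of 69
picks = sampling_numbers(69, 28, random.Random(2009))
print(picks)

# The worked example from step 3: 69 x 0.5910 = 40.78, rounded up to 41
print(math.ceil(69 * 0.5910))  # 41
```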
No. Random no. between 0 and 1 Random no. multiplied by 69 Sampling no.
1 0.7981 55.07 56
2 0.2723 18.79 19
3 0.4442 30.65 31
4 0.4304 29.69 30
5 0.7794 53.78 54
6 0.0370 2.56 3
7 0.0033 0.23 1
8 0.9662 66.67 67
9 0.5410 37.33 38
10 0.4047 27.92 28
11 0.0861 5.94 6
12 0.6128 42.28 43
13 0.7647 52.76 53
14 0.3026 20.88 21
15 0.8527 58.83 59
16 0.6026 41.58 42
17 0.7051 48.65 49
18 0.3393 23.41 24
19 0.9764 67.37 68
20 0.5613 38.73 39
21 0.1816 12.53 13
22 0.2891 19.95 20
23 0.0214 1.47 2
24 0.1645 11.35 12
25 0.4139 28.56 29
26 0.9334 64.40 65
27 0.0623 4.30 5
28 0.3149 21.72 22
Table 7 – Generating random samples
Table 7 shows the random numbers used and Table 8 shows the samples used which are highlighted
in green.
Number  Year Group  Gender  Height (m)  Weight (kg)    Number  Year Group  Gender  Height (m)  Weight (kg)
1       7           f       1.62        44.5           36      7           F       1.45        37
2       7           f       1.53        -              37      7           F       1.43        32
3       7           f       1.535       54             38      7           F       1.4         34
4       7           f       1.58        55             39      7           F       1.52        43
5       7           f       1.57        51             40      7           F       1.55        41
6       7           f       1.48        44.5           41      7           F       1.47        46
7       7           f       1.535       51             42      7           F       1.52        42
8       7           f       1.6         46.5           43      7           F       1.44        37
9       7           f       1.63        -              44      7           F       1.65        49
10      7           f       1.46        -              45      7           F       1.5         48
11      7           f       1.52        -              46      7           F       1.66        66
12      7           f       1.53        38             47      7           F       1.57        72
13      7           f       1.41        37.5           48      7           F       1.61        50
14      7           F       1.57        48             49      7           F       1.54        61
15      7           F       1.43        39             50      7           F       1.44        42
16      7           F       1.56        42             51      7           F       1.64        68
17      7           F       1.44        37             52      7           F       1.58        47
18      7           F       1.38        29             53      7           F       1.5         41
19      7           F       1.68        56             54      7           F       1.6         37
20      7           F       1.54        50             55      7           F       1.48        38
21      7           F       1.38        27             56      7           F       1.52        49
22      7           F       1.6         47             57      7           F       1.52        44
23      7           F       1.52        39             58      7           f       1.61        44.5
24      7           F       1.53        48             59      7           f       1.51        -
25      7           F       1.51        36             60      7           f       1.515       51
26      7           F       1.62        42             61      7           f       1.54        51
27      7           F       1.55        45             62      7           f       1.56        50.5
28      7           F       1.56        -              63      7           f       1.46        41.5
29      7           F       1.51        42             64      7           f       1.655       52
30      7           F       1.51        42             65      7           f       1.65        44.5
31      7           F       1.52        39             66      7           f       1.65        -
32      7           F       1.59        45             67      7           f       1.485       -
33      7           F       1.53        43             68      7           f       1.535       425
34      7           F       1.58        65             69      7           f       1.525       37.5
35      7           F       1.46        39
A dash marks a weight that was not filled in.
Table 8 – The sampled subset for Year 7 Females
A similar approach was taken for the rest of the stratified sample and the results are shown in
Appendix 2.
3.2 Checking for outliers
Some of the samples included non-responses for either height or weight and so these were
eliminated and replaced by other samples.
To check for outliers the interquartile range (IQR) must be found and multiplied by 1.5. Any value
that lies above the upper quartile or below the lower quartile by that amount or more is an
outlier, so the limits are LQ − 1.5 × IQR and UQ + 1.5 × IQR. Excel can be used to calculate the
interquartile range and the upper and lower outlier values. This has been done for all of the
sampled data and the results are shown in Tables 9-12.
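The outlier rule can be sketched in Python as follows. The function names are assumptions, and the example uses the Year 10 male weight quartiles together with the minimum, median and maximum from Table 12.

```python
def outlier_limits(lower_quartile: float,
                   upper_quartile: float) -> tuple[float, float]:
    """Return the (lower, upper) outlier limits: 1.5 x IQR beyond the quartiles."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr

def find_outliers(values, lower_quartile, upper_quartile):
    """List the values lying at or beyond the outlier limits."""
    low, high = outlier_limits(lower_quartile, upper_quartile)
    return [v for v in values if v <= low or v >= high]

# Year 10 male weights (Table 12): LQ = 54.00 kg, UQ = 66.00 kg
print(outlier_limits(54.0, 66.0))                     # (36.0, 84.0)
print(find_outliers([32.0, 62.0, 95.0], 54.0, 66.0))  # [32.0, 95.0]
```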
                 Year 7   Year 8   Year 9   Year 10   Year 11
Minimum          1.380    1.050    1.540    1.460     1.550
LQ               1.528    1.580    1.585    1.590     1.585
Median           1.525    1.610    1.600    1.625     1.650
UQ               1.548    1.650    1.650    1.670     1.690
Maximum          1.680    1.740    1.740    1.830     1.740
IQR              0.020    0.070    0.065    0.080     0.105
Lower outlier    1.498    1.475    1.488    1.470     1.428
Upper outlier    1.578    1.755    1.748    1.790     1.848
Table 9 – Female Heights (m)

                 Year 7   Year 8   Year 9   Year 10   Year 11
Minimum          1.380    1.430    1.310    1.370     1.640
LQ               1.450    1.515    1.610    1.650     1.738
Median           1.510    1.580    1.670    1.730     1.790
UQ               1.558    1.645    1.713    1.790     1.838
Maximum          1.760    1.720    1.880    1.880     5.700
IQR              0.108    0.130    0.103    0.140     0.100
Lower outlier    1.289    1.320    1.456    1.440     1.588
Upper outlier    1.719    1.840    1.866    2.000     1.988
Table 10 – Male Heights (m)

                 Year 7   Year 8   Year 9   Year 10   Year 11
Minimum          27.00    33.00    38.50    46.00     44.50
LQ               38.75    43.50    46.00    48.25     50.50
Median           42.75    49.50    50.00    55.50     56.00
UQ               48.25    60.50    59.75    58.50     60.25
Maximum          61.00    75.50    67.50    85.00     87.00
IQR              9.5      17.00    13.75    10.25     9.75
Lower outlier    24.5     18.00    23.375   32.875    38.875
Upper outlier    62.5     86.00    80.375   73.875    74.875
Table 11 – Female Weights (kg)
                 Year 7   Year 8   Year 9   Year 10   Year 11
Minimum          30.00    34.00    40.00    32.00     52.00
LQ               39.00    41.50    52.25    54.00     61.50
Median           44.00    48.50    60.00    62.00     66.00
UQ               50.50    57.00    67.75    66.00     73.38
Maximum          88.00    81.00    85.00    95.00     179.00
IQR              11.50    15.50    15.50    12.00     11.875
Lower outlier    21.75    18.25    29.00    36.00     43.688
Upper outlier    67.75    80.25    91.00    84.00     91.188
Table 12 – Male Weights (kg)
The highlighted values show the outliers and some of these are clearly mistakes. For example, the
height of 5.7 m for a Year 11 boy was probably a height entered in feet and inches rather than
metres. The weight of 179 kg for a Year 11 male (Table 12) was probably a data entry error for
79 kg. The highlighted outliers were removed as were any other data items that were significantly
outside the lower and upper outlier range.
This left a clean set of data that can be worked with.
4 Hypothesis 1: BMI spread
4.1 Justification for approach
The hypothesis is:
Hypothesis 1: The BMI will be less spread for Year 11 females than Year 11 males but the spread
will be similar for Year 7 males and females.
This requires an investigation of spread for Year 7 and Year 11 males and females. There are
several choices for presenting the data and there are a number of measures of spread and so an
approach must be selected and justified.
With respect to presenting the data, pie charts are inappropriate because they do not give a visual
impression of spread. Both stem and leaf diagrams and histograms could be used since they give a
visual representation but box plots are preferred because they use actual values rather than grouped
data and they give an immediate impression of spread because of the width of the box and the width
of the whiskers. Box and whisker diagrams also make it easier to compare the four sets of data on
the same graph and to immediately get a feel for the relative spread between the data sets.
Spread could be measured using the variance or standard deviation but since box plots are being
used to visualise the data it makes sense to use the inter-quartile range as the measure of spread
since the values have already been calculated to draw the box plot. This also has the benefit of
reducing the impact of extreme values.
4.2 Testing the hypothesis
This hypothesis will be tested using box plots and relative spread. Hand calculations will be done
first to show the technique and then Autograph will be used to make the calculations simpler.
To show the calculations, the BMI data for Year 11 females will be used. The data has been
arranged into ascending order such that the box plot information can be found. This is shown in
Table 13.
From Table 13
Minimum = 16.6
Maximum = 29.4
For n data values, $\frac{n+1}{2}$ gives the position of the median. There are 27 values and so the
median is at position 14 and so the median is:
Median = 21
The lower quartile is at position $\frac{n+1}{4}$, which in this case is position 7, and so the
lower quartile is:
Lower quartile = 18.7
The upper quartile is at position $\frac{3(n+1)}{4}$, which in this case is the 21st position, and so
Upper quartile = 22.7
Yr 11 F
BMI
1 16.6
2 17.5
3 17.6
4 18.1
5 18.2
6 18.4
7 18.7
8 19.6
9 19.7
10 19.9
11 20.6
12 20.7
13 20.8
14 21.0
15 21.0
16 21.0
17 21.2
18 22.2
19 22.4
20 22.5
21 22.7
22 22.8
23 24.7
24 25.4
25 26.4
26 26.9
27 29.4
Table 13 – Year 11 Female BMI
Using this information, the box plot can now be drawn. Autograph automatically calculates these
values from the raw data and so Autograph will be used to create box plots for Year 7 and 11 males
and females. These are shown in Figure 5.
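The hand calculation for the Year 11 female data can be checked with a short Python sketch that uses the same (n + 1) position rule. The helper `value_at` is an assumption for illustration; it interpolates between neighbouring values when a position is not a whole number, which is not needed for these 27 values.

```python
# Year 11 female BMI values from Table 13, already in ascending order
bmi = [16.6, 17.5, 17.6, 18.1, 18.2, 18.4, 18.7, 19.6, 19.7, 19.9,
       20.6, 20.7, 20.8, 21.0, 21.0, 21.0, 21.2, 22.2, 22.4, 22.5,
       22.7, 22.8, 24.7, 25.4, 26.4, 26.9, 29.4]

def value_at(sorted_values, position):
    """Value at a 1-based position, interpolating if the position is fractional."""
    lower = int(position)       # whole part of the position
    fraction = position - lower
    value = sorted_values[lower - 1]
    if fraction > 0:
        value += fraction * (sorted_values[lower] - value)
    return value

n = len(bmi)                                     # 27
median = value_at(bmi, (n + 1) / 2)              # position 14
lower_quartile = value_at(bmi, (n + 1) / 4)      # position 7
upper_quartile = value_at(bmi, 3 * (n + 1) / 4)  # position 21

print(median, lower_quartile, upper_quartile)    # 21.0 18.7 22.7
```

Software packages may use a different convention for locating the quartiles, so values produced by a package can differ slightly from those found by hand.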
Figure 5 – Box plots to test the spread of data
Figure 6 – Box plots using UK secondary data
4.3 Interpretation of results
The box plots of Figure 5 show that, for Year 7, the median BMI is 18.4 for females and 18.9 for
males and the interquartile ranges are 3.1 for females and 3.5 for males, but the females have a
greater range (the whiskers show minimum to maximum). Both Year 7 box plots are positively
skewed since their medians are closer to the lower quartile.
For Year 11, the median for the females is 21 and the interquartile range is 3.5. For males both are
lower, with the median being 20.6 and the interquartile range 2.9. This suggests that the hypothesis
might not be supported because the Year 11 females have a higher interquartile range than the
males. However, the interquartile range alone does not allow for the difference in medians, and so
spread is compared using the relative spread, which expresses the interquartile range as a
percentage of the median. For Year 11, the female data is negatively skewed (median nearer upper
quartile) and the male data is positively skewed.
Figure 6 shows the box plots using secondary data taken from the child BMI charts shown in
Figures 1 and 2. The median for Year 7 males is less than that for females and the interquartile
range is less. For Year 11, the median for females is larger than males and the interquartile range is
larger. This does not support the hypothesis.
The relative spread is given by
$$\text{Relative spread} = \frac{\text{interquartile range}}{\text{median}} \times 100\%$$
Calculating the relative spread for each set of data gives the results shown in Table 14:
                 HGS data   National data
Year 7 Female    18.5%      16.8%
Year 7 Male      19%        16.3%
Year 11 Female   18.8%      17.1%
Year 11 Male     15.3%      16%
Table 14 – Comparing relative spread using primary and secondary data
The relative spread for Year 7 males and females is very close and so supports the hypothesis.
The relative spread for females in Year 11 is more than that for males both for the HGS data and for
the national data and so Hypothesis 1 is not supported.
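As a check on the relative spread formula, a minimal Python sketch is given here. It uses the hand-calculated quartiles and median for Year 11 females from section 4.2; the function name `relative_spread` is made up for this sketch. The rounded hand values give about 19%, which is close to, but not the same as, the 18.8% in Table 14; the table came from Autograph, which presumably worked from unrounded data.

```python
def relative_spread(lower_quartile, upper_quartile, median):
    """Interquartile range expressed as a percentage of the median."""
    interquartile_range = upper_quartile - lower_quartile
    return interquartile_range / median * 100


# Hand-calculated values for Year 11 females (section 4.2).
year_11_female = relative_spread(18.7, 22.7, 21.0)
print(f"Relative spread = {year_11_female:.1f}%")
```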
5 Hypothesis 2: BMI normally distributed
The second hypothesis is:
Hypothesis 2: The BMI for HGS students will be normally distributed
The normal distribution is given by
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Where µ is the mean and σ is the standard deviation. Autograph has been used to draw the
normal distribution when µ = 0 and σ = 1 and the result is shown in Figure 7.
Figure 7 – The normal distribution
The normal distribution is a bell shaped curve and so in deciding how to present the data the
particular format used should be able to demonstrate the shape of the distribution. This rules out
using pie-charts but a stem and leaf diagram would provide a quick visual check of the shape.
However, a more thorough check is to work out the area under the distribution and compare it to
that for a normal distribution and to do this means that a histogram must be used.
Using the ‘Find area’ function on Autograph, the area under a curve can be found. This was done to find the area between ±1 standard deviation ( ±σ = ±1 ) and this is shown in Figure 8. The area was found to be 0.683. As the total area under the normal distribution is 1, this is 68.3%.
This was done for ±2σ = ±2 and this is shown in Figure 9. The area was found to be 0.955. As the total area under the normal distribution is 1, this is 95.5%.
Figure 8 – Area under normal distribution between ±σ
Figure 9 – Area under normal distribution between ±2σ
If the BMI for HGS students follows a normal distribution then the histogram should have a normal
looking shape and the areas under the histograms should be 68% between ±σ and 95.5% between
±2σ and so for each case the standard deviation needs to be calculated.
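The two reference areas can also be checked without Autograph. The sketch below is an alternative check, not the method used in this report: it relies on the standard result that the area of a normal distribution within k standard deviations of the mean is erf(k/√2). The function name `area_within` is made up for this sketch.

```python
import math


def area_within(k):
    """Area under a normal curve between -k and +k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


one_sigma = area_within(1)   # approximately 0.683
two_sigma = area_within(2)   # approximately 0.9545
print(one_sigma, two_sigma)
```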
This hypothesis will be tested by drawing the BMI histogram for HGS males and females for both
Year 7 and Year 11 and checking each to see if it is normally distributed.
As an example, hand calculations will be done for Year 11 Females and then Autograph will be
used to make the calculations easier for the others. The raw data for Year 11 Females is shown in
Table 15.
Sample Yr 11 F BMI Sample Yr 11 F BMI

1 21.02 15 22.55
2 19.73 16 19.89
3 18.73 17 18.37
4 21.23 18 21.01
5 22.77 19 26.37
6 18.17 20 22.21
7 16.63 21 18.05
8 17.51 22 29.41
9 20.80 23 20.69
10 22.68 24 19.60
11 20.64 25 22.43
12 25.43 26 26.89
13 17.63 27 24.65
14 21.01
Table 15 – Raw data for Year 11 Female BMI
From the raw data of Table 15, a frequency table was made and this is shown in Table 16.
BMI Frequency
16 ≤ x < 18 3
18 ≤ x < 20 7
20 ≤ x < 22 7
22 ≤ x < 24 5
24 ≤ x < 26 2
26 ≤ x < 28 2
28 ≤ x < 30 1
Table 16 – Frequency Table for Year 11 Female BMI
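The grouping of the raw data into classes can be sketched in Python as follows. The raw values are those in Table 15; the function name `frequency_table` and its parameters are made up for this sketch, which assumes equal class widths and classes of the form start ≤ x < start + width.

```python
# Raw Year 11 female BMI values from Table 15 (samples 1 to 27).
raw_bmi = [21.02, 19.73, 18.73, 21.23, 22.77, 18.17, 16.63, 17.51, 20.80,
           22.68, 20.64, 25.43, 17.63, 21.01, 22.55, 19.89, 18.37, 21.01,
           26.37, 22.21, 18.05, 29.41, 20.69, 19.60, 22.43, 26.89, 24.65]


def frequency_table(values, start, width, number_of_classes):
    """Count how many values fall in each class start <= x < start + width, and so on."""
    counts = [0] * number_of_classes
    for value in values:
        index = int((value - start) // width)   # which class the value belongs to
        if 0 <= index < number_of_classes:
            counts[index] += 1
    return counts


# Classes 16-18, 18-20, ..., 28-30 as in Table 16.
frequencies = frequency_table(raw_bmi, start=16, width=2, number_of_classes=7)
for i, frequency in enumerate(frequencies):
    low = 16 + 2 * i
    print(f"{low} <= x < {low + 2}: {frequency}")
```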
Using the frequency table, a histogram was hand drawn and this is shown in Figure 10.
Figure 10 – insert hand drawn histogram.
To check if the histogram is like a normal curve the mean and standard deviation are needed. Excel
was used to calculate these using:
$$\text{mean} = \bar{x} = \frac{\sum x}{n}$$
and
$$\text{Standard deviation} = \sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$
x      x²
21.02 441.83
19.73 389.14
18.73 350.83
21.23 450.62
22.77 518.62
18.17 330.01
16.63 276.59
17.51 306.47
20.80 432.50
22.68 514.38
20.64 425.80
25.43 646.81
17.63 310.85
21.01 441.32
22.55 508.35
19.89 395.67
18.37 337.29
21.01 441.32
26.37 695.39
22.21 493.12
18.05 325.93
29.41 864.82
20.69 428.08
19.60 384.02
22.43 503.21
26.89 723.20
24.65 607.86
Σx = 576.08    Σx² = 12544.04
Table 17 – Data for determining mean and standard deviation
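Using the data in Table 17, the mean and standard deviation are given by
$$\text{mean} = \bar{x} = \frac{\sum x}{n} = \frac{576.08}{27} = 21.33$$
$$\text{Standard deviation} = \sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{12544.04}{27} - \left(\frac{576.08}{27}\right)^2} = 3.06$$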
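The same calculation can be sketched in Python, starting from the totals in Table 17 rather than from the raw data. The variable names are made up for this sketch. The results agree with the values above to within rounding in the second decimal place.

```python
import math

# Totals taken from Table 17.
n = 27
sum_x = 576.08
sum_x_squared = 12544.04

mean = sum_x / n
variance = sum_x_squared / n - mean ** 2   # mean of the squares minus the square of the mean
standard_deviation = math.sqrt(variance)

print(f"mean = {mean:.3f}, standard deviation = {standard_deviation:.3f}")
```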
If the histogram follows a normal curve then 68% of the values should lie between a BMI of
21.33 ± 3.06 . This has been marked on Figure 10 and the shaded area worked out:
Shaded area = (20 − (21.33 − 3.06)) × 7 + 2 × 7 + 2 × 5 + ((21.33 + 3.06) − 24) × 2 = 36.9
Total area = 2 × 3 + 2 × 7 + 2 × 7 + 2 × 5 + 2 × 2 + 2 × 2 + 2 × 1 = 54
The percentage of values = shaded area/total area = 36.9/54 = 68.3%
If the histogram follows a normal curve then 95% of the values should lie between 21.33 ± 2 × 3.06.
This has been marked on Figure 11 and the shaded area worked out:
Shaded area = 2 × 3 + 2 × 7 + 2 × 7 + 2 × 5 + 2 × 2 + ((21.3 + 2 × 3.06) − 26) × 2 = 50.84
Total area = 2 × 3 + 2 × 7 + 2 × 7 + 2 × 5 + 2 × 2 + 2 × 2 + 2 × 1 = 54
The percentage of values = shaded area/total area = 50.84/54 = 94%
This suggests that the BMI distribution for Year 11 Females at HGS follows a normal curve.
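The shaded-area calculation can be sketched in Python using the classes and frequencies of Table 16. Because every class has the same width, each bar is treated as having an area of width × frequency, as in the hand calculation. The function name `area_between` is made up for this sketch. Note that the hand calculation for ±2σ rounds the mean to 21.3 and gets 50.84; with 21.33 the sketch gives about 50.9, and both are about 94% of the total.

```python
# (class start, class end, frequency) for Year 11 female BMI, from Table 16.
classes = [(16, 18, 3), (18, 20, 7), (20, 22, 7), (22, 24, 5),
           (24, 26, 2), (26, 28, 2), (28, 30, 1)]


def area_between(classes, low, high):
    """Area of the histogram bars lying between the BMI values low and high."""
    area = 0.0
    for start, end, frequency in classes:
        overlap = min(end, high) - max(start, low)   # width of this bar inside the range
        if overlap > 0:
            area += overlap * frequency
    return area


mean, sd = 21.33, 3.06
total_area = area_between(classes, 16, 30)
within_one_sd = area_between(classes, mean - sd, mean + sd)
within_two_sd = area_between(classes, mean - 2 * sd, mean + 2 * sd)

print(f"within 1 sd: {100 * within_one_sd / total_area:.1f}%")
print(f"within 2 sd: {100 * within_two_sd / total_area:.1f}%")
```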
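Figure 11 – Histograms for Year 7 Female BMI (72.2% of values are between ±1 standard deviation; 95.2% are between ±2 standard deviations)
Figure 12 – Histograms for Year 7 Male BMI (62.2% of values are between ±1 standard deviation; 96% are between ±2 standard deviations)
Figure 13 – Histograms for Year 11 Male BMI (74.1% of values are between ±1 standard deviation; 91.7% are between ±2 standard deviations)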
The process was repeated using Autograph and the results are shown in Figures 11-13.
Figure 11 shows the histogram for Year 7 female BMI and visually it has the bell shape associated
with a normal distribution. The area under the histogram between ±σ is 72.2% and between ±2σ it
is 95.2% which compare very well with the 68% and 95.5% expected for a normal distribution and
so this supports the hypothesis that the BMI is normally distributed.
Figure 12 shows the histogram for Year 7 male BMI and visually it is not quite the bell shape associated with a normal distribution since it has positive skew. The area under the histogram between ±σ is 62.2%, which is notably lower than the expected 68% and reflects the positive skew. Between ±2σ it is 96%, which compares well with the expected 95.5% for a normal distribution. These results do not fully support the hypothesis and suggest that a more rigorous test is required.
Figure 13 shows the histogram for Year 11 male BMI and visually it does not have the bell shape associated with a normal distribution. The area under the histogram between ±σ is 74.1% and between ±2σ it is 91.7%, which do not compare very well with the 68% and 95.5% expected for a normal distribution, and so this does not support the hypothesis that the BMI is normally distributed.
Overall, the results suggest that the Year 7 female BMI and the Year 11 female BMI are normally distributed, that the Year 7 male BMI could be, but that the Year 11 male BMI is not. The tests carried out are basic and so a more rigorous test is required. The χ² statistic will therefore be calculated to give a more accurate test.
5.3.1 χ² test statistic
A more accurate test to see if the distribution is normal is the χ² test statistic that is described in Appendix 3. The χ² statistic is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where O is the observed value and E is the expected value. The chi-squared statistic is a measure
of the difference between the observed values and the expected values and can be used to see how
close the distribution is to an expected distribution.
In this case, the test is to see whether or not it is a normal distribution and so the null and alternative hypotheses are:
H0: BMI follows a normal distribution
H1: BMI does not follow a normal distribution
As the normal distribution is being tested, then the mean and standard deviation are required to
determine the expected values. From Appendix 3, the number of degrees of freedom is k − p − 1
where k is the number of classes and p is the number of parameters estimated from the sample
data used to generate the hypothesised distribution. In this case, p = 2 because the mean and
standard deviation are required to determine a normal distribution.
The normal distribution is given by:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Autograph was used to find the area under the curve. For Year 11 Females, µ = 21.33 and
σ = 3.06 and so the expected frequency in a particular class can be found by multiplying the
number of samples by the probability of the samples being in that class. The probability is the area
under the normal distribution curve which can be found using Autograph. For example, the
probability of BMI samples being between 20-22 for Year 11 Females is shown in Figure 14.
Figure 14 – Probability that BMI will be between 20-22 for Year 11 Females
The Expected frequency is the number of samples multiplied by the probability, 27 × 0.255 = 6.88. This was repeated for each of the classes and the results are shown in Table 18 along with the Observed frequencies. The difference between the Observed and the Expected frequency in each class is then used to calculate the Chi-squared statistic.
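The expected frequency for a class can also be estimated without Autograph. The sketch below is an alternative to the ‘Find area’ tool, not the method used in this report: it finds the area under the normal curve between two BMI values from the cumulative distribution, written with the error function. The function names `normal_cdf` and `expected_frequency` are made up for this sketch.

```python
import math


def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def expected_frequency(low, high, mu, sigma, number_of_samples):
    """Number of samples expected in the class low <= x < high."""
    probability = normal_cdf(high, mu, sigma) - normal_cdf(low, mu, sigma)
    return number_of_samples * probability


# Year 11 females: mean 21.33, standard deviation 3.06, 27 samples, class 20-22.
expected_20_to_22 = expected_frequency(20, 22, 21.33, 3.06, 27)
print(f"Expected frequency for 20-22: {expected_20_to_22:.2f}")
```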
Body Mass Index
            14-16  16-18  18-20  20-22  22-24  24-26  26-28  28-30
Expected    0.88   2.63   5.23   6.88   5.99   3.46   1.32   0.33
Observed    0      3      7      7      5      2      2      1
(O-E)²/E    0.88   0.05   0.60   0.00   0.16   0.61   0.35   1.34
Σ(O-E)²/E = 3.99    Critical value = 11.07
Table 18 – χ² test for Year 11 Female BMI
For Year 11 Females, χ² = 3.99. The smaller this figure is, the closer the observed frequencies are to the expected frequencies. Chi-squared tables are used to work out the significance and to do this, the number of degrees of freedom is needed. In this case there are eight classes and so k = 8, and the number of degrees of freedom is k − p − 1 = 8 − 2 − 1 = 5. The Chi-squared critical values are shown in Table 19 for the 0.05 significance level.
Degrees of Freedom Critical value

1 3.84
2 5.99
3 7.82
4 9.49
5 11.07
6 12.59
7 14.07
8 15.51
9 16.92
10 18.31
Table 19 – Chi-squared critical values at the 0.05 significance level
If the χ² value is less than the critical value then the null hypothesis is not rejected. In this case, when the number of degrees of freedom is 5, the critical value is 11.07. As 3.99 < 11.07, the null hypothesis is not rejected.
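The whole test for Year 11 females can be sketched in Python using the Expected and Observed rows of Table 18 and the critical values of Table 19. The function name `chi_squared` and the variable names are made up for this sketch. Because the Expected values in Table 18 are rounded to two decimal places, the sketch gives a statistic of about 4.02 rather than exactly 3.99, but the conclusion is the same.

```python
# Expected and Observed frequencies from Table 18 (Year 11 female BMI).
expected = [0.88, 2.63, 5.23, 6.88, 5.99, 3.46, 1.32, 0.33]
observed = [0, 3, 7, 7, 5, 2, 2, 1]

# Critical values at the 0.05 significance level, from Table 19.
critical_values = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07,
                   6: 12.59, 7: 14.07, 8: 15.51, 9: 16.92, 10: 18.31}


def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


statistic = chi_squared(observed, expected)
estimated_parameters = 2                                   # mean and standard deviation
degrees_of_freedom = len(observed) - estimated_parameters - 1
reject_null = statistic > critical_values[degrees_of_freedom]

print(f"chi-squared = {statistic:.2f}, degrees of freedom = {degrees_of_freedom}")
print("reject H0" if reject_null else "do not reject H0")
```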
This was repeated for Year 11 Males and Year 7 Females and Males with the results shown in
Tables 20-22.
Body Mass Index
            16-18  18-20  20-22  22-24  24-26  26-28  28-30
            Bin 2  Bin 3  Bin 4  Bin 5  Bin 6  Bin 7  Bin 8
Expected    3.44   6.68   8.41   6.86   3.62   1.24   0.27
Observed    3      10     11     4      1      1      2
(O-E)²/E    0.06   1.65   0.80   1.19   1.90   0.05   11.08
Table 20 – χ² test for Year 11 Male BMI
Body Mass Index
            12-14  14-16  16-18  18-20  20-22  22-24  24-26  26-28
Expected    0.34   1.97   5.66   8.04   5.66   1.97   0.34   0.02
Observed    0      2      6      9      5      1      1      0
(O-E)²/E    0.34   0.00   0.02   0.11   0.08   0.48   1.31   0.02
Table 21 – χ² test for Year 7 Female BMI
Body Mass Index
            12-14  14-16  16-18  18-20  20-22  22-24  24-26
Expected    0.15   1.42   5.77   10.32  8.09   2.78   0.41
Observed    0      1      7      10     7      4      0
(O-E)²/E    0.15   0.12   0.26   0.01   0.15   0.53   0.41
Table 22 – χ² test for Year 7 Male BMI
Except for Year 11 Male, the χ² value was less than the critical value and so the null hypothesis cannot be rejected, which suggests that the BMI follows the normal distribution.
For Year 11 Male, χ² = 16.72, which is more than the critical value of 9.49, and so the null hypothesis is rejected and the BMI cannot be said to follow a normal distribution. This could be a result of a rogue sample and so the distribution of all of the Year 11 Male data was checked. The histogram is shown in Figure 15.
Figure 15 – Histogram for Year 11 Male BMI using whole population
This Year 11 Male BMI histogram again has a large upper tail and so visually it does not look as if
it follows a normal distribution but this can be checked more accurately using the Chi-squared test.
The results are shown in Table 23.
            16-18  18-20  20-22  22-24  24-26  26-28  28-30
            Bin 2  Bin 3  Bin 4  Bin 5  Bin 6  Bin 7  Bin 8
Expected    7      14.15  19.25  17.65  10.9   4.54   1.27
Observed    6      22     23     14     3      5      5
(O-E)²/E    0.14   4.35   0.73   0.75   5.73   0.05   10.96
Σ(O-E)²/E = 22.71    Critical value = 9.49
Table 23 – χ² test for Year 11 Male BMI whole population
For Year 11 Male whole population, χ² = 22.71, which is more than the critical value of 9.49, and so the null hypothesis is rejected and the BMI cannot be said to follow a normal distribution.
Figure 16 shows the difference between the expected and the observed and it is clear that there are
large differences and that the distribution is not normal.
Figure 16 – The Expected and Observed histograms for Year 11 Male BMI
The BMI for Year 7 Males and Females and for Year 11 Females follows a normal distribution.
However, the BMI for Year 11 males does not and so Hypothesis 2 is rejected.
6 Hypothesis 3: Correlation
The third hypothesis is:
Hypothesis 3: The weights and heights of males will be more strongly correlated than that of
females.
This requires establishing a connection between the height and weight data for both males and
females. This can be done visually by using a scatter diagram since, if there is a connection, the
weight should increase as the height increases. Once the scatter diagram is drawn a line of best fit
can be constructed and if the points are close to this it suggests strong positive correlation and so the
two scatter diagrams will give an immediate visual representation of whether there is a stronger
correlation between male weights and heights or female weights and heights.
The correlation coefficient gives a numerical indication of the strength of the correlation. A correlation coefficient of zero would mean no correlation and a correlation coefficient of one would mean perfect positive correlation, so if the hypothesis is correct the correlation coefficient for male weights and heights should be closer to one than that for female weights and heights. The method for calculating the correlation coefficient depends upon the shape of the distribution. If it is a normal distribution then Pearson’s method is used; if it is a non-normal distribution then Spearman’s method is used. A test therefore has to be carried out to check the shape of the distribution before deciding on the method, and this is done in section 6.3.
Figure 17 shows a hand drawn scatter graph for the weights and heights of Years 7-11 males using
half of the sample set (every other point was used). Visually, there looks to be a positive correlation
since the weight appears to increase with height. Assuming a linear relationship, a line of best fit
can be estimated and drawn through the (mean height, mean weight) point and the gradient and
intercept calculated.
Mean height for Males Years 7-11 = 1.65 m

Mean weight for Males Years 7-11 = 56.4 kg
Using this point, the line of best fit was drawn and this is shown in Figure 17.
The line has the equation:
w = mh + c
where w is the weight (kg), h is the height (m), m is the gradient and c is the intercept. The gradient can be found by dividing the change in weight by the corresponding change in height:
$$m = \frac{\text{change in weight}}{\text{change in height}} = \frac{72.5 - 40}{1.9 - 1.4} = 65$$
The intercept can be found by selecting a particular (weight, height) point on the graph. Using (40, 1.4):
$$40 = 65 \times 1.4 + c$$
$$c = 40 - 65 \times 1.4 = -51$$
The equation of the line that relates weight to height in Years 7-11 males is
$$w_{\text{Male}} = 65\,h_{\text{Male}} - 51$$
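The gradient and intercept calculation can be sketched in Python. The two points used are the ones read from the hand-drawn line, (1.4 m, 40 kg) and (1.9 m, 72.5 kg); the function name `line_through` is made up for this sketch. As a rough check, putting the mean height of 1.65 m into the resulting equation gives 56.25 kg, which is close to the mean weight of 56.4 kg that the line was drawn through.

```python
def line_through(point_1, point_2):
    """Gradient and intercept of the straight line through two (height, weight) points."""
    (h1, w1), (h2, w2) = point_1, point_2
    gradient = (w2 - w1) / (h2 - h1)    # change in weight / change in height
    intercept = w1 - gradient * h1      # rearranged from w = m*h + c
    return gradient, intercept


# Points read from the hand-drawn line of best fit for Years 7-11 males.
m, c = line_through((1.4, 40.0), (1.9, 72.5))
weight_at_mean_height = m * 1.65 + c

print(f"w = {m:.0f}h + ({c:.0f})")
print(f"Predicted weight at 1.65 m: {weight_at_mean_height:.2f} kg")
```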
Autograph can be used to do this automatically by using the ‘y on x regression line’. This has been done for the weight versus height scatter graph for the Years 7-11 females and the result is shown in Figure 18. For the females, the weight is related to the height by:
$$w_{\text{Female}} = 90\,h_{\text{Female}} - 92$$
The gradient for the females is larger than that for the males, which means that the females' weight increases more rapidly with increases in height.
Visually, the data points appear more scattered around the line of best fit for females than they do for males and so this suggests a weaker correlation for females. This supports the hypothesis but will be checked more rigorously by working out the correlation coefficient.
Figure 17 Insert Hand drawn scatter graph here:
Figure 18 – Scatter graph of weight versus height for HGS females Years 7-11
6.3 Should Pearson or Spearman be used?
The Pearson product-moment correlation coefficient (Appendix 4) is a measure of the correlation between two variables. It is calculated from
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}}$$
It reflects the degree of linear relationship between two variables and ranges from +1 to -1. A correlation of +1 means that there is a perfect positive relationship between the two variables. A correlation of -1 means that there is a perfect negative relationship. A correlation of 0 means that there is no linear relationship between the two variables. Pearson’s coefficient assumes that the two variables are normally distributed; for non-normal distributions it can give misleading results. Pearson’s coefficient is also sensitive to outliers.
In the case of non-normal distributions, Spearman’s rank correlation can be used. Spearman’s coefficient is given by
$$\rho = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)}$$
Basically, it differs from Pearson’s correlation only in that the values are converted to ranks before
computing the coefficient. This has the advantage of reducing the effect of outliers. The
disadvantage of Spearman’s is that it is time consuming to rank the data when there is a lot of data.
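To show how the ranking works, a small Python sketch of Spearman’s formula is given here. It is only an illustration: the five height and weight pairs are rows 1, 2, 3, 5 and 6 of the Year 7 sample in Appendix 2, picked because they contain no tied values, and the result says nothing about the full data set. The function names `ranks` and `spearman` are made up for this sketch, and the simple ranking used assumes there are no ties.

```python
def ranks(values):
    """Rank each value from 1 (smallest) upwards; assumes there are no tied values."""
    ordered = sorted(values)
    return [ordered.index(value) + 1 for value in values]


def spearman(xs, ys):
    """Spearman's rank correlation using 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Rows 1, 2, 3, 5 and 6 of the Year 7 sample in Appendix 2 (illustration only).
heights = [1.62, 1.535, 1.57, 1.53, 1.41]
weights = [44.5, 54, 51, 38, 37.5]

rho = spearman(heights, weights)
print(f"rho = {rho:.2f}")
```

The height ranks are 5, 3, 4, 2, 1 and the weight ranks are 3, 5, 4, 2, 1, so the sum of d² is 8 and ρ = 1 − 48/120 = 0.6 for these five pairs.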
To make a decision on whether to use Pearson or Spearman the data needs to be checked to see if it
follows a normal distribution. To do this, Chi-squared tests will be carried out on the following
hypotheses:
Male heights:
H0: Male heights follow a normal distribution
H1: Male heights do not follow a normal distribution
Male weights:
H0: Male weights follow a normal distribution
H1: Male weights do not follow a normal distribution
Female heights:
H0: Female heights follow a normal distribution
H1: Female heights do not follow a normal distribution
Female weights:
H0: Female weights follow a normal distribution
H1: Female weights do not follow a normal distribution
Autograph was used to get a histogram of the Year 7-11 heights and weights and the
mean ± 3 standard deviation tool was used to put markers on the histogram that give a feel for the
distribution. Figure 19 shows the results and the distributions look like normal distributions.
Figure 19 – Male heights and weights for Yrs 7-11
Male Heights (m)
            1.3-1.4  1.4-1.5  1.5-1.6  1.6-1.7  1.7-1.8  1.8-1.9  1.9-2.0
Expected    3.04     13.44    33.51    47.20    37.56    16.89    4.28
Observed    1        20       27       46       40       17       4
(O-E)²/E    1.37     3.20     1.26     0.03     0.16     0.00     0.02
Σ(O-E)²/E = 6.04
Table 24 – Chi-squared test for male heights
Table 24 shows the results of the Chi-squared test for male heights. The number of degrees of freedom is ( k − p − 1 ), where k = 7 is the number of classes and p is the number of parameters estimated from the sample data used to generate the hypothesised distribution, which in this case is 2 ( µ , σ ). From Table 19, the critical value for (7 − 2 − 1) = 4 degrees of freedom is 9.49. As the Chi-squared test gave 6.04 < 9.49, the null hypothesis is not rejected and it can be assumed the male heights follow a normal distribution.
Male Weights (kg)
            30-40  40-50  50-60  60-70  70-80  80-90  90-100
Expected    11.83  30.49  45.28  38.79  19.17  5.46   0.89
Observed    15     34     40     43     15     7      1
(O-E)²/E    0.85   0.40   0.62   0.46   0.91   0.43   0.01
Σ(O-E)²/E = 3.68
Table 25 – Chi-squared test for male weights
Table 25 shows the results of the Chi-squared test for male weights. From Table 19, the critical value for (7 − 2 − 1) = 4 degrees of freedom is 9.49. As the Chi-squared test gave 3.68 < 9.49, the null hypothesis is not rejected and it can be assumed the male weights follow a normal distribution.
Figure 20 shows the distributions of the female heights and weights for Years 7-11. The female
heights show a negative skew (mean<mode) and the female weights show a positive skew
(mode<mean). Visually, they both look to have normal type distributions but this will be checked
using the Chi-squared test.
Figure 20 – Female heights and weights for Yrs 7-11
Female Heights (m)
            1.3-1.4  1.4-1.5  1.5-1.6  1.6-1.7  1.7-1.8  1.8-1.9
Expected    0.44     9.40     47.28    57.58    17.08    1.20
Observed    1        7        45       68       10       2
(O-E)²/E    0.73     0.61     0.11     1.89     2.93     0.53
Σ(O-E)²/E = 6.81
Table 26 – Chi-squared test for female heights
Table 26 shows the results of the Chi-squared test for female heights. From Table 19, the critical value for (6 − 2 − 1) = 3 degrees of freedom is 7.82. As the Chi-squared test gave 6.81 < 7.82, the null hypothesis is not rejected and it can be assumed the female heights follow a normal distribution.
Female Weights (kg)
            20-30  30-40  40-50  50-60  60-70  70-80  80-90
Expected    1.97   13.49  38.93  47.69  24.81  5.46   0.51
Observed    1      8      53     43     21     5      1
(O-E)²/E    0.48   2.23   5.09   0.46   0.59   0.00   0.00
Σ(O-E)²/E = 8.37
Table 27 – Chi-squared test for female weights
Table 27 shows the results of the Chi-squared test for female weights. From Table 19, the critical value for (7 − 2 − 1) = 4 degrees of freedom is 9.49. As the Chi-squared test gave 8.37 < 9.49, the null hypothesis is not rejected and it can be assumed the female weights follow a normal distribution.
As each of the Chi-squared tests show that the weights and heights of Years 7 to 11 males and
females can be assumed to follow a normal distribution then the Pearson coefficient is the most
appropriate.
6.4 Calculating the Pearson coefficient
The Pearson product-moment correlation coefficient is calculated from
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}}$$
By letting the height = x and the weight = y , Excel was used to calculate the various summations.
For the males, these are:
Σx = 254.7
Σy = 8684
Σx² = 423.7
Σy² = 513718
Σxy = 14559.1
n = 154
The Pearson coefficient for the males is given by:
$$r_{\text{Males}} = \frac{154 \times 14559.1 - 254.7 \times 8684}{\sqrt{\left(154 \times 423.7 - 254.7^2\right)\left(154 \times 513718 - 8684^2\right)}} = \frac{30286.4}{\sqrt{377.7 \times 3700716}} = \frac{30286.4}{37386.6} = 0.81$$
For the females,
Σx = 207.17
Σy = 6749.74
Σx² = 333.4173
Σy² = 365923.38
Σxy = 10903.052
n = 129
The Pearson coefficient for the females is given by:
$$r_{\text{Females}} = \frac{129 \times 10903 - 207.2 \times 6749.7}{\sqrt{\left(129 \times 333.42 - 207.2^2\right)\left(129 \times 365923.4 - 6749.7^2\right)}} = \frac{8150}{\sqrt{91.42 \times 1645126.5}} = \frac{8150}{12263.7} = 0.665$$
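Both coefficients can be checked with a short Python sketch that applies the Pearson formula to the summations listed above. The function name `pearson_from_sums` is made up for this sketch; it is not the Excel calculation itself. Because the listed totals are rounded, the results agree with 0.81 and 0.665 only to about two or three decimal places.

```python
import math


def pearson_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson product-moment correlation coefficient calculated from the six totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Totals for the males and females as listed in section 6.4 (x = height, y = weight).
r_males = pearson_from_sums(154, 254.7, 8684, 423.7, 513718, 14559.1)
r_females = pearson_from_sums(129, 207.17, 6749.74, 333.4173, 365923.38, 10903.052)

print(f"r (males) = {r_males:.3f}")
print(f"r (females) = {r_females:.3f}")
```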
The scatter diagrams for both males (Figure 17) and females (Figure 18) show positive correlation
and so suggest that the weight increases as the height increases. Lines of best fit were calculated
and are given by:
$$w_{\text{Male}} = 65\,h_{\text{Male}} - 51$$
$$w_{\text{Female}} = 90\,h_{\text{Female}} - 92$$
The gradient for the female weight-height relationship is greater than that of the male weight-height relationship, which means that the weight increases at a more rapid rate with height for females than it does for males.
Chi-squared tests were carried out to check whether or not the distributions of weights and heights
were normally distributed. The results of the Chi-squared tests supported in each case that the
distributions were normal and so the Pearson correlation coefficient was used rather than
Spearman’s since Spearman’s is used for non-normal distributions.
The Pearson coefficient for males is rMales = 0.81 and the Pearson coefficient for females is
rFemales = 0.665 . Both are greater than 0.5 and so there is strong positive correlation between the
weights and heights. Since rMales > rFemales it can be concluded that the weights and heights of males
are more strongly correlated than the weights and heights of females and this supports Hypothesis
3.
7 Is obesity a problem at HGS?
Using the charts of Figures 1 and 2, the BMI for being obese and overweight can be found. For
Year 7, the average age is assumed to be 11.5 years, for Year 8 it will be 12.5 and so on. From the
charts, Table 28 was produced.
Overweight Obese
Year 7 Female 21.6 24.3
Year 7 Male 23 25.3
Year 11 Female 21.6 23
Year 11 Male 23.5 26.4
Table 28 – Overweight and obesity BMI thresholds
Using the values in Table 28, Autograph can be used to find the area under the histogram and so
work out the percentage of HGS students that are overweight and obese. Figure 21 shows an
example of this for Year 11 Males. By working out the area under the histogram for BMI greater
than 26.4 then the percentage of obese males can be found.
Figure 21 – Using Autograph to work out the percentage of obese Males for Year 11 (8.8% obese)
Overweight Obese
Year 7 Female 9% 3.5%
Year 7 Male 11.7% 6.9%
Year 11 Female 11.5% 5.6%
Year 11 Male 6.9% 8.8%
Table 29 – Percentage of HGS students who are overweight and obese
This was done for Year 7 Males and Females and Year 11 Females and the results are shown in
Table 29. These percentages are a lot less than the national figures shown in Table 1 and so it can
be concluded that overweight/obesity is not a problem at HGS.
8 Conclusions
This report investigated the relationship between weight, height, body mass index and obesity in
children. It did this by making use of primary data collected from Year 7 to Year 11 students at
Heckmondwike Grammar School.
Section 2 looked at collecting data, types of data, questionnaire design and sampling options. A stratified sample using random sampling was used in this report. This gave each of the Year 7 to Year 11 students an equal chance of being picked.
Section 3 looked at how many samples to take and how to get rid of outliers. Taking too few samples can give misleading information, while taking too many means more calculations, and so 40% of the population was used. Excel was used to generate random numbers to take the samples. Excel was also used to find outliers by working out the interquartile range, multiplying it by 1.5 and checking whether any samples were more than this distance below the lower quartile or above the upper quartile. In some cases the data was missing (e.g. missing weight data) and in some cases the data was not correct (e.g. decimal point missing or wrong data). The data was cleaned up in section 3.
Section 4 looked at Hypothesis 1.
This hypothesis was proposed because the girls have more media pressure to stay thin than boys because of all of the models and actresses (i.e. they are all chosen to look pretty). The hypothesis was tested by using box plots to look at the spread and working out the relative spread. The relative spread for Year 7 Female and Year 7 Male was 18.5% and 19% respectively and so they were similar. However, the relative spread for Year 11 Female was 18.8%, which was noticeably larger than the relative spread of 15.3% for Year 11 Male. The hypothesis proposed that it would be smaller and so the hypothesis is not correct.
Section 5 looked at Hypothesis 2.
The BMI depends upon weight and height and it is well known that these tend to have a normal distribution. As such, it is expected that the BMI will also have a normal distribution. The hypothesis was tested by drawing the histograms for Year 7 Females and Males and Year 11 Females and Males and then working out the areas between ±σ and ±2σ. For a normal distribution these should be 68% and 95%. All were close to this except Year 11 Males. This is just a basic test and so a more accurate test was carried out using the χ² test. This showed that Year 7 Females and Males and Year 11 Females followed a normal distribution but Year 11 Males did not. Since Year 11 Males do not have a normal distribution the hypothesis is not correct.
Section 6 looked at Hypothesis 3.
Because of the pressure on females to have a smaller clothes size, it was expected that there would be less correlation between weight and height for females than for males. Scatter diagrams were drawn to test this and both showed positive correlation, which suggested that the weight for both males and females increases as the height increases. A line of best fit was drawn for each case and the points appeared to be closer to the line of best fit for the males than for the females, suggesting a stronger correlation. To test the correlation either the Pearson or the Spearman coefficient can be used. To decide which one, the weights and heights of both males and females were checked to see if they follow a normal distribution, since Pearson requires this. As the distributions were normal, the Pearson coefficient was calculated, giving rMales = 0.81 for males and rFemales = 0.665 for females. As the coefficient for males was larger, the correlation for males is stronger. This means the hypothesis is supported.
As for the question, “Is obesity a problem at HGS?”, the percentage of overweight and obese Year 7
and Year 11 males and females has been worked out and it is a lot less than the national figures and
so obesity is not a problem at HGS.
8.1 How the project could be improved
The first improvement is the design of the questionnaire. As BMI is being used in this project, a
disadvantage of the questionnaire is that it gives the student the option of not completing their
weight. Without this, the BMI cannot be calculated. The other problem with the questionnaire is it
does not indicate the accuracy required for the height and weight and this can lead to inaccuracies in
the BMI calculation. For example, if someone has a height of 1.64 m and a weight of 57.6 kg then their BMI would be
$$\text{BMI} = \frac{57.6}{1.64^2} = 21.42$$
But if on the questionnaire they rounded down their height to 1.6 m and their weight up to 58 kg, the BMI would be calculated as 22.66, which is an overestimate of the BMI by 5.8%. An alternative questionnaire that might lead to better data is shown below.
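The effect of rounding on the BMI can be checked with a short Python sketch. The function name `bmi` and the variable names are made up for this sketch; the heights and weights are the ones used in the example above, and the overestimate is measured relative to the accurately measured BMI.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


accurate = bmi(57.6, 1.64)   # measured to the nearest cm and 100 g
rounded = bmi(58, 1.6)       # height rounded down, weight rounded up
overestimate_percent = (rounded - accurate) / accurate * 100

print(f"accurate BMI = {accurate:.2f}")
print(f"rounded BMI = {rounded:.2f}")
print(f"overestimate = {overestimate_percent:.1f}%")
```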
GCSE Statistics Data Collection
Figure ? – Questionnaire for collecting the data

This data collection sheet is designed to be anonymous. Please fill in the details as
accurately as possible. Please remove your shoes before being measured and
weighed. Fold the completed slip and pass to your teacher. Thank you.
Year 10
Please circle: Male Female
Height: m (to the nearest cm eg 1.48 m)
Weight: kg (to the nearest 100 g eg 54.3 kg)
This questionnaire stresses that it is anonymous, so students know that they can be honest about their weight. It also gives examples, which will guide students to give more accurate height and weight information.
The second improvement is the way the second hypothesis was worded. Currently it is:
Hypothesis 2: The BMI for HGS students will be normally distributed
This was ambiguous and could be interpreted in a number of ways. Does it mean all students?
Does it mean students in each year? What was meant was that the distribution in each year for each
group of males and females would follow a normal distribution and that is why tests were carried
out for Year 7 males and females and Year 11 males and females because they represented the
extreme of the age ranges. Because of the wording, an alternative test would have been to just
group the whole of the data for males and females of all of the years together and see if it was a
normal distribution and this would have been valid because of the way the hypothesis is worded. A
better wording would be:
Hypothesis 2: The BMI for both male and female students in each year group will be
normally distributed
8.2 Limitations of the project
Each of the hypotheses was tested using a 40% sample of the relevant population of the 730 students at HGS. Although this was considered to be enough to be representative of the whole population of 730 students, it is possible that a different 40% sample could lead to different results.
The samples were taken from Heckmondwike Grammar School and may not be representative of the national population. If the samples were taken from a different school the results could be completely different because of the make up of that school. For example, Heckmondwike Grammar School takes students from the top 5% attainment levels on entry and only has 3% of students on school meals. At Fartown High School, attainment levels on entry are below the national average, 50% of students are on school meals and 35% have special educational needs. If the samples were taken from Fartown High School, the results could be completely different.
The University of Washington (source: ScienceDaily.com 29 August 2007) has reported that obesity rates vary strongly with geography. Researchers found that in the Seattle Metropolitan area, obesity levels reached 30% in the most deprived areas but were only around 5% in the most affluent. This suggests that taking samples from schools that take their students from deprived areas could lead to completely different results from schools that take their students from affluent areas. This could also explain why, in section 7, the figures show that obesity is not a problem at HGS.
9 Bibliography
1. Pledger, K., Cole, G., Jolly, P., Newman, G., Petran, J. & Bright, S. (2006), Edexcel GCSE
Mathematics, Heinemann
2. Job, B. & Morley, D. (2003), Key Maths GCSE - Statistics AQA Version, Nelson Thornes
3. Zip codes and property values predict obesity rates, ScienceDaily, 29 August 2007,
http://www.sciencedaily.com/releases/2007/08/070829090143.htm
4. Health Survey for England 2004: Updating of trend tables to include childhood obesity data,
http://www.ic.nhs.uk/pubs/hsechildobesityupdate
Appendix 1 – Huddersfield Examiner Article
Appendix 2 – The sampled data
Year 7:
Number Random Sample Year Group Gender Height (m) Weight (kg) BMI
1 1 7 f 1.62 44.5 17.0
2 3 7 f 1.535 54 22.9
3 5 7 f 1.57 51 20.7
4 6 7 f 1.48 44.5 20.3
5 12 7 f 1.53 38 16.2
6 13 7 f 1.41 37.5 18.9
7 19 7 F 1.68 56 19.8
8 20 7 F 1.54 50 21.1
9 21 7 F 1.38 27 14.2
10 22 7 F 1.6 47 18.4
11 24 7 F 1.53 48 20.5
12 29 7 F 1.51 42 18.4
13 30 7 F 1.51 42 18.4
14 31 7 F 1.52 39 16.9
15 38 7 F 1.4 34 17.3
16 39 7 F 1.52 43 18.6
17 42 7 F 1.52 42 18.2
18 43 7 F 1.44 37 17.8
19 49 7 F 1.54 61 25.7
20 53 7 F 1.5 41 18.2
21 54 7 F 1.6 37 14.5
22 56 7 F 1.52 49 21.2
23 65 7 f 1.65 44.5 16.3
24 68 7 f 1.535 42.5 18.0
1 1 7 m 1.63 52 19.6
2 4 7 m 1.545 54.5 22.8
3 6 7 m 1.38 30 15.8
4 7 7 m 1.54 50.5 21.3
5 10 7 m 1.45 39 18.5
6 12 7 m 1.54 53.5 22.6
7 13 7 m 1.56 42.5 17.5
8 15 7 m 1.56 53.5 22.0
9 16 7 m 1.47 50.5 23.4
10 17 7 M 1.41 45 22.6
11 19 7 M 1.5 49 21.8
12 21 7 M 1.49 42 18.9
13 24 7 M 1.52 43 18.6
14 25 7 M 1.48 39 17.8
15 29 7 M 1.41 40 20.1
16 31 7 M 1.5 48 21.3
17 32 7 M 1.61 49 18.9
18 36 7 M 1.55 48 20.0
19 42 7 M 1.52 42 18.2
20 44 7 M 1.48 36 16.4
21 45 7 M 1.58 48 19.2
22 47 7 M 1.61 54 20.8
23 51 7 M 1.45 37 17.6
25 53 7 M 1.43 38 18.6
26 55 7 M 1.41 32 16.1
27 56 7 M 1.44 35 16.9
28 57 7 M 1.7 61 21.1
29 58 7 M 1.53 43 18.4
30 62 7 M 1.43 34 16.6
Year 7: Female = 24 mean h = 1.526667 SD = 0.1
Male = 30 mean h = 1.507759 SD = 0.1
Total = 54
Height Weight BMI
Female: Min= 1.380 27.000 14.18
LQ = 1.528 38.750 17.25
Median= 1.525 42.750 18.39
UQ = 1.548 48.250 20.36
Maximum= 1.680 61.000 25.72
IQR = 0.020 9.500 3.11
Lout = 1.498 24.500 12.58
Uout = 1.578 62.500 25.03
Height Weight BMI

Male: Min= 1.380 30.000 15.8
LQ = 1.450 39.000 17.8
Median= 1.500 43.000 18.9
UQ = 1.550 50.500 21.3
Maximum= 1.700 61.000 23.4
IQR = 0.100 11.500 3.5
Lout = 1.300 21.750 12.6
Uout = 1.700 67.750 26.5
Outliers:
24 52 7 M 1.76 88 28.4
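The Lout and Uout values in the summary above are consistent with the usual box-plot fences, LQ − 1.5 × IQR and UQ + 1.5 × IQR. A minimal sketch, assuming that rule and using the Year 7 male weight quartiles from the table (the function name `fences` is illustrative, not part of the original analysis):

```python
def fences(lq, uq, k=1.5):
    """Return (lower, upper) outlier fences from the lower and upper quartiles."""
    iqr = uq - lq
    return lq - k * iqr, uq + k * iqr

# Year 7 male weights (kg): LQ = 39.0 and UQ = 50.5, taken from the summary table.
lout, uout = fences(39.0, 50.5)
print(lout, uout)   # 21.75 67.75

# The record listed under "Outliers" has a weight of 88 kg, above the upper fence.
print(88 > uout)    # True
```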
Year 8:
Number Random Sample Year Group Gender Height (m) Weight (kg) BMI
1 1 8F 1.65 69 25.3
2 3 8F 1.67 61.5 22.1
3 4 8F 1.74 65 21.5
4 10 8F 1.67 67 24.0
5 11 8F 1.6 45 17.6
6 14 8F 1.64 46 17.1
7 19 8F 1.58 33 13.2
8 24 8F 1.54 43 18.1
9 27 8F 1.62 42 16.0
10 30 8F 1.59 48 19.0
11 32 8F 1.65 49.5 18.2
12 34 8F 1.58 50 20.0
13 35 8F 1.59 56.5 22.3
14 38 8F 1.55 49.5 20.6
16 43 8f 1.69 57 20.0
17 44 8f 1.66 61.5 22.3
18 46 8f 1.61 51 19.7
19 48 8f 1.625 75.5 28.6
20 50 8f 1.59 41 16.2
21 51 8f 1.47 43 19.9
22 53 8F 1.65 46 16.9
23 54 8F 1.55 43.5 18.1
24 57 8F 1.53 42 17.9
25 59 8F 1.65 56.5 20.8
1 1 8M 1.59 39 15.4
2 2 8M 1.45 40 19.0
3 5 8M 1.7 70 24.2
4 9 8M 1.54 42 17.7
5 11 8M 1.72 71 24.0
6 15 8M 1.58 45.5 18.2
7 16 8M 1.65 51 18.7
8 19 8M 1.53 43 18.4
9 20 8M 1.68 68 24.1
10 21 8M 1.44 35.5 17.1
11 22 8M 1.7 81 28.0
12 23 8M 1.72 57.5 19.4
13 25 8M 1.63 48.5 18.3
14 28 8M 1.57 42 17.0
15 29 8M 1.45 37 17.6
16 34 8M 1.62 45 17.1
17 38 8M 1.57 52 21.1
18 40 8M 1.56 54 22.2
19 45 8M 1.55 52 21.6
20 47 8M 1.63 49 18.4
21 50 8m 1.61 55 21.2
22 55 8m 1.495 51 22.8
23 56 8m 1.45 34 16.2
24 57 8m 1.45 47 22.4
25 58 8m 1.62 63 24.0
26 60 8m 1.55 39.5 16.4
27 67 8M 1.71 57 19.5
28 68 8M 1.5 48 21.3
29 70 8M 1.64 57 21.2
30 74 8M 1.48 40 18.3
31 76 8M 1.7 62 21.5
32 78 8M 1.63 46 17.3
33 80 8M 1.69 61.5 21.5
34 81 8M 1.53 41 17.5
35 85 8M 1.43 37 18.1

Male = 35 mean h = 1.5788333 SD = 0.1
Total = 60
Height Weight BMI
Female: Min= 1.470 33.000 13.219
LQ = 1.580 43.375 17.851
Median= 1.615 49.500 19.787
UQ = 1.650 58.125 21.615
Maximum= 1.740 75.500 28.592
IQR = 0.070 14.750 3.76
Lout = 1.475 21.250 12.21
Uout = 1.755 80.250 27.26
Height Weight BMI

Male: Min= 1.430 34.000 15.427
LQ = 1.515 41.500 17.654
Median= 1.580 48.500 19.025
UQ = 1.645 57.000 21.588
Maximum= 1.720 81.000 28.028
IQR = 0.130 15.500 3.9
Lout = 1.320 18.250 11.8
Uout = 1.840 80.250 27.5
Outliers:
15 42 8f 1.05 60.5 54.9
Year 9:
Number Random Sample Year Group Gender Height (m) Weight (kg) BMI
1 3 9F 1.54 41.5 17.5
2 4 9F 1.67 66.5 23.8
3 5 9F 1.6 48 18.8
4 6 9F 1.74 64.5 21.3
5 8 9F 1.6 46 18.0
6 11 9F 1.6 43 16.8
7 12 9F 1.58 50 20.0
8 13 9F 1.59 55 21.8
9 18 9F 1.62 57.5 21.9
10 19 9F 1.63 60.1 22.6
11 22 9F 1.62 44.5 17.0
12 23 9F 1.63 50 18.8
13 24 9F 1.64 46 17.1
14 25 9F 1.6 52 20.3
15 26 9F 1.6 48.5 18.9
16 37 9F 1.7 57.5 19.9
17 38 9F 1.65 67.5 24.8
18 46 9F 1.57 38.5 15.6
19 49 9F 1.57 48 19.5
20 50 9F 1.66 67.5 24.5
21 51 9F 1.55 47 19.6
22 52 9F 1.65 60 22.0
23 55 9f 1.71 59.5 20.3
24 57 9f 1.6 51.5 20.1
25 60 9f 1.65 65 23.9
26 61 9f 1.56 42 17.3
27 63 9f 1.56 41 16.8
1 3 9M 1.69 53.5 18.7
2 9 9M 1.73 71 23.7
3 14 9M 1.59 52.5 20.8
4 20 9M 1.65 50 18.4
5 21 9M 1.6 44 17.2
6 23 9M 1.72 60.5 20.5
7 24 9M 1.66 58.5 21.2
8 26 9M 1.65 55.5 20.4
9 27 9M 1.7 70 24.2
10 28 9M 1.71 75 25.6
11 29 9M 1.77 71.5 22.8
12 33 9M 1.6 56 21.9
13 35 9M 1.57 60 24.3
14 36 9M 1.68 61.5 21.8
15 37 9M 1.79 85 26.5
16 39 9M 1.67 50 17.9
17 41 9M 1.51 40 17.5
18 42 9M 1.69 52 18.2
19 47 9M 1.86 80.5 23.3
20 49 9M 1.62 47 17.9
21 50 9M 1.7 64 22.1
22 51 9M 1.6 57 22.3
24 54 9M 1.67 62 22.2
25 55 9M 1.7 49.5 17.1
26 57 9M 1.55 46 19.1
27 62 9M 1.88 61 17.3
28 66 9m 1.72 67 22.6
29 68 9m 1.73 72 24.1
30 70 9m 1.65 60 22.0
31 74 9m 1.67 54 19.4
32 76 9m 1.64 61 22.7

Male = 32 mean h = 1.674138 SD = 0.0831
Total = 59
Height Weight BMI
Female: Min= 1.540 38.500 15.619
LQ = 1.585 46.000 17.734
Median= 1.600 50.000 19.896
UQ = 1.650 59.750 21.833
Maximum= 1.740 67.500 24.793
IQR = 0.065 13.750 4.10
Lout = 1.488 25.375 11.59
Uout = 1.748 80.375 27.98
Height Weight BMI

Male: Min= 1.510 40.000 17.128
LQ = 1.625 52.125 18.561
Median= 1.670 60.000 21.790
UQ = 1.715 65.500 22.751
Maximum= 1.880 85.000 26.529
IQR = 0.090 13.375 4.2
Lout = 1.490 32.063 12.3
Uout = 1.850 85.563 29.0
Outliers:
23 52 9M 1.31 75 43.7
Year 10:
Number Random Sample Year Group Gender Height (m) Weight (kg) BMI
1 2 10f 1.64 61 22.7
2 3 10f 1.6 47 18.4
3 4 10f 1.55 46 19.1
4 6 10f 1.61 56.5 21.8
5 7 10f 1.55 46 19.1
6 9 10f 1.67 57 20.4
7 12 10f 1.68 52.5 18.6
8 17 10f 1.65 60 22.0
9 18 10f 1.7 69 23.9
10 19 10f 1.69 66 23.1
11 20 10f 1.59 58 22.9
12 22 10f 1.65 56 20.6
13 23 10f 1.79 57 17.8
14 24 10f 1.67 46 16.5
15 28 10f 1.615 48.5 18.6
16 34 10f 1.83 79 23.6
17 38 10f 1.62 50 19.1
18 40 10f 1.62 54 20.6
19 41 10f 1.46 56 26.3
20 43 10f 1.51 52 22.8
21 45 10f 1.55 55 22.9
22 46 10f 1.53 47 20.1
23 48 10f 1.63 48.5 18.3
24 50 10f 1.62 57.5 21.9
25 52 10f 1.63 46 17.3
26 53 10f 1.65 70.14 25.8
27 58 10f 1.59 47.5 18.8
1 5 10m 1.62 44 16.8
3 10 10m 1.79 66 20.6
4 11 10m 1.77 73 23.3
5 12 10m 1.63 46 17.3
6 13 10m 1.76 55 17.8
7 14 10m 1.79 56 17.5
8 18 10m 1.63 56 21.1
9 20 10m 1.66 64 23.2
10 25 10m 1.82 71 21.4
11 26 10m 1.77 65 20.7
12 28 10m 1.69 68 23.8
13 34 10m 1.73 53 17.7
14 37 10m 1.8 65 20.1
15 38 10m 1.745 49.5 16.3
16 42 10m 1.65 63 23.1
17 43 10m 1.67 54 19.4
18 44 10m 1.62 54 20.6
19 45 10m 1.6 46 18.0
20 46 10m 1.81 62 18.9
23 55 10m 1.67 62 22.2
24 56 10m 1.67 60 21.5
25 60 10m 1.66 57.5 20.9
26 62 10m 1.88 81 22.9
27 64 10m 1.73 54 18.0
28 67 10m 1.8 65 20.1
29 71 10m 1.79 67 20.9

Male = 29 mean h = 1.721346 SD = 0.0774
Total = 57
Height Weight BMI
Female: Min= 1.460 46.000 16.494
LQ = 1.590 48.250 18.742
Median= 1.625 55.500 20.573
UQ = 1.670 58.500 22.905
Maximum= 1.830 79.000 29.412
IQR = 0.080 10.250 4.16
Lout = 1.470 32.875 12.50
Uout = 1.790 73.875 29.15
Height Weight BMI

Male: Min= 1.600 44.000 16.256
LQ = 1.660 54.000 17.987
Median= 1.730 61.000 20.587
UQ = 1.790 65.000 21.494
Maximum= 1.880 81.000 23.809
IQR = 0.130 11.000 3.5
Lout = 1.465 37.500 12.7
Uout = 1.985 81.500 26.8
Outliers:
2 6 10m 1.51 32 14.0

28 61 10f 1.7 85 29.4
22 53 10m 1.37 67 35.7
21 51 10m 1.83 95 28.4
Year 11:
Number Random Sample Year Group Gender Height (m) Weight (kg) BMI
1 5 11 F 1.55 50.5 21.0
2 6 11 F 1.6 50.5 19.7
3 8 11 F 1.55 45 18.7
4 11 11 F 1.55 51 21.2
5 13 11 F 1.65 62 22.8
6 17 11 F 1.74 55 18.2
7 23 11 F 1.69 47.5 16.6
8 24 11 F 1.69 50 17.5
9 25 11 F 1.67 58 20.8
10 26 11 F 1.64 61 22.7
11 27 11 F 1.64 55.5 20.6
12 29 11 F 1.7 73.5 25.4
13 30 11 F 1.65 48 17.6
14 31 11 F 1.69 60 21.0
15 32 11 F 1.59 57 22.5
16 38 11 F 1.64 53.5 19.9
17 39 11 F 1.65 50 18.4
18 42 11 F 1.69 60 21.0
19 45 11 F 1.57 65 26.4
20 49 11 F 1.63 59 22.2
21 50 11 F 1.57 44.5 18.1
22 53 11 F 1.72 87 29.4
23 56 11 F 1.71 60.5 20.7
24 58 11 F 1.66 54 19.6
25 62 11 F 1.58 56 22.4
26 63 11 F 1.67 75 26.9
27 65 11 F 1.56 60 24.7
2 4 11 M 1.84 62 18.3
3 7 11 M 1.95 65 17.1
4 8 11 M 1.9 67 18.6
5 9 11 M 1.8 67 20.7
6 10 11 M 1.78 69.5 21.9
7 16 11 M 1.68 80 28.3
8 17 11 M 1.77 60 19.2
9 27 11 M 1.76 83 26.8
10 28 11 M 1.71 52 17.8
11 30 11 M 1.83 66 19.7
12 34 11 M 1.64 76 28.3
13 37 11 M 1.8 70 21.6
14 38 11 M 1.9 61 16.9
15 40 11 M 1.66 61.5 22.3
16 41 11 M 1.78 61.5 19.4
17 42 11 M 1.65 59 21.7
18 45 11 M 1.83 68 20.3
19 46 11 M 1.73 57 19.0
21 51 11 M 1.71 59 20.2
22 53 11 M 1.79 81 25.3
23 55 11 M 1.81 64.5 19.7
24 59 11 M 1.8 73.5 22.7
25 60 11 M 1.86 73 21.1
26 62 11 M 1.7 62 21.5
27 63 11 M 1.78 73.5 23.2
28 64 11 M 1.78 61.5 19.4
29 65 11 M 1.79 65.5 20.4
30 68 11 M 1.79 59 18.4
31 73 11 M 1.68 64 22.7
32 74 11 M 1.9 77.5 21.5
33 79 11 M 1.88 66 18.7
34 82 11 M 1.77 68 21.7

Male = 34 mean h = 1.782813 SD = 0.0782
Total = 61
Height Weight BMI
Female: Min= 1.550 44.500 16.631
LQ = 1.585 50.500 19.163
Median= 1.650 56.000 21.008
UQ = 1.690 60.250 22.613
Maximum= 1.740 87.000 29.408
IQR = 0.105 9.750 3.45
Lout = 1.428 35.875 13.99
Uout = 1.848 74.875 27.79
Height Weight BMI

Male: Min= 1.640 52.000 16.898
LQ = 1.725 61.500 19.125
Median= 1.785 65.750 20.561
UQ = 1.830 70.750 22.031
Maximum= 1.950 83.000 28.345
IQR = 0.105 9.250 2.9
Lout = 1.568 47.625 14.8
Uout = 1.988 84.625 26.4
Outliers:
20 48 11 M 5.7 179 5.5

1 1 11 M 1.85 120 35.1
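The BMI column throughout this appendix is weight in kilograms divided by the square of height in metres. A short sketch that recomputes it for the two Year 11 records set aside above (the helper name `bmi` is illustrative). The first record's height of 5.7 may not have been recorded in metres, which would explain its implausibly low BMI:

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

# (height, weight) exactly as listed for the two Year 11 male outliers.
outliers = [(5.7, 179), (1.85, 120)]
for height, weight in outliers:
    print(height, weight, round(bmi(weight, height), 1))
```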
Appendix 3 – Chi-squared test
Appendix 4 – Pearson coefficient
HGS Male Pearson calculation:

x y
Height (m) Weight (kg) x² y² xy
1.38 30 1.9044 900 41.4
1.41 45 1.9881 2025 63.45
1.41 40 1.9881 1600 56.4
1.41 32 1.9881 1024 45.12
1.43 38 2.0449 1444 54.34
1.43 37 2.0449 1369 52.91
1.43 34 2.0449 1156 48.62
1.44 35.5 2.0736 1260.25 51.12
1.44 35 2.0736 1225 50.4
1.45 47 2.1025 2209 68.15
1.45 40 2.1025 1600 58
1.45 39 2.1025 1521 56.55
1.45 37 2.1025 1369 53.65
1.45 37 2.1025 1369 53.65
1.45 34 2.1025 1156 49.3
1.47 50.5 2.1609 2550.25 74.235
1.48 40 2.1904 1600 59.2
1.48 39 2.1904 1521 57.72
1.48 36 2.1904 1296 53.28
1.49 42 2.2201 1764 62.58
1.495 51 2.235025 2601 76.245
1.5 49 2.25 2401 73.5
1.5 48 2.25 2304 72
1.5 48 2.25 2304 72
1.51 40 2.2801 1600 60.4
1.52 43 2.3104 1849 65.36
1.52 42 2.3104 1764 63.84
1.53 43 2.3409 1849 65.79
1.53 43 2.3409 1849 65.79
1.53 41 2.3409 1681 62.73
1.54 53.5 2.3716 2862.25 82.39
1.54 50.5 2.3716 2550.25 77.77
1.54 42 2.3716 1764 64.68
1.545 54.5 2.387025 2970.25 84.2025
1.55 52 2.4025 2704 80.6
1.55 48 2.4025 2304 74.4
1.55 46 2.4025 2116 71.3
1.55 39.5 2.4025 1560.25 61.225
1.56 54 2.4336 2916 84.24
1.56 53.5 2.4336 2862.25 83.46
1.56 42.5 2.4336 1806.25 66.3
1.57 60 2.4649 3600 94.2
1.57 52 2.4649 2704 81.64
1.57 42 2.4649 1764 65.94
1.58 48 2.4964 2304 75.84
1.58 45.5 2.4964 2070.25 71.89
1.59 52.5 2.5281 2756.25 83.475
1.59 39 2.5281 1521 62.01
1.6 57 2.56 3249 91.2
1.6 56 2.56 3136 89.6
1.6 46 2.56 2116 73.6
1.6 44 2.56 1936 70.4
1.61 55 2.5921 3025 88.55
1.61 54 2.5921 2916 86.94
1.61 49 2.5921 2401 78.89
1.62 63 2.6244 3969 102.06
1.62 54 2.6244 2916 87.48
1.62 47 2.6244 2209 76.14
1.62 45 2.6244 2025 72.9
1.62 44 2.6244 1936 71.28
1.63 56 2.6569 3136 91.28
1.63 52 2.6569 2704 84.76
1.63 49 2.6569 2401 79.87
1.63 48.5 2.6569 2352.25 79.055
1.63 46 2.6569 2116 74.98
1.63 46 2.6569 2116 74.98
1.64 76 2.6896 5776 124.64
1.64 61 2.6896 3721 100.04
1.64 57 2.6896 3249 93.48
1.65 63 2.7225 3969 103.95
1.65 60 2.7225 3600 99
1.65 59 2.7225 3481 97.35
1.65 55.5 2.7225 3080.25 91.575
1.65 51 2.7225 2601 84.15
1.65 50 2.7225 2500 82.5
1.66 64 2.7556 4096 106.24
1.66 61.5 2.7556 3782.25 102.09
1.66 58.5 2.7556 3422.25 97.11
1.66 57.5 2.7556 3306.25 95.45
1.67 62 2.7889 3844 103.54
1.67 62 2.7889 3844 103.54
1.67 60 2.7889 3600 100.2
1.67 54 2.7889 2916 90.18
1.67 54 2.7889 2916 90.18
1.67 50 2.7889 2500 83.5
1.68 80 2.8224 6400 134.4
1.68 68 2.8224 4624 114.24
1.68 64 2.8224 4096 107.52
1.68 61.5 2.8224 3782.25 103.32
1.69 68 2.8561 4624 114.92
1.69 61.5 2.8561 3782.25 103.935
1.69 53.5 2.8561 2862.25 90.415
1.69 52 2.8561 2704 87.88
1.7 81 2.89 6561 137.7
1.7 70 2.89 4900 119
1.7 70 2.89 4900 119
1.7 64 2.89 4096 108.8
1.7 62 2.89 3844 105.4
1.7 62 2.89 3844 105.4
1.7 61 2.89 3721 103.7
1.7 49.5 2.89 2450.25 84.15
1.71 75 2.9241 5625 128.25
1.71 59 2.9241 3481 100.89
1.71 57 2.9241 3249 97.47
1.71 52 2.9241 2704 88.92
1.72 71 2.9584 5041 122.12
1.72 67 2.9584 4489 115.24
1.72 60.5 2.9584 3660.25 104.06
1.72 57.5 2.9584 3306.25 98.9
1.73 72 2.9929 5184 124.56
1.73 71 2.9929 5041 122.83
1.73 57 2.9929 3249 98.61
1.73 54 2.9929 2916 93.42
1.73 53 2.9929 2809 91.69
1.745 49.5 3.045025 2450.25 86.3775
1.76 83 3.0976 6889 146.08
1.76 55 3.0976 3025 96.8
1.77 73 3.1329 5329 129.21
1.77 71.5 3.1329 5112.25 126.555
1.77 68 3.1329 4624 120.36
1.77 65 3.1329 4225 115.05
1.77 60 3.1329 3600 106.2
1.78 73.5 3.1684 5402.25 130.83
1.78 69.5 3.1684 4830.25 123.71
1.78 61.5 3.1684 3782.25 109.47
1.78 61.5 3.1684 3782.25 109.47
1.79 85 3.2041 7225 152.15
1.79 81 3.2041 6561 144.99
1.79 67 3.2041 4489 119.93
1.79 66 3.2041 4356 118.14
1.79 65.5 3.2041 4290.25 117.245
1.79 59 3.2041 3481 105.61
1.79 56 3.2041 3136 100.24
1.8 73.5 3.24 5402.25 132.3
1.8 70 3.24 4900 126
1.8 67 3.24 4489 120.6
1.8 65 3.24 4225 117
1.8 65 3.24 4225 117
1.81 64.5 3.2761 4160.25 116.745
1.81 62 3.2761 3844 112.22
1.82 71 3.3124 5041 129.22
1.83 95 3.3489 9025 173.85
1.83 68 3.3489 4624 124.44
1.83 66 3.3489 4356 120.78
1.84 62 3.3856 3844 114.08
1.86 80.5 3.4596 6480.25 149.73
1.86 73 3.4596 5329 135.78
1.88 81 3.5344 6561 152.28
1.88 66 3.5344 4356 124.08
1.88 61 3.5344 3721 114.68
1.9 77.5 3.61 6006.25 147.25
1.9 67 3.61 4489 127.3
1.9 61 3.61 3721 115.9
1.95 65 3.8025 4225 126.75
Σx = 254.695
Σy = 8684
Σx² = 423.70938
Σy² = 513718
Σxy = 14559.115
N = 154
Pearson's r = 0.807051782
HGS Female Pearson calculation:

x y
Height (m) Weight (kg) x² y² xy
1.62 44.5 2.6244 1980.25 72.09
1.535 54 2.356225 2916 82.89
1.57 51 2.4649 2601 80.07
1.48 44.5 2.1904 1980.25 65.86
1.53 38 2.3409 1444 58.14
1.41 37.5 1.9881 1406.25 52.875
1.68 56 2.8224 3136 94.08
1.54 50 2.3716 2500 77
1.38 27 1.9044 729 37.26
1.6 47 2.56 2209 75.2
1.53 48 2.3409 2304 73.44
1.51 42 2.2801 1764 63.42
1.51 42 2.2801 1764 63.42
1.52 39 2.3104 1521 59.28
1.4 34 1.96 1156 47.6
1.52 43 2.3104 1849 65.36
1.52 42 2.3104 1764 63.84
1.44 37 2.0736 1369 53.28
1.54 61 2.3716 3721 93.94
1.5 41 2.25 1681 61.5
1.6 37 2.56 1369 59.2
1.52 49 2.3104 2401 74.48
1.65 44.5 2.7225 1980.25 73.425
1.535 42.5 2.356225 1806.25 65.2375
1.65 69 2.7225 4761 113.85
1.67 61.5 2.7889 3782.25 102.705
1.74 65 3.0276 4225 113.1
1.67 67 2.7889 4489 111.89
1.6 45 2.56 2025 72
1.64 46 2.6896 2116 75.44
1.58 33 2.4964 1089 52.14
1.54 43 2.3716 1849 66.22
1.62 42 2.6244 1764 68.04
1.59 48 2.5281 2304 76.32
1.65 49.5 2.7225 2450.25 81.675
1.58 50 2.4964 2500 79
1.59 56.5 2.5281 3192.25 89.835
1.55 49.5 2.4025 2450.25 76.725
1.69 57 2.8561 3249 96.33
1.66 61.5 2.7556 3782.25 102.09
1.61 51 2.5921 2601 82.11
1.625 75.5 2.640625 5700.25 122.6875
1.59 41 2.5281 1681 65.19
1.47 43 2.1609 1849 63.21
1.65 46 2.7225 2116 75.9
1.55 43.5 2.4025 1892.25 67.425
1.53 42 2.3409 1764 64.26
1.65 56.5 2.7225 3192.25 93.225
1.54 41.5 2.3716 1722.25 63.91
1.67 66.5 2.7889 4422.25 111.055
1.6 48 2.56 2304 76.8
1.74 64.5 3.0276 4160.25 112.23
1.6 46 2.56 2116 73.6
1.6 43 2.56 1849 68.8
1.58 50 2.4964 2500 79
1.59 55 2.5281 3025 87.45
1.62 57.5 2.6244 3306.25 93.15
1.63 60.1 2.6569 3612.01 97.963
1.62 44.5 2.6244 1980.25 72.09
1.63 50 2.6569 2500 81.5
1.64 46 2.6896 2116 75.44
1.6 52 2.56 2704 83.2
1.6 48.5 2.56 2352.25 77.6
1.7 57.5 2.89 3306.25 97.75
1.65 67.5 2.7225 4556.25 111.375
1.57 38.5 2.4649 1482.25 60.445
1.57 48 2.4649 2304 75.36
1.66 67.5 2.7556 4556.25 112.05
1.55 47 2.4025 2209 72.85
1.65 60 2.7225 3600 99
1.71 59.5 2.9241 3540.25 101.745
1.6 51.5 2.56 2652.25 82.4
1.65 65 2.7225 4225 107.25
1.56 42 2.4336 1764 65.52
1.56 41 2.4336 1681 63.96
1.64 61 2.6896 3721 100.04
1.6 47 2.56 2209 75.2
1.55 46 2.4025 2116 71.3
1.61 56.5 2.5921 3192.25 90.965
1.55 46 2.4025 2116 71.3
1.67 57 2.7889 3249 95.19
1.68 52.5 2.8224 2756.25 88.2
1.65 60 2.7225 3600 99
1.7 69 2.89 4761 117.3
1.69 66 2.8561 4356 111.54
1.59 58 2.5281 3364 92.22
1.65 56 2.7225 3136 92.4
1.79 57 3.2041 3249 102.03
1.67 46 2.7889 2116 76.82
1.615 48.5 2.608225 2352.25 78.3275
1.83 79 3.3489 6241 144.57
1.62 50 2.6244 2500 81
1.62 54 2.6244 2916 87.48
1.46 56 2.1316 3136 81.76
1.51 52 2.2801 2704 78.52
1.55 55 2.4025 3025 85.25
1.53 47 2.3409 2209 71.91
1.63 48.5 2.6569 2352.25 79.055
1.62 57.5 2.6244 3306.25 93.15
1.63 46 2.6569 2116 74.98
1.65 70.14 2.7225 4919.62 115.731
1.59 47.5 2.5281 2256.25 75.525
1.55 50.5 2.4025 2550.25 78.275
1.6 50.5 2.56 2550.25 80.8
1.55 45 2.4025 2025 69.75
1.55 51 2.4025 2601 79.05
1.65 62 2.7225 3844 102.3
1.74 55 3.0276 3025 95.7
1.69 47.5 2.8561 2256.25 80.275
1.69 50 2.8561 2500 84.5
1.67 58 2.7889 3364 96.86
1.64 61 2.6896 3721 100.04
1.64 55.5 2.6896 3080.25 91.02
1.7 73.5 2.89 5402.25 124.95
1.65 48 2.7225 2304 79.2
1.69 60 2.8561 3600 101.4
1.59 57 2.5281 3249 90.63
1.64 53.5 2.6896 2862.25 87.74
1.65 50 2.7225 2500 82.5
1.69 60 2.8561 3600 101.4
1.57 65 2.4649 4225 102.05
1.63 59 2.6569 3481 96.17
1.57 44.5 2.4649 1980.25 69.865
1.72 87 2.9584 7569 149.64
1.71 60.5 2.9241 3660.25 103.455
1.66 54 2.7556 2916 89.64
1.58 56 2.4964 3136 88.48
1.67 75 2.7889 5625 125.25
1.56 60 2.4336 3600 93.6
Σx = 207.17
Σy = 6749.74
Σx² = 333.4173
Σy² = 365923.38
Σxy = 10903.052
N = 129
Pearson's r = 0.664555
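Both coefficients can be reproduced from the column totals alone using the standard computational form of Pearson's product-moment coefficient, r = (NΣxy − ΣxΣy) / √((NΣx² − (Σx)²)(NΣy² − (Σy)²)). A minimal sketch, with the totals copied from the two tables in this appendix (the function name `pearson_from_sums` is illustrative, not part of the original analysis):

```python
from math import sqrt

def pearson_from_sums(n, sx, sy, sxx, syy, sxy):
    """Pearson's r from N, Σx, Σy, Σx², Σy² and Σxy."""
    numerator = n * sxy - sx * sy
    denominator = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

# Totals as printed in Appendix 4 (x = height in m, y = weight in kg).
r_male = pearson_from_sums(154, 254.695, 8684, 423.70938, 513718, 14559.115)
r_female = pearson_from_sums(129, 207.17, 6749.74, 333.4173, 365923.38, 10903.052)

print(round(r_male, 3), round(r_female, 3))
```

With these totals the male coefficient comes out larger than the female one, which is the comparison Hypothesis 3 is concerned with.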