Investigation
Math Studies
Working Title:
Candidate name: Felix Dyrek Candidate number: 001528-031
Introduction:
The aim of my investigation is to find out whether there is a correlation between the total number
of lung cancer incidents and the tobacco consumption of men and women in 6 countries.
Hypothesis:
My hypothesis is that the rate of lung cancer incidents is higher in countries with higher
tobacco consumption than in countries with lower tobacco consumption.
Method:
In order to investigate the correlation between lung cancer incidents and tobacco
consumption, I needed to collect data from the tobacco industry and various (lung) cancer
institutions. The next step is to verify the collected data and compile statistics. To calculate
the correlation, the following mathematical methods are used:
Pearson's Correlation Coefficient
The χ² Test
Raw Data:
[Chart: values by country (China, Japan, Thailand, Sweden, Poland, UK); series: Total, Female, Male; vertical axis from 0 to 100]
Adult Smokers
[Chart legend: China, Japan, UK, Sweden, Poland, Thailand]
Chart 3: Comparison between the average amounts of cigarettes smoked per person / year and
countries in %
[Chart legend: China, Japan, UK, Sweden, Poland, Thailand]
Calculations
The Pearson correlation coefficient is used to identify whether there is a correlation between
lung cancer incidents and the average amount of smoked cigarettes per person. Table 4 is divided
into the 6 countries so that they can be compared. First, the researched data for lung cancer
incidents (x) and the amount of smoked cigarettes per person / year (y) are tabulated. The next
step is to multiply x by y for each country to obtain xy. The x and y values are then squared to
obtain the last two columns, x² and y². Finally, each column is summed.
Table 4
x = lung cancer incidents, y = amount of smoked cigarettes per person / year

No.    Country    x     y            xy            x²      y²
1      China      93    179100000    16656300000   8649    32076810000000000
2      Japan      60    302300000    18138000000   3600    91385290000000000
3      UK         73    223200000    16293600000   5329    49818240000000000
4      Sweden     35    120200000    4207000000    1225    14448040000000000
5      Poland     85    206100000    17518500000   7225    42477210000000000
6      Thailand   87    106700000    9282900000    7569    11384890000000000
Total  n = 6      433   1137600000   82096300000   33597   241590480000000000
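The column sums in Table 4 can be cross-checked with a short script. This is only a sketch: the values are copied from the table, and the variable names are chosen here for illustration.

```python
# Table 4 data as given in the text: (country, x, y)
# x = lung cancer incidents, y = amount of smoked cigarettes per person / year
rows = [
    ("China", 93, 179_100_000),
    ("Japan", 60, 302_300_000),
    ("UK", 73, 223_200_000),
    ("Sweden", 35, 120_200_000),
    ("Poland", 85, 206_100_000),
    ("Thailand", 87, 106_700_000),
]

n = len(rows)
sum_x = sum(x for _, x, _ in rows)
sum_y = sum(y for _, _, y in rows)
sum_xy = sum(x * y for _, x, y in rows)    # column xy
sum_x2 = sum(x ** 2 for _, x, _ in rows)   # column x squared
sum_y2 = sum(y ** 2 for _, _, y in rows)   # column y squared

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
```

Python integers have arbitrary precision, so the large y² values are summed exactly.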
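The last step is to insert the sums from Table 4 into the Pearson correlation coefficient formula
and solve the equation. The means are x̄ = 433 ÷ 6 and ȳ = 1137600000 ÷ 6 = 189600000. The mean
x̄ is kept unrounded, because the numerator is a small difference of two large numbers.
$$r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\sum x^2 - n\bar{x}^2}\,\sqrt{\sum y^2 - n\bar{y}^2}}$$
$$r = \frac{82096300000 - 6 \times \frac{433}{6} \times 189600000}{\sqrt{33597 - 6 \times \left(\frac{433}{6}\right)^2}\,\sqrt{241590480000000000 - 6 \times 189600000^2}}$$
$$r = \frac{-500000}{\sqrt{2348.83}\,\sqrt{25901520000000000}}$$
$$r \approx -6.4 \times 10^{-5}$$
Since r is very close to 0, there is no linear correlation.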
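As a cross-check, the coefficient can be computed directly from the Table 4 sums. This is a minimal sketch using the standard Pearson formula with means; the variable names are illustrative. A valid result must lie between −1 and 1.

```python
import math

# Sums taken from Table 4
n = 6
sum_x = 433
sum_y = 1_137_600_000
sum_xy = 82_096_300_000
sum_x2 = 33_597
sum_y2 = 241_590_480_000_000_000

mean_x = sum_x / n   # kept unrounded on purpose
mean_y = sum_y / n

# r = (sum xy - n * mean_x * mean_y) / (sqrt(sum x^2 - n * mean_x^2) * sqrt(sum y^2 - n * mean_y^2))
numerator = sum_xy - n * mean_x * mean_y
denominator = math.sqrt(sum_x2 - n * mean_x ** 2) * math.sqrt(sum_y2 - n * mean_y ** 2)
r = numerator / denominator

print(r)  # about -6.4e-05, i.e. practically zero
```

Because the numerator is a small difference between two values of about 8.2 × 10^10, rounding the mean of x to two decimals before multiplying would change the result noticeably.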
The χ² Test
$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$
Where:
$f_o$ is an observed frequency
$f_e$ is an expected frequency
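The formula can be sketched as a small function. The frequencies in the example call are hypothetical and chosen only to show the arithmetic; they are not the values from this investigation.

```python
def chi_squared(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Hypothetical frequencies, for illustration only
example = chi_squared([10, 20], [15, 15])
print(example)
```

For these example values, each cell contributes 25 ÷ 15, so the total is 10 ÷ 3.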
Observed Value Table (fo): Taken from Table 1; the averages of the female and male lung cancer
incidents for the European and the Asian countries.
Calculation Table: The calculation table is used to convert the observed values into expected
values, which are needed to calculate the χ² test.
        S1      S2      Sum
R1      wy÷n    wz÷n    w
R2      xy÷n    xz÷n    x
Sum     y       z       n
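The calculation table can be sketched in code: each expected value is the row sum times the column sum, divided by n. The observed values below are hypothetical placeholders, since the actual observed table is not reproduced here.

```python
def expected_table(observed):
    """Expected frequency per cell = row sum * column sum / n."""
    row_sums = [sum(row) for row in observed]          # w, x
    col_sums = [sum(col) for col in zip(*observed)]    # y, z
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]

# Hypothetical 2x2 observed table, for illustration only
observed = [[30, 20],
            [10, 40]]
expected = expected_table(observed)
print(expected)
```

Here the row sums are 50 and 50, the column sums are 40 and 60, and n is 100, so both rows of the expected table are 20 and 30.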
Expected Value Table (fe): This table represents the lung cancer incidents in Europe and
Asia. It is based on my previous data for the 6 countries (China, Japan, Thailand,
Sweden, Poland and the United Kingdom), grouped by their respective continents. It is also
divided into male and female groups.
Now I am going to calculate the χ² test in order to see how closely the observed values match
the expected values from the tables of male and female lung cancer incidents within Europe
and Asia.
χ² Calculations:
So, χ² = 0.1565
The χ² value is small, which indicates that the observed values are close to the expected values.
Degrees of freedom
The next step is to find df and use a table to interpret the χ² value I have just obtained.
The χ² distribution depends on the number of degrees of freedom (df), where df = (r – 1)(c – 1),
r is the number of rows and c is the number of columns.
For my table:
df = (2 – 1)(2 – 1)
df = 1 × 1 = 1
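The table lookup can be sketched as follows. The 5% significance level is an assumption made here, since the text does not state one; 3.841 is the standard critical value for df = 1 at that level.

```python
def degrees_of_freedom(rows, cols):
    """df = (r - 1)(c - 1) for a table with r rows and c columns."""
    return (rows - 1) * (cols - 1)

df = degrees_of_freedom(2, 2)

chi2_calc = 0.1565       # value obtained in the chi-squared calculation
critical_value = 3.841   # df = 1, 5% significance level (assumed level)

# If the calculated value is below the critical value, the observed values
# do not differ significantly from the expected values.
below_critical = chi2_calc < critical_value
print(df, below_critical)
```

With these numbers the calculated value is well below the critical value, so the observed and expected frequencies do not differ significantly at the assumed level.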
Conclusion:
Based on the results obtained in my research, it can be concluded that there is no direct
correlation between the amount of smoked cigarettes and the lung cancer incidents. My hypothesis
is therefore not supported by the data. Various other factors can contribute to lung cancer, such
as second-hand smoke, car exhaust, and alpha, beta and gamma radiation. Because these external
factors can increase the number of lung cancer incidents, my data is not 100% accurate. Lung
cancer incidents therefore do not depend purely on the amount of cigarettes consumed, even though
it is well known that excessive cigarette consumption may cause lung cancer. The investigation
could be improved by including more external factors such as the ones mentioned above.