QF2145 Statistics Ⅱ
Final Project. Comparison between 3C Popularity Rate and Some Indices of Countries
105061224 張傳佳
A. Introduction:
In the 21st century, 3C products (computers, communications, and consumer electronics) have become necessities that we use every day in modern society. However, in some developing countries the popularity of 3C products is still relatively low, although it has been rising steadily in recent years. These differences may be due to the economy, education, age distribution, and other factors. This report examines this phenomenon by analyzing the relationship between the 3C popularity rate and several indices of countries at different stages of economic development.
B. Approach:
To analyze the 3C popularity rate, I used 2015 statistics from the Pew Research Center [1]. The data give the rate of "smartphone ownership and Internet usage" in four categories: total, age, education, and income. For these four categories, I collected five additional country indices, also for 2015. The Human Development Index (HDI) [2] corresponds to "total", age distribution [3] to "age", the education index [4] to "education", and GDP per capita [5] and GNP per capita [6] to "income".
First, all collected data are put through a goodness-of-fit test and a variance test to establish their basic characteristics. Then, regression analysis is used to check whether age, education, and economy really influence the 3C popularity rates. Finally, conclusions are drawn from the analysis results.
C. Raw Data:
The following raw data were collected from three authoritative sources (the Pew Research Center, the United Nations, and the World Bank). They consist of the 3C popularity rates and five country indices for thirty countries on five continents.
(a) Smart Phone Ownership and Internet Usage:
Columns (all in %): Total, age 18-34, age 35+, less education, more education, lower income, higher income.
Senegal 31 40 20 21 82 18 42
Nigeria 39 52 21 9 53 27 52
Philippines 40 58 23 15 57 26 52
Kenya 40 53 22 19 70 26 52
South Africa 42 52 33 24 61 22 57
Vietnam 50 81 25 32 79 42 70
Peru 52 76 37 16 74 23 63
Mexico 54 76 38 35 87 44 66
Ukraine 60 93 44 20 62 44 73
Brazil 60 82 44 39 86 42 76
China 65 93 49 48 91 56 80
Lebanon 66 89 50 34 90 41 92
Jordan 67 75 57 41 96 50 80
Malaysia 68 91 50 29 82 46 79
Poland 69 98 56 28 78 56 81
Japan 69 97 64 56 88 51 86
Argentina 71 92 58 61 94 47 76
Italy 72 100 65 68 95 56 87
France 75 98 66 65 95 61 87
Chile 78 96 65 26 87 62 90
Germany 85 99 80 74 92 73 95
Israel 86 96 80 80 93 78 94
Spain 87 100 82 81 97 80 95
UK 88 98 85 82 98 82 98
USA 89 99 85 80 95 84 97
Canada 90 100 87 81 95 85 99
Australia 93 100 90 87 98 84 99
Country Indices
Columns: HDI, education index, GDP per capita (US$), GNP per capita (US$), and population share by age group (%): 0-14, 15-59, 60+, 80+.
Ethiopia 0.451 0.322 645.47 600 41.4 53.3 5.2 0.5
Pakistan 0.551 0.398 1428.64 1430 35 58.4 6.6 0.6
Burkina Faso 0.412 0.277 575.31 630 45.6 50.6 3.8 0.2
India 0.627 0.542 1606.95 1600 28.8 62.3 8.9 0.9
Ghana 0.585 0.556 1783.06 1960 38.8 55.9 5.3 0.4
In detail, I divided each data set into 6 parts, so each part is expected to contain 5 (30 ÷ 6 = 5) countries. This expected count is denoted $e_i$, and the count actually observed in each part is denoted $f_i$. The table below shows the statistic $\chi^2 = \sum_{i=1}^{6} (f_i - e_i)^2 / e_i$ calculated for each data set, together with the critical value $\chi^2_{0.05} \approx 42.5570$. Although some values are relatively high, all of them are smaller than 42.5570. Therefore, the hypothesis that each data set follows a normal distribution cannot be rejected, and all the following tests are based on this assumption.
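The goodness-of-fit step can be sketched as follows. This is a minimal sketch under assumptions the report does not state: the six parts are taken to be equiprobable bins of a normal distribution fitted with the sample mean and standard deviation, so every $e_i = n/6$. The function name `normal_gof_chi_square` is hypothetical, and the example uses only the 27 Total values listed in the raw-data table above.

```python
from statistics import NormalDist, mean, stdev


def normal_gof_chi_square(data, n_bins=6):
    """Chi-square goodness-of-fit statistic against a normal fitted to `data`.

    Assumption (not stated in the report): the bins are equiprobable under the
    fitted normal, so every expected count e_i equals len(data) / n_bins.
    """
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    # Interior bin edges at the 1/k, 2/k, ..., (k-1)/k quantiles of the fit.
    edges = [fitted.inv_cdf(j / n_bins) for j in range(1, n_bins)]

    observed = [0] * n_bins  # f_i: observed count in bin i
    for x in data:
        # The number of edges below x is the index of the bin containing x.
        observed[sum(1 for edge in edges if x > edge)] += 1

    expected = n / n_bins  # e_i, the same for every bin
    chi_square = sum((f - expected) ** 2 / expected for f in observed)
    return observed, expected, chi_square


# Total popularity rates (%) of the 27 countries listed in the raw-data table.
total = [31, 39, 40, 40, 42, 50, 52, 54, 60, 60, 65, 66, 67, 68,
         69, 69, 71, 72, 75, 78, 85, 86, 87, 88, 89, 90, 93]
f_i, e_i, stat = normal_gof_chi_square(total)
print("observed:", f_i, "expected per bin:", round(e_i, 2), "chi-square:", round(stat, 3))
```

The statistic is then compared with a chi-square critical value. For $k$ bins with the mean and standard deviation estimated from the data, the usual reference distribution has $k - 1 - 2$ degrees of freedom.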
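$$\begin{cases} H_0: \sigma^2 \le (20\%)^2 \\ H_1: \sigma^2 > (20\%)^2 \end{cases}$$
The test statistic is $\chi^2 = (n-1)s^2/\sigma_0^2$ with $n = 30$ and $\sigma_0 = 20\%$. From the table below, the chi-square values are large. Although four of them are lower than $\chi^2_{0.05} \approx 42.5570$ and therefore cannot reject $H_0$, every sample standard deviation except that of the more-education group exceeds 20%, so the dispersion of the popularity rate across countries is large for the total and for the three categories (age, education, and income). It can therefore be concluded that an obvious gap exists between countries in the popularity of 3C products.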
Variance Test
                       Age               Education         Income
              Total    18-34    35+      Less     More     Lower    Higher
              (%)      (%)      (%)      (%)      (%)      (%)      (%)
Average       58.533   75.433   47.600   39.700   78.700   44.767   70.167
Std. dev. s   23.263   26.499   25.351   26.707   18.404   23.874   23.609
σ0 under H0   20       20       20       20       20       20       20
Chi-square    39.234   50.908   46.593   51.711   24.556   41.323   40.410
Conclusion    H0       H1       H1       H1       H0       H0       H0
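The chi-square row can be reproduced from the summary statistics. The sketch below assumes, as the numbers themselves indicate, that the second row of the table (the spread of each group) holds the sample standard deviation $s$, and that the statistic is $(n-1)s^2/\sigma_0^2$ with $n = 30$ and $\sigma_0 = 20$. The critical value 42.5570 is the one quoted in the report (the upper 5% point of the chi-square distribution with 29 degrees of freedom); the names `variance_chi_square` and `results` are illustrative.

```python
# Summary statistics from the Variance Test table (n = 30 countries).
GROUPS = ["Total", "Age 18-34", "Age 35+", "Less edu.", "More edu.",
          "Lower inc.", "Higher inc."]
SAMPLE_SD = [23.263, 26.499, 25.351, 26.707, 18.404, 23.874, 23.609]

N = 30               # number of countries
SIGMA_0 = 20.0       # hypothesized standard deviation under H0 (percentage points)
CRITICAL = 42.5570   # chi-square critical value quoted in the report (df = 29, alpha = 0.05)


def variance_chi_square(s, n=N, sigma0=SIGMA_0):
    """Right-tailed test statistic (n - 1) * s^2 / sigma0^2 for H0: sigma <= sigma0."""
    return (n - 1) * s ** 2 / sigma0 ** 2


results = {}
for group, s in zip(GROUPS, SAMPLE_SD):
    stat = variance_chi_square(s)
    decision = "H1" if stat > CRITICAL else "H0"  # reject H0 only in the upper tail
    results[group] = (stat, decision)
    print(f"{group:<11} chi-square = {stat:7.3f} -> {decision}")
```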
[Figure: Total (Y) vs. HDI, observed Y and predicted Y]
R Square            0.848586923
Adjusted R Square   0.843179313
Standard Error      9.212173778
Observations        30
ANOVA
             df    SS             MS             F              Significance F
Regression    1    13317.27059    13317.27059    156.9245819    5.36877E-13
Residual     28    2376.19608     84.86414572
Total        29    15693.46667
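Every number in these regression summaries follows from the two sums of squares in the ANOVA table. The helper below (a hypothetical name, mirroring the Excel-style labels) shows those relationships; with the HDI figures it reproduces R Square, Adjusted R Square, Standard Error, and F.

```python
import math


def regression_summary(ssr, sse, n, k):
    """Excel-style regression statistics derived from ANOVA sums of squares.

    ssr: regression SS, sse: residual SS, n: observations, k: predictors.
    """
    sst = ssr + sse                        # total SS
    df_reg, df_res = k, n - k - 1
    msr, mse = ssr / df_reg, sse / df_res  # mean squares
    return {
        "R Square": ssr / sst,
        # Adjusted R^2 compares mean squares, so extra predictors are penalized.
        "Adjusted R Square": 1 - mse / (sst / (n - 1)),
        "Standard Error": math.sqrt(mse),  # residual standard error
        "F": msr / mse,
    }


# Total (Y) vs. HDI, using the SS values from the ANOVA table above.
hdi = regression_summary(ssr=13317.27059, sse=2376.19608, n=30, k=1)
for name, value in hdi.items():
    print(f"{name:<18} {value:.6f}")
```

The same helper applies to the multiple regression later in the report by passing `k=3`.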
[Figure: Total (Y) vs. Education Index, observed Y and predicted Y]
R Square            0.806307356
Adjusted R Square   0.799389762
Standard Error      10.41926281
Observations        30
ANOVA
             df    SS             MS             F              Significance F
Regression    1    12653.75762    12653.75762    116.5589231    1.72744E-11
Residual     28    3039.709049    108.5610375
Total        29    15693.46667
[Figure: Total (Y) vs. GDP per capita, observed Y and predicted Y]
R Square            0.631549464
Adjusted R Square   0.618390516
Standard Error      14.37043866
Observations        30
ANOVA
             df    SS             MS             F              Significance F
Regression    1    9911.200463    9911.200463    47.99391851    1.56854E-07
Residual     28    5782.266203    206.5095073
Total        29    15693.46667
[Figure: Total (Y) vs. GNP per capita, observed Y and predicted Y]
R Square            0.62416729
Adjusted R Square   0.610744693
Standard Error      14.5136858
Observations        30
ANOVA
             df    SS             MS             F              Significance F
Regression    1    9795.348556    9795.348556    46.50123216    2.08168E-07
Residual     28    5898.118111    210.6470754
Total        29    15693.46667
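Each of the simple regressions above is an ordinary least-squares fit of Total (Y) on one index at a time. A minimal sketch of such a fit follows; the function name and the toy data are hypothetical and stand in for the report's data. It returns the intercept, the slope, and the ANOVA split of the total sum of squares, from which $R^2 = SSR/SST$.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares line y_hat = b0 + b1 * x plus its ANOVA sums of squares."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total SS
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual SS
    ssr = sst - sse                                        # regression SS
    return {"b0": b0, "b1": b1, "SSR": ssr, "SSE": sse, "R2": ssr / sst}


# Toy data (hypothetical), e.g. an index on x and a popularity rate (%) on y.
fit = fit_simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit)
```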
Then, for the second class, age 15~59, the adjusted $R^2$ is almost 0, which means that this age group has essentially no influence on the popularity rate of 3C products. For the remaining two classes, age 60+ and age 80+, the slopes of the regression lines are positive. The reason may be the same as for age 0~14: a country with a younger population tends to have a lower HDI, whereas a country in which the elderly make up a high proportion of the population tends to have a higher HDI. As a result, these two classes show a positive trend with the popularity rate of 3C products. Overall, although the adjusted $R^2$ values of the four classes are low, especially for age 15~59, the results still reveal several characteristics of 3C product usage in each country that are worth discussing.
[Figure: Total (Y) vs. population share by age class, observed Y and predicted Y]
Age 0~14: adjusted $R^2 \approx 0.5565$
Age 15~59: adjusted $R^2 \approx 0.0428$
Age 60+: adjusted $R^2 \approx 0.5341$
Age 80+: adjusted $R^2 \approx 0.4853$
For the sub-items, the Significance F of the ANOVA and the P-values of the regression coefficients are all significant, so only the regression plots and the adjusted $R^2$ values are shown below. From the plots, we can see that the education index and GNP per capita have a larger impact on the less-education and lower-income rates than on the more-education and higher-income rates; that is, people in these two categories are more easily affected by the development of a country. Building on this point, deeper research could be done to find the reasons behind this phenomenon. The government could then target the usage of 3C products and make new national policies that raise the country's education index and GNP, making the country more competitive in the world.
[Figure: sub-item regressions, observed Y and predicted Y]
Less Education vs. Edu. Index: adjusted $R^2 \approx 0.5941$
High Education vs. Edu. Index: adjusted $R^2 \approx 0.3706$
Lower Income vs. GNP per capita: adjusted $R^2 \approx 0.7023$
Higher Income vs. GNP per capita: adjusted $R^2 \approx 0.5319$
The results below include three plots, each varying one independent variable while holding the other two fixed. The multiple regression function has the form $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$.
From the table, we can see that Significance $F \approx 3.21 \times 10^{-10}$ is very small and that the adjusted $R^2 \approx 0.813$ is close to the 0.843 obtained with HDI in the simple linear regression. However, the table also shows that the P-values of all three coefficients are higher than $\alpha = 0.05$, especially for the variable $X_2$, age distribution. It is a rather strange result that the F-test is significant while the individual coefficients are not. To find the reason, I went back to check whether the three "independent" variables are really independent of one another. The three resulting adjusted $R^2$ values are shown below.
Variables                          Adjusted $R^2$
HDI & Education Index              0.604901
HDI & Age Distribution             0.570967
Education Index & Age Distribution 0.584832
These three values show that the variables are not truly independent: each one is substantially explained by another. This multicollinearity reduces the separate contribution of each variable to the multiple regression and inflates the standard errors of the coefficients, so their P-values are not significant even though the model as a whole is. This is a critical weakness of my multiple regression model, and it needs to be corrected in future work.
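The report diagnoses the problem with pairwise adjusted $R^2$ values. A common alternative diagnostic, not used in the report, is the variance inflation factor $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on all the other predictors; values far above 1 (rules of thumb often cite 5 or 10) signal multicollinearity. The sketch below is pure Python, and its function names and toy data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, predictors):
    """R^2 of an OLS fit of y on the predictor columns, with an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(rows[0])
    # Normal equations (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def variance_inflation_factors(columns):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on all the other columns."""
    vifs = []
    for j, col in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        vifs.append(1 / (1 - r_squared(col, others)))
    return vifs


# Toy predictors (hypothetical): x2 nearly duplicates x1; x3 is only moderately related.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.05, 4.0, 4.9, 6.1, 7.0, 7.95]
x3 = [3, 1, 4, 1, 5, 9, 2, 6]
vifs = variance_inflation_factors([x1, x2, x3])
print([round(v, 1) for v in vifs])
```

Unlike the pairwise values in the table above, each $R_j^2$ here uses all the other predictors at once, which is what determines how much a coefficient's standard error is inflated.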
[Figure: multiple regression plots of Total (Y) against Edu. Index and Age Distribution, observed Y and predicted Y]
ANOVA
             df    SS             MS             F              Significance F
Regression    3    13060.40175    4353.46725     42.98798245    3.21005E-10
Residual     26    2633.064918    101.2717276
Total        29    15693.46667
The following compares the results of the initial linear form and the changed logarithmic form. The adjusted $R^2 \approx 0.77$ of the logarithmic form is higher than the 0.53 of the initial linear form, and its Significance F, 9.36527E-11, is much lower than the 2.91515E-06 of the linear form. With both a better adjusted $R^2$ and a smaller Significance F, the relationship between the higher-income rate and GNP per capita is better described by a logarithmic form than by the initially assumed linear form.
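The comparison of the two functional forms can be sketched as below. The report does not state which logarithm was used; this sketch assumes base 10, which matches the 0 to 6 range of the horizontal axis in the logarithmic plot. The function names and the toy data are hypothetical.

```python
import math


def adjusted_r2_one_predictor(x, y):
    """Adjusted R^2 of the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r2 = sxy ** 2 / (sxx * syy)  # with one predictor, R^2 equals the squared correlation
    return 1 - (1 - r2) * (n - 1) / (n - 2)


def compare_linear_vs_log(gnp_per_capita, rate):
    """Adjusted R^2 of the linear form (y on x) and the log form (y on log10 x)."""
    linear = adjusted_r2_one_predictor(gnp_per_capita, rate)
    log_form = adjusted_r2_one_predictor([math.log10(g) for g in gnp_per_capita], rate)
    return linear, log_form


# Toy data (hypothetical) that grow exactly with log10 of income.
gnp = [600, 1500, 4000, 10000, 25000, 50000]
rate = [10 + 20 * math.log10(g) for g in gnp]
linear_adj, log_adj = compare_linear_vs_log(gnp, rate)
print(f"linear: {linear_adj:.3f}  log: {log_adj:.3f}")
```

Because both forms have one predictor and the same number of observations, comparing adjusted $R^2$ here ranks the models the same way as comparing plain $R^2$.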
[Figure: Higher Income vs. GNP per capita, linear form (left) and logarithmic form (right), observed Y and predicted Y]
E. Conclusion:
Through this series of data analyses, we can roughly understand the relationship between the 3C product popularity rate and several country indices.
First, all data sets were put through the goodness-of-fit test, and none of them rejected the normal distribution. Second, the variance test was applied to each data set, and the results show that sizable gaps in the popularity rate exist between countries.
Third, the total popularity rate was analyzed by simple linear regression against HDI, the education index, GDP per capita, GNP per capita, and age distribution. Almost all the regression lines have positive slopes, except for the special case of the age 0~14 share, whose cause was discussed in that part. Fourth, the sub-items of the popularity rate, less/more education and lower/higher income, were analyzed against the education index and GNP per capita, respectively, and we found that the condition of a country has a larger impact on the less-education and lower-income categories.
Fifth, multiple regression analysis was performed on the total rate with HDI, the education index, and age distribution. The result was unexpected (a significant F-test but insignificant coefficients), and its likely cause, correlation among the three predictors, was discussed in that part. Last but not least, the higher-income popularity rate and GNP per capita were taken as an example to see whether a logarithmic form yields a better adjusted $R^2$ and Significance F. The result shows that the logarithmic form fits better.
Throughout this work, the 3C popularity rates are treated as the dependent variable $Y$ and the other country indices as the independent variables $X$. It may seem that the $X$'s are always the causes and $Y$ is the effect. However, regression analysis only reveals the association between variables; which variables play the role of $X$ is our own choice. Deeper research is therefore needed to establish the actual cause-effect relationship, which we leave as future work. Moreover, we could use data from more recent years to check whether the regression models constructed from the 2015 data remain suitable. If they do, the models could be used to guide national policies or other measures that help a country develop faster and better.
References:
[1] Jacob Poushter, “Smartphone Ownership and Internet Usage Continues to Climb in Emerging Economies”, Pew Research Center, p. 11, 2016.
[2] United Nations Development Programme, http://hdr.undp.org/en/data
[3] “World Population Prospects”, revision, United Nations DESA/Population Division, pp. 27-31, 2015.
[4] United Nations Development Programme, http://hdr.undp.org/en/data
[5] The World Bank, https://data.worldbank.org/indicator/NY.GDP.PCAP.CD?end=2015&start=1960
[6] The World Bank, https://data.worldbank.org/indicator/NY.GNP.PCAP.CD?end=2015&order=wbapi_data_value_2014+wbapi_data_value+wbapi_data_value-last&sort=desc&start=1962