
Multivariate analysis: Factor and Cluster Analysis

31TH 2016
Faculty of Economics and Business, University of Zagreb

1. Short introduction to analysis: Small open economy

As a team, we decided to carry out a brief analysis of the business competitiveness of an economy.
Competitiveness is recognised worldwide and has been measured with different methodologies over a long
period of time. The Global Competitiveness Report has been accepted as an effective, realistic and useful
tool for country SWOT analysis based on 12 pillars. It displays a country’s current position and has
become a starting point for building an efficient and substantial economic development strategy for the
large number of countries it covers. It defines competitiveness as the set of institutions, policies,
and factors that determine the level of productivity of a country. A more competitive economy, logically,
is one that is likely to grow faster over time.

Many determinants drive productivity and competitiveness. Understanding the factors behind this
process has occupied the minds of economists for hundreds of years, engendering theories ranging from
Adam Smith’s focus on specialization and the division of labour to neoclassical economists’ emphasis
on investment in physical capital and infrastructure, and, more recently, to interest in other mechanisms
such as education and training, technological progress, macroeconomic stability, good governance, firm
sophistication, and market efficiency, among others. While all of these factors are likely to be important
for competitiveness and growth, they are not mutually exclusive—two or more of them can be
significant at the same time, and in fact that is what has been shown in the economic literature.

Our analysis takes into consideration small open economies in Europe. One of the defining features of
small open economies is that households and firms in these countries can borrow and lend at an interest
rate determined by international markets. But not all small open economies are alike. Although small
open economies share the feature of being price-takers in international bond markets — that is, they do
not influence prices in the marketplace — they differ substantially in other dimensions. Consequently,
economists sort these countries into two types: developed (or industrialized) economies and developing
(or emerging) economies. This classification was originally proposed in the 1980s by World Bank
economist Antoine van Agtmael. In spite of this deceptively simple classification, there is no consensus
about where the distinction between developed and developing vanishes.

To avoid these conflicting views about the definition of emerging countries, we rely on more concrete
quantitative measures based on the business cycle properties of these economies.

The regional component of every economy is receiving increasing attention, and research has shown
that 75% of profit arises at the city level, while the rural parts of each country show a whole
spectrum of development problems. We therefore wanted to see which factors have the most significant
impact on the results that are ultimately observed at the macroeconomic level.

The concept of a “Smart City” defines cities as accelerators of sustainable economic growth. The
whole concept rests on the idea of a social and technological infrastructure that improves the quality
of life for everyone in the city and, in the end, in the whole economy.

The starting points of the analysis were:

a) Ninth pillar of the Global Competitiveness Report: Technological readiness


In today’s globalized world, technology is increasingly essential for firms to compete and
prosper. The technological readiness pillar measures the agility with which an economy adopts
existing technologies to enhance the productivity of its industries, with specific emphasis on its
capacity to fully leverage information and communication technologies (ICTs) in daily
activities and production processes for increased efficiency and enabling innovation for
competitiveness.

ICTs have evolved into the “general purpose technology” of our time, given their critical
spillovers to other economic sectors and their role as industry-wide enabling infrastructure.
Therefore ICT access and usage are key enablers of countries’ overall technological readiness.

Whether the technology used has or has not been developed within national borders is irrelevant
for its ability to enhance productivity. The central point is that the firms operating in the country
need to have access to advanced products and blueprints and the ability to absorb and use them.
Among the main sources of foreign technology, FDI often plays a key role, especially for
countries at a less advanced stage of technological development. It is important to note that, in
this context, the level of technology available to firms in a country needs to be distinguished
from the country’s ability to conduct blue-sky research and develop new technologies for
innovation that expand the frontiers of knowledge.

Variables taken into consideration: Firm-level technology absorption (FLTA); FDI and technology
transfer (FDI)

b) Eleventh pillar: Business sophistication


There is no doubt that sophisticated business practices are conducive to higher efficiency in the
production of goods and services. Business sophistication concerns two elements that are
intricately linked: the quality of a country’s overall business networks and the quality of
individual firms’ operations and strategies. These factors are especially important for countries
at an advanced stage of development when, to a large extent, the more basic sources of
productivity improvements have been exhausted. The quality of a country’s business networks
and supporting industries, as measured by the quantity and quality of local suppliers and the
extent of their interaction, is important for a variety of reasons. When companies and suppliers
from a particular sector are interconnected in geographically proximate groups, called clusters,
efficiency is heightened, greater opportunities for innovation in processes and products are
created, and barriers to entry for new firms are reduced. Individual firms’ advanced operations
and strategies (branding, marketing, distribution, advanced production processes, and the
production of unique and sophisticated products) spill over into the economy and lead to
sophisticated and modern business processes across the country’s business sectors.

Variables taken into consideration: Nature of competitive advantage (NCA); Production process
sophistication (PPS)

c) Twelfth pillar: Innovation


Innovation can emerge from new technological and non-technological knowledge. Non-
technological innovations are closely related to the know-how, skills, and working conditions
that are embedded in organizations and are therefore largely covered by the eleventh pillar of
the GCI. The final pillar of competitiveness focuses on technological innovation. Although
substantial gains can be obtained by improving institutions, building infrastructure, reducing
macroeconomic instability, or improving human capital, all these factors eventually run into
diminishing returns. The same is true for the efficiency of the labour, financial, and goods
markets. In the long run, standards of living can be largely enhanced by technological
innovation. Technological breakthroughs have been at the basis of many of the productivity
gains that our economies have historically experienced. These range from the industrial
revolution in the 18th century and the invention of the steam engine and the generation of
electricity to the more recent digital revolution. The latter is not only transforming the way
things are being done, but also opening a wider range of new possibilities in terms of products
and services. Innovation is particularly important for economies as they approach the frontiers
of knowledge, and the possibility of generating more value by merely integrating and adapting
exogenous technologies tends to disappear.

Variables taken into consideration: Capacity for innovation (CI); Quality of scientific research
institutions (QSRI); Company spending on R&D (CSR&D); University-industry collaboration in R&D
(UICR&D); Government procurement of advanced technology products (GPATP); Availability of
scientists and engineers (ASE); PCT patents, applications/million pop. (PCTP)

The innovation pillar contributes the largest set of variables for a reason: the modern economy is
driven by a fast-changing global environment that requires productive, fast solutions and a
proactive role of government and business policies to create “safe”, substantial economic growth.
Cities need to anticipate the future, not just adapt to newly created, short-lived situations.

d) The variable that connects them is, of course, human capital, so we took the measures of the
quality of the education system (QES) and the quality of math and science education (QMSE).

The final data set is shown below:


2. Factor analysis – Business Competitiveness – “Step by step”
2.1. Starting R and data preparation

> setwd("C:\\Users\\Public\\Documents")
> comp=read.table(file="BusinessCompetitiveness.txt")
> comp

2.2. Editing matrix comp

> colnames(comp)=c("QES","QMSE","FLTA","FDI","NCA","PPS","CI","QSRI","CSR&D","UICR&D","GPATP","ASE","PCTP")
> rownames(comp)=c("Austria","Belgium","Croatia","Cyprus","Estonia","Iceland","Latvia","Lithuania","Malta","Montenegro","Portugal","SlovakRepublic","Slovenia","Switzerland")
> comp

The numbers of rows (n) and columns (p) of the matrix were obtained using the following functions:
> n=nrow(comp)
> n
> p=ncol(comp)
> p
2.3. Standardization of data
The third step produces the matrix of standardized values required for further analysis. The
function scale was used because, by default, it centers and scales the columns of a numeric
matrix. The standardized values of the observed variables are stored in the matrix named xs.

> xs=scale(comp)

> xs
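As a quick check of what scale does, the following minimal sketch standardizes a small made-up matrix by hand and compares the result with scale. The object toy and its values are hypothetical and are not part of the competitiveness data.

```r
# Hypothetical toy data (not taken from BusinessCompetitiveness.txt)
toy <- matrix(c(4.1, 5.0, 3.2, 4.6,
                3.9, 5.5, 2.8, 4.9), ncol = 2)
colnames(toy) <- c("V1", "V2")

# Manual standardization: subtract the column mean, divide by the sample sd
manual <- sweep(toy, 2, colMeans(toy), "-")
manual <- sweep(manual, 2, apply(toy, 2, sd), "/")

# scale() with its defaults (center = TRUE, scale = TRUE) should do the same
auto <- scale(toy)

max(abs(manual - auto))    # difference should be numerically zero
round(colMeans(auto), 10)  # column means of the standardized data
apply(auto, 2, sd)         # column standard deviations of the standardized data
```

Note that scale divides by the sample standard deviation (denominator n - 1), which is why the covariance of the standardized data can later be read as a correlation matrix.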

2.4. Covariance matrix


We applied the cov function to compute the covariance matrix of the standardized variables,
which is equal to the correlation matrix of the original variables: cov(xs) = cor(comp).

> r=cov(xs)
>r
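The identity between the covariance of standardized data and the correlation of the original data can be checked numerically. The sketch below uses simulated data (the objects x, xs_toy and r_toy are hypothetical names, not the report's data).

```r
set.seed(1)
x <- matrix(rnorm(40), nrow = 10, ncol = 4)  # simulated data, 10 rows, 4 variables

xs_toy <- scale(x)     # standardized values
r_toy  <- cov(xs_toy)  # covariance of the standardized variables

# Should agree with the correlation matrix of the original variables
all.equal(r_toy, cor(x), check.attributes = FALSE)

# The diagonal consists of ones, as expected for a correlation matrix
diag(r_toy)
```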
2.5. Defining the number of factors used in analysis

Computing eigenvalues and eigenvectors:

Function eigen is used for computing eigenvalues and eigenvectors of correlation matrix r.

> e=eigen(r)
> values=e$values
> vectors=e$vectors

Eigenvalues are important for selection of the number of factors which are going to be used in
the factor analysis. The K1 method for identification of the number of factors proposed by
Kaiser is perhaps the best known and most utilized in practice. According to this rule, only the
factors that have eigenvalues greater than one are retained for interpretation. In this case 2
factors have eigenvalues greater than one.
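The Kaiser rule amounts to counting the eigenvalues above one. A minimal sketch on a hypothetical 4 x 4 correlation matrix (two pairs of strongly correlated variables, values chosen only for illustration):

```r
# Hypothetical correlation matrix, not the report's matrix r
r_toy <- matrix(c(1.0, 0.8, 0.1, 0.1,
                  0.8, 1.0, 0.1, 0.1,
                  0.1, 0.1, 1.0, 0.7,
                  0.1, 0.1, 0.7, 1.0), nrow = 4, byrow = TRUE)

ev <- eigen(r_toy)$values
ev

# Kaiser (K1) rule: keep the factors whose eigenvalue exceeds one
k <- sum(ev > 1)
k

# The eigenvalues of a correlation matrix sum to the number of variables
sum(ev)
```

With the objects defined in the text, the same count would be obtained with sum(values > 1).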

Eigenvalues
Eigenvectors

Percentage of variance explained:

From the next table it is visible that only the first 2 eigenvalues are greater than one and together
explain the majority of the variation in the original data (74%).

> perc=values/sum(values)
> cumperc=cumsum(perc)
> table=cbind(values,perc,cumperc)
> table
Cattell’s Scree test

The alternative method of selecting the number of factors to extract is through the use of a
Cattell’s “scree” plot, which is a graph of the eigenvalues (variances) of each component plotted
against the component number. From this plot, it may be seen that the line “drops-off” after
about the second eigenvalue, so we have retained two components which together explain about
74 % of the variation in the original data.

> plot(values, xlab="Number", ylab="Eigenvalues", main="Scree plot", lwd=2, cex=2, col="deeppink3")

2.6. Factor loadings matrix

> z=xs%*%vectors
> r1=cor(cbind(z,xs))
> compet2=r1[14:26,1:2]
> compet2
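The loadings above are obtained as correlations between the component scores and the standardized variables. They should coincide with the eigenvectors multiplied by the square roots of their eigenvalues. The sketch below checks this on simulated data; all object names (x, xs_toy, e_toy, load_cor, load_eig) are hypothetical.

```r
set.seed(2)
x <- matrix(rnorm(60), nrow = 20, ncol = 3)  # simulated data, not the report's
x[, 2] <- x[, 1] + 0.5 * x[, 2]              # induce some correlation

xs_toy <- scale(x)
e_toy  <- eigen(cor(x))

# Approach used in the text: correlate the variables with the component scores
z_toy    <- xs_toy %*% e_toy$vectors
load_cor <- cor(xs_toy, z_toy)[, 1:2]

# Closed form: eigenvector times the square root of its eigenvalue
load_eig <- e_toy$vectors[, 1:2] %*% diag(sqrt(e_toy$values[1:2]))

all.equal(load_cor, load_eig, check.attributes = FALSE)
```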

Factor loadings unrotated


Factor loadings are given in the table above. The coefficients show that the structure of the
matrix is not simple to interpret. Specifically, the variable GPATP loads noticeably on both
factors. For that reason we perform a varimax rotation, an orthogonal rotation that keeps the
factors uncorrelated. By rotating the factors we attempt to find a factor solution that is
equivalent to the one obtained in the initial extraction but which has the simplest
interpretation. For varimax a simple solution means
that each factor has a small number of large loadings and a large number of zero (or small)
loadings. This simplifies the interpretation because, after a varimax rotation, each original
variable tends to be associated with one (or a small number) of factors, and each factor
represents only a small number of variables.

Percentage of variance explained by common factors

The percentage of variance of each variable explained by the common factors (the communality) is
given in the following table. The majority of the variance of the variables is explained by the
extracted factors.

> common=diag(compet2%*%t(compet2))
> common

Specific variance

The following table shows the remaining, unexplained part of variance of each variable.
> specific=diag(r)-common
> specific
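For standardized variables the communality and the specific variance of each variable add up to one. A small worked example with a hypothetical loadings matrix L (values invented for illustration):

```r
# Hypothetical loadings of three variables on two factors
L <- matrix(c(0.9, 0.1,
              0.8, 0.3,
              0.2, 0.7), ncol = 2, byrow = TRUE)
rownames(L) <- c("A", "B", "C")

# Communality: sum of squared loadings per variable,
# the same quantity as diag(L %*% t(L)) used in the text
common_toy <- rowSums(L^2)
common_toy      # 0.82, 0.73, 0.53

# Specific variance: what remains of the unit variance
specific_toy <- 1 - common_toy
specific_toy    # 0.18, 0.27, 0.47
```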
Factor loadings rotated

The following table shows the factor loadings matrix after varimax rotation. The values are
clearer to interpret because each variable is significant for only one factor. Eleven variables are
significant for the first factor (QES, QMSE, FLTA, NCA, PPS, CI, QSRI, CSR&D, UICR&D,
ASE, PCTP) and two variables for the second factor (FDI, GPATP). Therefore we can easily
separate the variables between the two factors. The first factor explains 59.9% of the variation
in the data and the second factor explains 14.5% of the variation.

> rot=varimax(compet2)
> rotloadings=rot$loadings
> rotloadings
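Because varimax is an orthogonal rotation, it redistributes the loadings between the factors without changing the communalities. The sketch below illustrates this on a hypothetical loadings matrix L; the values are invented and are not the report's loadings.

```r
# Hypothetical unrotated loadings of four variables on two factors
L <- matrix(c(0.7,  0.5,
              0.6,  0.6,
              0.6, -0.5,
              0.7, -0.4), ncol = 2, byrow = TRUE)

rot_toy <- varimax(L)
Lrot    <- unclass(rot_toy$loadings)  # rotated loadings as a plain matrix
Lrot

# The rotation matrix is orthogonal: t(T) %*% T is the identity
round(crossprod(rot_toy$rotmat), 10)

# Communalities are the same before and after rotation
all.equal(rowSums(L^2), rowSums(Lrot^2), check.attributes = FALSE)
```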
2.7. Factor scores before rotation
The score for a given factor is a linear combination of all of the measures, weighted by the
corresponding factor loading.

> fa=xs%*%solve(r)%*%compet2
> fa

> plot(fa,type="n",xlab="Factor scores 1",ylab="Factor scores 2",main="Representation of countries before rotation",cex.lab=1.2,cex.axis=1.2,lwd=2)
> abline(h=0,v=0)
> text(fa,c("Austria","Belgium","Croatia","Cyprus","Estonia","Iceland","Latvia","Lithuania","Malta","Montenegro","Portugal","SlovakRepublic","Slovenia","Switzerland"),cex=1.2)
2.8. Factor scores after rotation
> rotfa=xs%*%solve(r)%*%rotloadings
> rotfa
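The scores before and after rotation are linked by the rotation matrix: the rotated scores should equal the unrotated scores multiplied by the varimax rotation matrix, and both sets of scores should be uncorrelated. The sketch below checks this on simulated data, assuming loadings taken from the first two principal components; all object names are hypothetical.

```r
set.seed(3)
x <- matrix(rnorm(120), nrow = 30, ncol = 4)  # simulated data, not the report's
x[, 2] <- x[, 1] + 0.4 * x[, 2]               # first correlated pair
x[, 4] <- x[, 3] + 0.4 * x[, 4]               # second correlated pair

xs_toy <- scale(x)
r_toy  <- cor(x)
e_toy  <- eigen(r_toy)
L      <- e_toy$vectors[, 1:2] %*% diag(sqrt(e_toy$values[1:2]))

# Scores before rotation, computed as in the text
fa_toy <- xs_toy %*% solve(r_toy) %*% L

# Scores after varimax rotation
rot_toy   <- varimax(L)
rotfa_toy <- xs_toy %*% solve(r_toy) %*% unclass(rot_toy$loadings)

# Rotated scores = unrotated scores times the rotation matrix
all.equal(rotfa_toy, fa_toy %*% rot_toy$rotmat, check.attributes = FALSE)

# The rotated scores remain uncorrelated
round(cor(rotfa_toy), 10)
```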
3. Cluster analysis

Large open economies were added to the data set in order to conduct a cluster analysis.

Data:

3.1. Euclidean distance


>cluster =read.table(file="cluster.txt",header=T)
>dist.eu= dist(cluster,method="euclidean",p=2,diag=T)
>dist.eu2=dist.eu^2

Figure 1. Euclidean distance matrix

>s=hclust(dist.eu2, method = "single")

>plot(s, hang = -0.1, frame.plot = TRUE, ann = FALSE)


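[Figure: single-linkage dendrogram based on squared Euclidean distances, all observations; height axis from 0 to 300000; leaves from left to right: 12, 15, 14, 18, 19, 10, 13, 20, 16, 17, 11, 9]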
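The effect of an isolated observation on a single-linkage dendrogram, and the role of squaring the distances, can be seen on a small example. The sketch below uses four hypothetical points, one of them far from the rest; it is not based on cluster.txt.

```r
# Hypothetical observations: a, b, c are close together, d is an outlier
toy <- rbind(a = c(1, 1), b = c(2, 1), c = c(1, 3), d = c(10, 10))

d_eu <- dist(toy, method = "euclidean")

h1 <- hclust(d_eu,   method = "single")  # plain Euclidean distances
h2 <- hclust(d_eu^2, method = "single")  # squared Euclidean distances

# Squaring is monotone, so the merge order is unchanged
identical(h1$merge, h2$merge)

# Only the merge heights change: they are squared as well
all.equal(h2$height, h1$height^2)

# Cutting the tree into two groups isolates the outlier d
cutree(h1, k = 2)
```

This suggests why one very distant observation stretches the height axis and compresses the rest of the tree, which is the motivation for re-running the analysis without it.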
In the next step we exclude the 12th observation:

>cluster2=cluster[-12,]
>dist.eu2= dist(cluster2,method="euclidean",p=2,diag=T)

>dist.eu22=dist.eu2^2
>s=hclust(dist.eu22, method = "single")

>plot(s, hang = -0.1, frame.plot = TRUE, ann = FALSE)


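[Figure: single-linkage dendrogram based on squared Euclidean distances, observation 12 excluded; height axis from 2 to 14; leaves from left to right: 15, 14, 18, 19, 10, 13, 20, 16, 17, 11]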

3.2. Manhattan distance
>cluster =read.table(file="cluster.txt",header=T)
>cluster

>dist.ma= dist(cluster,method="manhattan",p=2,diag=T)
>dist.ma

>s=hclust(dist.ma, method = "single")

>plot(s, hang = -0.1, frame.plot = TRUE, ann = FALSE)


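[Figure: single-linkage dendrogram based on Manhattan distances, all observations; height axis from 0 to 1500; leaves from left to right: 12, 15, 14, 18, 19, 10, 13, 20, 16, 17, 11, 9]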
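The Manhattan distance between two observations is the sum of the absolute differences of their coordinates. A minimal check against dist, using two hypothetical observations:

```r
# Two hypothetical observations measured on three variables
toy <- rbind(a = c(1, 5, 2), b = c(4, 1, 2))

# Manhattan distance as computed by dist()
d_ma <- as.numeric(dist(toy, method = "manhattan"))

# The same distance by hand: |1 - 4| + |5 - 1| + |2 - 2|
by_hand <- sum(abs(toy["a", ] - toy["b", ]))

d_ma      # 7
by_hand   # 7
```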
>cluster2=cluster[-12,]
>dist.ma2= dist(cluster2,method="manhattan",p=2,diag=T)
>dist.ma2

>s=hclust(dist.ma2, method = "single")

>plot(s, hang = -0.1, frame.plot = TRUE, ann = FALSE)


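[Figure: single-linkage dendrogram based on Manhattan distances, observation 12 excluded; height axis from 4 to 14; leaves from left to right: 15, 14, 18, 19, 10, 13, 20, 16, 17, 11, 9]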
3.3. Maximum distance
> cluster=read.table(file="cluster.txt",header=T)
> dist.mi=dist(cluster,method="maximum",p=2,diag=T)
> cluster2=cluster[-12,]
> dist.mi2=dist(cluster2,method="maximum",p=2,diag=T)
> dist.mi2

> s=hclust(dist.mi, method = "single")

> plot(s, hang = -0.1, frame.plot = TRUE, ann = FALSE)


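[Figure: single-linkage dendrogram based on maximum distances, all observations; height axis from 0 to 300; leaves from left to right: 12, 14, 10, 18, 19, 15, 13, 20, 16, 17, 11]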

> s=hclust(dist.mi2, method = "single")

> plot(s, hang = -0.1, frame.plot = TRUE, ann = FALSE)


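[Figure: single-linkage dendrogram, observation 12 excluded; height axis from 4 to 14; leaves from left to right: 15, 14, 18, 19, 10, 13, 20, 16, 17, 11]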
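The maximum distance keeps only the largest absolute coordinate difference, so for the same pair of observations it can never exceed the Euclidean distance, which in turn can never exceed the Manhattan distance. A minimal sketch on two hypothetical observations:

```r
# Two hypothetical observations measured on three variables
toy <- rbind(a = c(1, 5, 2), b = c(4, 1, 2))

d_max <- as.numeric(dist(toy, method = "maximum"))    # max(3, 4, 0) = 4
d_euc <- as.numeric(dist(toy, method = "euclidean"))  # sqrt(9 + 16 + 0) = 5
d_man <- as.numeric(dist(toy, method = "manhattan"))  # 3 + 4 + 0 = 7

c(maximum = d_max, euclidean = d_euc, manhattan = d_man)
```

This ordering is consistent with the different height scales of the dendrograms obtained for the three distance measures.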
