
Data Analysis

The Middle Semester Evaluation Project

The Implementation of Factor Analysis, Discriminant Analysis, Logistic Regression, and Cluster Analysis to Scopus and Google Scholar Data of ITS Lecturers
Elisabeth Brielin Sinua
Department of Statistics, Sepuluh Nopember Institute of Technology, Surabaya, East Java, Indonesia

Abstract
In this work I apply Factor Analysis, Discriminant Analysis, Logistic Regression, and Cluster Analysis to Scopus and Google Scholar data. For the factor analysis I use Principal Component Analysis; this method produces a different set of components for each data set. Variables that load on the same component tend to be associated with one another, whereas variables from different components appear unrelated. For example, age and period of work consistently load on one component in every data set, and the matrix plots show that age and period of work are positively correlated. The complete correlations among the variables can be seen clearly in the matrix plots in Section 3, which were constructed to support the factor analysis results. In this paper I also briefly describe the descriptive analysis in the discriminant analysis part; it shows that age and length of work are not consistently significant for the value of the h-index. In the discriminant analysis part, using the stepwise method, I obtain the most useful predictors for distinguishing between groups defined by the response, which in this case separates lecturers with an h-index less than 2 from lecturers with an h-index greater than or equal to 2. To classify the response variable, the discriminant function must be determined first. A model is considered useful when its cross-validated classification accuracy is greater than or equal to the proportional by chance accuracy, and the discriminant models produced for each data set satisfy this criterion. Another classification method described in this paper is cluster analysis, in which I determine the number of clusters and use a nonhierarchical method since that number is known. I examine whether the obtained clusters correspond to functional position, which consists of professor, assistant professor, lector, and expert assistant. Unfortunately, functional position cannot be recovered as clusters, because the cases of each position spread across the clusters and no cluster is dominated by a single position. In contrast, level of education can be recovered: lecturers who hold a doctoral degree are mostly separated from lecturers who do not. The complete data analysis is summarized in Section 3.
Keywords: Google Scholar, Scopus, h-index, Citation, PCA, Factor Analysis, Discriminant Analysis, Logistic Regression, Cluster Analysis


I. Introduction
Scopus is the largest abstract and citation database of peer-reviewed literature: scientific journals, books, and conference proceedings. With over 20,500 titles from more than 5,000 international publishers, Scopus offers researchers an accurate, easy, and comprehensive tool to support their research needs in the scientific, technical, medical, social sciences, and arts and humanities fields. Besides Scopus, there is Google Scholar, which provides a simple way to broadly search for scholarly literature. From one place, people can search across many disciplines and sources: articles, theses, books, abstracts, and court opinions, from academic publishers, professional societies, online repositories, universities, and other web sites. Google Scholar helps to find relevant work across the world of scholarly research. Google Scholar aims to rank documents the way researchers do, weighing the full text of each document, where it was published, who it was written by, and how often and how recently it has been cited in other scholarly literature.
The aim of this paper is to apply Factor Analysis, Discriminant Analysis, Logistic Regression, and Cluster Analysis to the Scopus and Google Scholar data of the lecturers of Sepuluh Nopember Institute of Technology (ITS). Several data sets are used: the ITS lecturers who have a Scopus account, those who have a Google Scholar account, and those who have both accounts. Besides analysing the ITS lecturers globally, I also analyse the lecturer data of the Mathematics and Sciences Faculty separately, again according to Scopus account, Google Scholar account, and both accounts. First, I analyse the data using factor analysis in order to obtain factors that contain variables with simple structure; several requirements must be met to obtain such variables, and they are described briefly in this paper. In addition, a descriptive analysis is presented to support the factor analysis results. After the factor analysis, I apply discriminant analysis, whose main purpose is classification: by performing this analysis I intend to classify the ITS lecturers based on their h-index value. Next, logistic regression is used to obtain a significant logistic regression model, which is then compared with the discriminant analysis; I also include some nonmetric independent variables in the logistic regression, expecting to extract useful information from the odds ratios. The last analysis is cluster analysis, in which I determine the number of clusters and examine whether the obtained clusters correspond to the groupings chosen beforehand. Each analysis requires different assumptions.
II. Literature Review
Scopus


Scopus is the largest abstract and citation database of peer-reviewed literature: scientific journals,
books and conference proceedings. Delivering a comprehensive overview of the world's research output in
the fields of science, technology, medicine, social sciences, and arts and humanities, Scopus features smart
tools to track, analyze and visualize research. As research becomes increasingly global, interdisciplinary
and collaborative, you can make sure that critical research from around the world is not missed when you
choose Scopus. Scopus has twice as many titles and over 50% more publishers listed than any other A&I
database, with interdisciplinary content that covers the research spectrum. Timely updates from thousands
of peer-reviewed journals, preliminary findings from millions of conference papers, and the thorough
analysis in an expanding collection of books ensure you have the most up-to-date and highest quality
interdisciplinary content available. Scopus is designed to serve the research information needs of
researchers, educators, administrators, students and librarians across the entire academic community.
Whether searching for specific information or browsing topics, authors, journals or books, Scopus
provides precise entry points to peer-reviewed literature in the fields of science, technology, medicine,
social sciences, and arts and humanities.

Google Scholar
Google Scholar is a freely accessible web search engine that indexes the full text or metadata
of scholarly literature across an array of publishing formats and disciplines. Released in beta in November
2004, the Google Scholar index includes most peer-reviewed online journals of Europe and America's
largest scholarly publishers, plus scholarly books and other non-peer reviewed journals.

Principal Component Analysis


Principal Component Analysis is a statistical method that aims to reduce the dimension of the data by constructing new variables (principal components) as linear combinations of the original variables, such that the variance of each principal component is maximized and the principal components are mutually independent.
The model of principal component analysis can be written as follows:

\[
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_m \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1p} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2p} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3p} \\
\vdots & \vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mp}
\end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ \vdots \\ X_p \end{bmatrix},
\qquad m \le p
\tag{5}
\]

where:
$Y_1$ = the first principal component, the component with the highest variance,
$Y_2$ = the second principal component, the component with the second highest variance,
$Y_m$ = the m-th principal component, the component with the m-th highest variance,
$X_1$ = the first original variable,
$X_2$ = the second original variable,
$X_p$ = the p-th original variable,
$m$ = the number of principal components,
$p$ = the number of original variables.
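To illustrate how principal components can be extracted in practice, the following is a minimal sketch in Python with scikit-learn (the analyses in this paper were run in SPSS, so this is only an illustration; the file name and column names are hypothetical placeholders):

```python
# Illustrative PCA sketch (the paper's actual analysis was done in SPSS).
# "scopus_its.csv" and the column names below are assumed placeholders.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

df = pd.read_csv("scopus_its.csv")
X = df[["age", "periodofworking", "numberofcit", "citationbydoc"]]

Z = StandardScaler().fit_transform(X)   # standardize the original variables
pca = PCA()                             # components Y_i ordered by decreasing variance
scores = pca.fit_transform(Z)           # principal component scores

print(pca.explained_variance_ratio_)    # share of total variance per component
print(pca.components_)                  # rows are the coefficient vectors a_i
```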
Factor Analysis
The aim of factor analysis is to describe the covariance relationships among many variables in terms of unobserved underlying variables. An observed random vector X with p components has mean μ and covariance matrix Σ. The factor model can be written as follows:

\[
\begin{aligned}
X_1 - \mu_1 &= l_{11}F_1 + l_{12}F_2 + \cdots + l_{1m}F_m + \varepsilon_1\\
&\;\;\vdots\\
X_p - \mu_p &= l_{p1}F_1 + l_{p2}F_2 + \cdots + l_{pm}F_m + \varepsilon_p
\end{aligned}
\]

or, written in matrix notation,

\[
X_{p\times 1} - \mu_{p\times 1} = L_{p\times m}\,F_{m\times 1} + \varepsilon_{p\times 1}
\]

where:
$\mu_i$ = the mean of the i-th variable,
$\varepsilon_i$ = the i-th specific factor,
$F_j$ = the j-th common factor,
$l_{ij}$ = the loading of the i-th variable on the j-th factor.

The main purpose of factor analysis is to describe the structure of relationships among many variables in terms of factors, also called latent variables or constructs. The factors that are formed are random quantities that cannot be observed or measured directly.
In addition to this main purpose, factor analysis has other objectives:
1) To reduce a large number of original variables into a smaller number of new variables, called factors, latent variables, or constructs.
2) To identify the relationship between the variables making up a factor and the factor that is formed, by testing the correlation coefficients between the factor and its constituent components; this is called confirmatory factor analysis.
3) To test the validity and reliability of an instrument with confirmatory factor analysis.
4) To validate the data, that is, to determine whether the results of the factor analysis can be generalized to the population, so that after the factors are formed the researcher already has a new hypothesis based on the results of the factor analysis.
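As a brief, hedged illustration of the common-factor model with varimax rotation (a sketch only, assuming a recent scikit-learn in which FactorAnalysis supports rotation; the input file and column selection are placeholders, not the paper's actual data):

```python
# Sketch of the common-factor model X - mu = L F + eps with varimax rotation.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import FactorAnalysis

df = pd.read_csv("scopus_its.csv")                        # hypothetical input file
X = StandardScaler().fit_transform(df.select_dtypes("number"))

fa = FactorAnalysis(n_components=2, rotation="varimax")   # m = 2 common factors
F = fa.fit_transform(X)                                   # factor scores
loadings = fa.components_.T                               # l_ij: one row per variable, one column per factor
print(loadings)
```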
Discriminant Analysis
Discriminant analysis is used to analyze relationships between a non-metric dependent variable and
metric or dichotomous independent variables. This analysis attempts to use the independent variables to
distinguish among the groups or categories of the dependent variable. The usefulness of a discriminant
model is based upon its accuracy rate, or ability to predict the known group memberships in the categories
of the dependent variable.
Discriminant analysis works by creating a new variable called the discriminant function score which
is used to predict to which group a case belongs. Discriminant function scores are computed similarly to
factor scores, i.e. using eigenvalues. The computations find the coefficients for the independent variables
that maximize the measure of distance between the groups defined by the dependent variable. The discriminant function is similar to a regression equation in which the independent variables are multiplied
by coefficients and summed to produce a score.
In many cases, the dependent variable consists of two groups or classifications, for example, male versus female or high versus low. In other instances, more than two groups are involved, such as low, medium, and high classifications. Discriminant analysis is capable of handling either two groups or multiple (three or more) groups. When two classifications are involved, the technique is referred to as two-group discriminant analysis. When three or more classifications are identified, the technique is referred to as multiple discriminant analysis (MDA). Logistic regression is limited in its basic form to two groups, although other formulations can handle more groups.
As with all multivariate techniques, discriminant analysis is based on a number of assumptions. These assumptions relate both to the statistical processes involved in the estimation and classification procedures and to issues affecting the interpretation of the results. The assumptions of discriminant analysis consist of normality of the independent variables, linearity of relationships, lack of multicollinearity among the independent variables, and equal dispersion matrices.
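For readers who want to reproduce a two-group discriminant analysis outside SPSS, the following is a minimal sketch with scikit-learn (it does not perform stepwise selection, and the file and column names are assumptions for illustration only):

```python
# Two-group linear discriminant analysis sketch.
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

df = pd.read_csv("scopus_its.csv")                       # hypothetical input file
X = df[["numberofdoc", "numberofcit", "citationbydoc", "highestnumbercit"]]
y = (df["hindex"] >= 2).astype(int)                      # 0: h-index < 2, 1: h-index >= 2

lda = LinearDiscriminantAnalysis().fit(X, y)
scores = lda.decision_function(X)                        # discriminant scores for each case
predicted = lda.predict(X)                               # predicted group membership
print(lda.coef_, lda.intercept_)                         # coefficients of the discriminant function
```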
Logistic Regression
Logistic regression, along with discriminant analysis, is the appropriate statistical technique when the dependent variable is categorical (nominal or nonmetric) and the independent variables are metric or nonmetric. Compared to discriminant analysis, logistic regression is limited in its basic form to two groups for the dependent variable, although other formulations can handle more groups. It does have the advantage, however, of easily incorporating nonmetric variables as independent variables, much like multiple regression.
In a practical sense, logistic regression may be preferred for two reasons. First, discriminant analysis relies on strictly meeting the assumptions of multivariate normality and equal variance-covariance matrices across groups, assumptions that are not met in many situations. Logistic regression does not face these strict assumptions and is much more robust when they are not met, making its application appropriate in many situations. Second, even if the assumptions are met, many researchers prefer logistic regression because it is similar to multiple regression.
Sample size considerations for logistic regression are primarily focused on the size of each group, which should have at least 10 times the number of estimated model coefficients. Sample size requirements should be met in both the analysis and the holdout samples. Model significance tests are made with a chi-square test on the difference in the log likelihood values (-2LL) between two models. Coefficients are expressed in two forms, original and exponentiated, to assist in interpretation.
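The following is a small, hedged sketch of a binary logistic regression with odds ratios in Python using statsmodels; the predictors and input file are illustrative assumptions, not the paper's fitted model:

```python
# Binary logistic regression sketch with exponentiated coefficients (odds ratios).
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("scopus_its.csv")                        # hypothetical input file
y = (df["hindex"] >= 2).astype(int)                       # two-group dependent variable
X = sm.add_constant(df[["numberofdoc", "numberofcit"]])   # metric predictors plus intercept

model = sm.Logit(y, X).fit()                              # maximum likelihood estimation
print(model.summary())                                    # coefficients and likelihood-based tests (-2LL)
print(np.exp(model.params))                               # exponentiated coefficients = odds ratios
```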
Cluster Analysis
Cluster analysis is a group of multivariate techniques whose primary purpose is to group objects based on the characteristics they possess. Cluster analysis is comparable to factor analysis in its objective of assessing structure. Cluster analysis differs from factor analysis, however, in that cluster analysis groups objects, whereas factor analysis is primarily concerned with grouping variables. The cluster variate is the mathematical representation of the selected set of variables on which the objects' similarities are compared. The variate in cluster analysis is determined quite differently from other multivariate techniques: cluster analysis is the only multivariate technique that does not estimate the variate empirically but instead uses the variate as specified by the researcher. The focus of cluster analysis is on the comparison of objects based on the variate, not on the estimation of the variate itself.
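As a sketch of the nonhierarchical clustering used later in this paper (k-means, with the number of clusters fixed by the researcher; the variables, the value of k, and the grouping column are placeholders):

```python
# Nonhierarchical (k-means) clustering sketch with a researcher-specified variate.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.read_csv("scopus_its.csv")                        # hypothetical input file
X = StandardScaler().fit_transform(df[["numberofdoc", "numberofcit"]])

km = KMeans(n_clusters=4, n_init=10, random_state=0)      # e.g. k = 4, one per functional position
labels = km.fit_predict(X)

# compare the obtained clusters with a known grouping such as functional position
print(pd.crosstab(labels, df["functional_position"]))
```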


III. Data Analysis


Factor analysis
In this section, I analyse several data sets separately using factor analysis, assuming that multivariate normality is satisfied. First, I perform the analysis globally on three kinds of data: the ITS lecturers who have a Scopus account, those who have a Google Scholar account, and those who have both accounts. I use SPSS to run the analysis. Before starting to analyse the data, I run the assumption tests first in order to obtain variables with simple structure. After completing the computation in SPSS, the output of the factor analysis is as follows:
i. Factor Analysis in Scopus data of ITS Lecturers
Table 1.1 Rotated Component Matrix a

Variables             Component 1   Component 2   Component 3
umur                                ,982
lamakerja                           ,983
numberofcit           ,949
citationbydoc         ,968
firstauthorbydate                                 ,737
firstauthorbycitat                                ,877
highestnumbercit      ,921

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 4 iterations.

According to Table 1.1 above, it can be seen that of the 8 variables used at the beginning, the variable number of documents had to be removed from the analysis because its communality was less than 0.5. The communality represents the proportion of the variance in an original variable that is accounted for by the factor solution. The factor solution should explain at least half of each original variable's variance, so the communality value for each variable should be 0.50 or higher. After removing that variable, I repeated the analysis.
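The communality rule just described can be checked directly from the loading matrix; the sketch below is only an illustration with made-up loadings, not the SPSS output:

```python
# Communality check: sum of squared loadings per variable; drop variables below 0.50.
import numpy as np

loadings = np.array([          # rows = variables, columns = retained components (illustrative values)
    [0.98, 0.05, 0.03],
    [0.10, 0.95, 0.08],
    [0.40, 0.35, 0.30],        # a weak variable
])
communalities = (loadings ** 2).sum(axis=1)
keep = communalities >= 0.50
print(communalities, keep)     # the third variable would be removed and the analysis repeated
```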
After all the requirements are satisfied, the analysis yields 3 factors: the first factor contains the number of citations, citations by document, and the highest number of citations per document; the second factor contains age and working period; and the third factor contains the number of documents as first author sorted by date and the number of documents as first author sorted by citation. All variables now have simple structure. Below I provide a matrix plot to support the output of the factor analysis:



Figure 1.1 Matrix plot of age, working period, number of citations, citations by document, documents as first author sorted by date, documents as first author sorted by citation, and highest number of citations per document

ii. Factor Analysis in Scopus data of Mathematics and Sciences Faculty Lecturers


Table 1.2 Rotated Component Matrix a

Variables             Component 1   Component 2
age                   ,981
periodofworking       ,984
numberofcit                         ,954
citationbydoc                       ,953
firstauthorbydate                   ,768
firstauthorbycitat                  ,832
highestnumbercit                    ,898

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
According to Table 1.2 above, it can be seen that of the 8 variables used at the beginning, the variable number of documents had to be excluded because its communality was less than 0.5. After all the requirements are satisfied, the analysis yields 2 factors: the first factor contains age and working period, while the number of citations, citations by document, highest number of citations per document, number of documents as first author sorted by date, and number of documents as first author sorted by citation are included in the second factor. All variables now have simple structure. Below I provide matrix plots to support the output of the factor analysis:
Figure 1.2 Matrix plot of variables with simple structure based on lecturers in the Mathematics and Sciences Faculty
Figure 1.3 Matrix plot of variables with simple structure based on departments in the Mathematics and Sciences Faculty

iii. Factor Analysis in Google Scholar data of ITS Lecturers


Table 1.3 Rotated Component Matrix a

Variables               Component 1   Component 2
age                     ,984
periodofworking         ,981
allcitation                           ,902
numberofdoc                           ,695
englisdoc                             ,849
firstauthordoc                        ,732
highestnumofcitperdoc                 ,718

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
Based on Table 1.3 above, it can be seen that of the 8 variables used at the beginning, the variable documents in Indonesian had to be excluded from the analysis because its measure of sampling adequacy (MSA) in the anti-image correlation matrix was less than 0.5; an MSA value below 0.5 is considered unacceptable. After all the requirements are satisfied, the analysis yields 2 factors: the first factor includes age and working period, and the second factor consists of the number of citations, number of documents, documents in English, documents as first author, and highest number of citations per document. All variables now have simple structure. Below I provide a matrix plot to support the output of the factor analysis:

Figure 1.4 Matrix plot of variables with simple structure according to Google Scholar data of ITS Lecturers

iv. Factor Analysis in Google Scholar data of Mathematics and Sciences Faculty Lecturers


Table 1.4 Rotated Component Matrix a

Variables          Component 1   Component 2
age                ,992
workingperiod      ,986
numbofdoc                        ,957
inddoc                           ,909
firstauthordoc                   ,881

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.


According to Table 1.4 above, 5 of the 8 variables used at the beginning remain. During the process, several variables (age, documents in English, documents in Indonesian, and the highest number of citations per document) showed MSA values less than 0.5, so at each step I removed the variable with the smallest value. In this case, the highest number of citations per document was removed first; after the analysis was repeated, documents in English and the number of citations still had MSA values below 0.5. The process was repeated again: the number of all citations was removed first, followed by documents in English. After all the requirements are satisfied, the analysis yields 2 factors: the first factor contains age and working period, while the number of documents, documents in Indonesian, and documents as first author form the second factor. All variables now have simple structure. Below I provide matrix plots to support the output of the factor analysis:
Figure 1.5 Matrix plot of variables with simple structure according to lecturers in the Mathematics and Sciences Faculty
Figure 1.6 Matrix plot of variables with simple structure according to the departments in the Mathematics and Sciences Faculty

v. Factor Analysis of the ITS Lecturers data based on Scopus and Google Scholar Account

Table 1.5 Rotated Component Matrix a

Variables                 Component 1   Component 2
numbofdoc_SC                            ,714
numbogcit_SC              ,912
citbydoc_SC               ,935
highestnumbercit_SC       ,916
numbofcitat_GSC           ,733
numbofdoc_GSC                           ,925
engdoc_GSC                              ,858
firstauthordoc_GSC                      ,785
highestcitatperdoc_GSC    ,775

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

Table 1.5 above shows that 10 variables remain, contained in 2 factors. At the beginning I had 14 variables. During the analysis, I removed the documents in Indonesian variable followed by the working period variable, because their measure of sampling adequacy (MSA) values were less than 0.5. After repeating the analysis, age and the number of documents as first author sorted by citation had communality values less than 0.5; I removed the number of documents as first author sorted by citation first, since it had the smaller value, and after repeating the analysis all remaining variables had communalities above 0.5. Once any variables with communalities less than 0.50 have been removed from the analysis, the pattern of factor loadings should be examined to identify variables that have complex structure. Complex structure occurs when one variable has high loadings or correlations (0.40 or greater) on more than one component. If a variable has complex structure, it should be removed from the analysis. For that reason, the age variable, which was found to have complex structure, was removed and the principal component analysis was repeated. After all the requirements are satisfied, the analysis yields 2 factors: the Scopus number of citations, Scopus citations by document, highest number of citations in Scopus, and number of citations in Google Scholar form the first factor, while the second factor contains the number of documents in Scopus, number of documents in Google Scholar, documents in English in Google Scholar, and documents as first author in Google Scholar. All variables now have simple structure. Below I provide a matrix plot to support the output of the factor analysis:

Figure 1.7 Matrix plot of variables with simple structure based on ITS Lecturers data in Scopus and Google Scholar Account

vi. Factor Analysis of the Mathematics and Sciences Faculty data based on Scopus and Google Scholar Account
Table 1.6 Rotated Component Matrix a

Variables                Component 1   Component 2   Component 3
age                                                  ,963
periodofworking                                      ,964
numbofdoc_SC                           ,745
numbogcit_SC             ,960
citbydoc_SC              ,964
firstauthorbydate_SC     ,677
firstauthorbycitat_SC    ,766
highestnumbercit_SC      ,922
numbofdoc_GSC                          ,929
engdoc_GSC                             ,854
firstauthordoc_GSC                     ,839

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 5 iterations.


Table 1.6 above shows that 11 variables remain, contained in 3 factors. At the beginning I had 14 variables. During the analysis, I removed the documents in Indonesian, the number of citations in Google Scholar, and the highest number of citations per document in Google Scholar, because their measure of sampling adequacy (MSA) values were less than 0.5. After all the requirements are satisfied, the analysis yields 3 factors: the first factor contains the Scopus citation variables; the second factor includes the number of documents in Scopus, the number of documents in Google Scholar, documents in English in Google Scholar, and documents as first author in Google Scholar; and age and working period form the third factor. All variables now have simple structure. Below I provide matrix plots to support the output of the factor analysis:

Figure 1.8 Matrix plot of variables with simple structure according to lecturers in the Mathematics and Sciences Faculty
Figure 1.9 Matrix plot of variables with simple structure according to the departments in the Mathematics and Sciences Faculty

Discriminant Analysis
In this section, I briefly examine the performance of ITS lecturers based on their h-index value. The greater a lecturer's academic contribution, the higher the h-index he or she obtains. Therefore, in this part the h-index value is classified into two groups: an h-index less than 2, represented by the number 1, and an h-index greater than or equal to 2, represented by the number 2. I assume that there are no problems with missing data, violations of assumptions, or outliers. I use a significance level of 0.05 for evaluating the statistical relationships and run the data in SPSS. In order to determine the most important independent variables, the stepwise method is used for variable selection.
i. Discriminant Analysis of Scopus data of ITS Lecturers


According to Table 1.7, ITS lecturers with a Scopus h-index greater than or equal to 2 are older than those with an h-index less than 2, although not significantly so. Similarly, lecturers who have served longer at the university were more likely to have a higher h-index. In contrast, the remaining variables show significant differences between the two groups: the group of ITS lecturers with an h-index less than 2 had fewer documents, fewer citations, fewer citations by document, and a lower highest number of citations in Scopus than the group with an h-index greater than or equal to 2.


Table 1.7 Group Statistics

hindex   Variable           Mean    Std. Deviation   Valid N (unweighted)   Valid N (weighted)
1        umur               45,55   9,892            102                    102,000
1        lamakerja          19,48   10,288           102                    102,000
1        numberofdoc        4,44    2,448            102                    102,000
1        numberofcit        5,47    9,894            102                    102,000
1        citationbydoc      5,39    9,883            102                    102,000
1        highestnumbercit   4,48    9,331            102                    102,000
2        umur               45,85   9,119            79                     79,000
2        lamakerja          20,03   9,201            79                     79,000
2        numberofdoc        7,70    5,150            79                     79,000
2        numberofcit        18,49   21,942           79                     79,000
2        citationbydoc      17,28   21,846           79                     79,000
2        highestnumbercit   12,39   20,009           79                     79,000
Total    umur               45,68   9,537            181                    181,000
Total    lamakerja          19,72   9,806            181                    181,000
Total    numberofdoc        5,86    4,180            181                    181,000
Total    numberofcit        11,15   17,479           181                    181,000
Total    citationbydoc      10,58   17,220           181                    181,000
Total    highestnumbercit   7,93    15,421           181                    181,000

After this short descriptive analysis, the next stage is to apply discriminant analysis to the data. Discriminant analysis consists of two stages: in the first stage the discriminant functions are derived, and in the second stage the discriminant functions are used to classify the cases. Several steps are needed in order to find a significant discriminant model. By running the data in SPSS I provide several outputs below, while the complete output is attached on the last pages of this paper. In this analysis there are 181 valid cases and 6 independent variables; the ratio of cases to independent variables is about 30 to 1, which satisfies the minimum requirement. Table 1.7 also shows that the number of cases in the smallest group is 79, which is larger than the number of independent variables (6), satisfying the minimum requirement, and it also satisfies the preferred minimum of 20 cases per group.
In this analysis there were two groups defined by the h-index and 6 independent variables, so the maximum possible number of discriminant functions was 1. In the Wilks' Lambda table, which tests the functions for statistical significance, the stepwise analysis identified 1 statistically significant discriminant function.

Table 1.8 Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     ,649            76,530       4    ,000

The Wilks' lambda statistic for the test of function 1 (chi-square = 76.530) had a probability below 0.001, which is less than the 0.05 level of significance. The significance of the maximum possible number of discriminant functions supports the interpretation of a solution using 1 discriminant function. In order to specify the role that each independent variable plays in predicting group membership on the dependent variable, we must link together the relationship between the discriminant function and the groups defined by the dependent variable, the role of the significant independent variables in the discriminant function, and the differences in group means for each of the variables.
Table 1.9 Functions at Group Centroids

hindex   Function 1
1        -,644
2        ,831

Unstandardized canonical discriminant functions evaluated at group means.

Table 1.9 above shows that the function divides the groups into two subgroups by assigning negative values to one subgroup and positive values to the other. Function 1 separates the group with h-index values less than 2 (-0.644) from the group with h-index values greater than or equal to 2 (0.831).
Table 1.10 below shows the most important predictor variables selected by the stepwise method. This method identified 4 variables that satisfied the significance level of 0.05. The most important predictors of group membership based on the h-index were the number of documents, number of citations, citations by document, and highest number of citations per document.

Table 1.10 Variables Entered/Removed a,b,c,d

Step   Entered            Min. D Squared Statistic   Between Groups   Exact F Statistic   df1   df2       Sig.
1      numberofdoc        ,709                       1 and 2          31,577              1     179,000   7,232E-008
2      numberofcit        1,473                      1 and 2          32,595              2     178,000   8,674E-013
3      citationbydoc      1,951                      1 and 2          28,624              3     177,000   3,871E-015
4      highestnumbercit   2,175                      1 and 2          23,800              4     176,000   9,535E-016

At each step, the variable that maximizes the Mahalanobis distance between the two closest groups is entered.
a. Maximum number of steps is 12.
b. Maximum significance of F to enter is .05.
c. Minimum significance of F to remove is .10.
d. F level, tolerance, or VIN insufficient for further computation.

Based on the structure matrix in Table 1.11 below, the predictor variables most strongly associated with discriminant function 1, which distinguishes lecturers with h-index values less than 2 from lecturers with h-index values greater than or equal to 2, were the number of documents (r = 0.571) and the number of citations (r = 0.542).

Table 1.11 Structure Matrix

                     Function 1
numberofdoc          ,571
numberofcit          ,542
citationbydoc        ,497
highestnumbercit     ,359
lamakerja a          ,074
umur a               ,044

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.
a. This variable not used in the analysis.
Table 1.12 below shows the coefficients of the discriminant function, which will be used to classify the cases.
Table 1.12 Standardized Canonical Discriminant Function Coefficients

                     Function 1
numberofdoc          ,487
numberofcit          6,427
citationbydoc        -4,667
highestnumbercit     -1,241
According to these coefficients, I can construct the significant discriminant function below:

$f(x) = 0.487\,(\text{number of documents}) + 6.427\,(\text{number of citations}) - 4.667\,(\text{citations by document}) - 1.241\,(\text{highest number of citations})$   (1)
After obtaining the discriminant function, the next step is to use it to classify the cases. Tables 1.13 and 1.14 below illustrate that the independent variables can be characterized as useful predictors of membership in the groups defined by the dependent variable if the cross-validated classification accuracy rate is significantly higher than the accuracy attainable by chance alone. Operationally, the cross-validated classification accuracy rate should be 25% or more above the proportional by chance accuracy rate. The proportional by chance accuracy rate is computed by squaring and summing the proportion of cases in each group from the table of prior probabilities for groups (Table 1.13): 0.564² + 0.436² = 0.5081.
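As a quick check of this criterion, here is a small illustrative calculation using the values reported in Tables 1.13 and 1.14 (a sketch only, not part of the SPSS workflow):

```python
# Proportional-by-chance accuracy criterion for the Scopus ITS data.
p1, p2 = 0.564, 0.436                    # prior proportions from Table 1.13
by_chance = p1**2 + p2**2                # 0.318 + 0.190 = 0.508
criterion = 1.25 * by_chance             # about 0.635
cross_validated = 0.785                  # cross-validated accuracy from Table 1.14
print(by_chance, criterion, cross_validated >= criterion)   # True: the model is useful
```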
Table 1.13 Prior Probabilities for Groups

hindex   Prior   Cases Used in Analysis (Unweighted)   Cases Used in Analysis (Weighted)
1        ,564    102                                   102,000
2        ,436    79                                    79,000
Total    1,000   181                                   181,000

Table 1.14 Classification Results a,c

                                       Predicted Group Membership
                      hindex           1       2       Total
Original    Count     1                97      5       102
                      2                33      46      79
                      Ungrouped cases  59      80      139
            %         1                95,1    4,9     100,0
                      2                41,8    58,2    100,0
                      Ungrouped cases  42,4    57,6    100,0
Cross-validated b
            Count     1                97      5       102
                      2                34      45      79
            %         1                95,1    4,9     100,0
                      2                43,0    57,0    100,0

a. 79,0% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
c. 78,5% of cross-validated grouped cases correctly classified.


Table 1.14 above shows that the cross-validated accuracy rate computed by SPSS was 78.5%, which is greater than the proportional by chance accuracy criterion of 63.5% (1.25 × 50.81%). The criteria for the classification table and accuracy are therefore satisfied.
In conclusion, we found one statistically significant discriminant function, equation (1), making it possible to distinguish between the two groups defined by the dependent variable. Moreover, the cross-validated classification accuracy surpassed the by-chance accuracy criterion, supporting the usefulness of the model.

ii. Discriminant Analysis in Scopus data of Mathematics and Sciences Faculty Lecturers


Table 1.15 below presents the group statistics for the same variables for the Scopus data of lecturers in the Mathematics and Sciences Faculty. It is evident that age and length of work behave differently here compared with the ITS lecturers overall. Interestingly, lecturers with an h-index < 2 were more likely to be older and to have a longer length of work than those with an h-index ≥ 2. In contrast, lecturers with an h-index less than 2 still had fewer documents, citations, citations by document, and a lower highest number of citations in Scopus than those with an h-index greater than or equal to 2.

Table 1.15 Group Statistics

hindex   Variable           Mean    Std. Deviation   Valid N (unweighted)   Valid N (weighted)
1        age                50,00   7,237            17                     17,000
1        periodofworking    25,41   6,718            17                     17,000
1        numberofdoc        3,65    1,801            17                     17,000
1        numberofcit        5,71    5,621            17                     17,000
1        citationbydoc      5,65    5,645            17                     17,000
1        highestnumbercit   4,76    4,070            17                     17,000
2        age                42,65   7,506            20                     20,000
2        periodofworking    18,00   7,712            20                     20,000
2        numberofdoc        8,00    4,193            20                     20,000
2        numberofcit        18,50   16,728           20                     20,000
2        citationbydoc      17,45   16,513           20                     20,000
2        highestnumbercit   10,95   10,758           20                     20,000
Total    age                46,03   8,173            37                     37,000
Total    periodofworking    21,41   8,091            37                     37,000
Total    numberofdoc        6,00    3,944            37                     37,000
Total    numberofcit        12,62   14,266           37                     37,000
Total    citationbydoc      12,03   13,915           37                     37,000
Total    highestnumbercit   8,11    8,844            37                     37,000

In the first stage I apply the h-index data of lecturers in the Mathematics and Sciences Faculty to find the discriminant function. Following steps similar to those in (i), the analysis selects the number of citations, the number of documents, and the length of work as the most important independent variables, while the remaining variables are removed by the stepwise method. I also found that the predictor variable most strongly associated with discriminant function 1, which distinguishes lecturers with h-index values less than 2 from lecturers with h-index values greater than or equal to 2, was the number of documents (r = 0.546). Based on the standardized canonical discriminant function coefficients, I construct the discriminant function below:
$f(x) = 0.665\,(\text{length of work}) + 8.10\,(\text{number of documents}) + 0.664\,(\text{number of citations})$   (2)
After obtaining the discriminant function, in the next stage I use it to classify the cases. The proportional by chance accuracy rate is computed by squaring and summing the proportion of cases in each group from the table of prior probabilities for groups: 0.459² + 0.541² = 0.5033. The SPSS output in Table 1.16 below shows that the cross-validated accuracy rate was 91.9%, which is greater than the proportional by chance accuracy criterion of 62.9% (1.25 × 50.33%). The criteria for the classification table and accuracy are satisfied.

Table 1.16 Classification Results a,c

                                       Predicted Group Membership
                      hindex           1       2       Total
Original    Count     1                17      0       17
                      2                3       17      20
                      Ungrouped cases  8       17      25
            %         1                100,0   ,0      100,0
                      2                15,0    85,0    100,0
                      Ungrouped cases  32,0    68,0    100,0
Cross-validated b
            Count     1                17      0       17
                      2                3       17      20
            %         1                100,0   ,0      100,0
                      2                15,0    85,0    100,0

a. 91,9% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
c. 91,9% of cross-validated grouped cases correctly classified.
To sum up, we found one statistically significant discriminant function, equation (2), making it possible to distinguish between the two groups defined by the dependent variable. Moreover, the cross-validated classification accuracy surpassed the by-chance accuracy criterion, supporting the usefulness of the model.

iii. Discriminant Analysis of the ITS Lecturers data based on Google Scholar Account
Based on Table 1.17 below, age and length of work do not differ significantly between the groups. The remaining variables, on the other hand, have quite different means between the groups: on average, ITS lecturers with more citations, more documents, more documents in English, more documents in Indonesian, and a higher highest number of citations per document in their Google Scholar account were more likely to have a higher h-index value.
Table 1.17 Group Statistics

all_hindex   Variable                Mean    Std. Deviation   Valid N (unweighted)   Valid N (weighted)
1            age                     43,91   10,862           111                    111,000
1            periodofworking         17,12   10,817           111                    111,000
1            allcitation             6,85    23,183           111                    111,000
1            numberofdoc             12,33   9,932            111                    111,000
1            englisdoc               4,55    4,094            111                    111,000
1            indodoc                 7,78    8,630            111                    111,000
1            highestnumofcitperdoc   5,72    23,218           111                    111,000
2            age                     43,82   9,652            96                     96,000
2            periodofworking         17,75   10,198           96                     96,000
2            allcitation             14,56   11,767           96                     96,000
2            numberofdoc             22,56   23,841           96                     96,000
2            englisdoc               9,46    7,891            96                     96,000
2            indodoc                 13,09   21,470           96                     96,000
2            highestnumofcitperdoc   8,21    9,839            96                     96,000
Total        age                     43,87   10,294           207                    207,000
Total        periodofworking         17,41   10,514           207                    207,000
Total        allcitation             10,43   19,124           207                    207,000
Total        numberofdoc             17,08   18,465           207                    207,000
Total        englisdoc               6,83    6,610            207                    207,000
Total        indodoc                 10,25   16,106           207                    207,000
Total        highestnumofcitperdoc   6,87    18,277           207                    207,000
Applying the ITS lecturers data based on Google Scholar accounts, and using steps similar to the previous analyses, I obtain the most important independent variables that met the statistical test for inclusion: the number of citations and the highest number of citations per document. Of these, the number of citations was most strongly associated with discriminant function 1, which distinguishes lecturers with h-index values less than 2 from lecturers with h-index values greater than or equal to 2. The coefficients of the discriminant function are shown in the table below:
Table 1.18 Standardized Canonical Discriminant Function Coefficients

                        Function 1
allcitation             6,735
highestnumofcitperdoc   -6,588
Based on the table above, the discriminant function can be written as follows:

$f(x) = 6.735\,(\text{number of citations}) - 6.588\,(\text{highest number of citations per document})$   (3)

According to the obtained discriminant function, the classified cases can be seen in the following table:
Table 1.19 Classification Results a,c

                                          Predicted Group Membership
                      all_hindex          1       2       Total
Original    Count     1                   103     8       111
                      2                   23      73      96
                      Ungrouped cases     93      171     264
            %         1                   92,8    7,2     100,0
                      2                   24,0    76,0    100,0
                      Ungrouped cases     35,2    64,8    100,0
Cross-validated b
            Count     1                   103     8       111
                      2                   24      72      96
            %         1                   92,8    7,2     100,0
                      2                   25,0    75,0    100,0

a. 85,0% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
c. 84,5% of cross-validated grouped cases correctly classified.


The independent variables can be characterized as useful predictors of membership in the groups defined by the dependent variable if the cross-validated classification accuracy rate is significantly higher than the accuracy attainable by chance alone; that is, the cross-validated classification accuracy rate should be 25% or more above the proportional by chance accuracy rate. The proportional by chance accuracy rate is computed by squaring and summing the proportion of cases in each group from the table of prior probabilities for groups (Table 1.20 below): 0.536² + 0.464² = 0.502. The SPSS output in Table 1.19 shows that the cross-validated accuracy rate was 84.5%, which is greater than the proportional by chance accuracy criterion of 62.8% (1.25 × 50.2%). The criteria for the classification table and accuracy are satisfied.

Table 1.20 Prior Probabilities for Groups

all_hindex   Prior   Cases Used in Analysis (Unweighted)   Cases Used in Analysis (Weighted)
1            ,536    111                                   111,000
2            ,464    96                                    96,000
Total        1,000   207                                   207,000
Finally, we found one statistically significant discriminant function, equation (3), making it possible to distinguish between the two groups defined by the dependent variable. The model is useful because the cross-validated accuracy rate produced by SPSS exceeds the proportional by chance accuracy by more than 25%.

iv. Discriminant Analysis in Google Scholar data of Mathematics and Sciences Faculty Lecturers
The table below shows that, for the Google Scholar data, lecturers in group 2 were somewhat younger and had shorter lengths of work than the other lecturers, although not significantly so. On the other hand, on average, lecturers of the Mathematics and Sciences Faculty with fewer citations, fewer documents, fewer documents in English, fewer documents in Indonesian, and a lower highest number of citations per document in their Google Scholar account were more likely to be in group 1.
Table 1.21 Group Statistics

hindex   Variable                   Mean    Std. Deviation   Valid N (unweighted)   Valid N (weighted)
1        age                        45,70   10,254           27                     27,000
1        workingperiod              20,15   10,220           27                     27,000
1        allcitation                4,11    2,887            27                     27,000
1        numbofdoc                  12,52   12,819           27                     27,000
1        englishdoc                 3,26    2,969            27                     27,000
1        inddoc                     9,26    11,749           27                     27,000
1        highestnumbofcitatperdoc   2,89    2,391            27                     27,000
2        age                        44,74   9,473            23                     23,000
2        workingperiod              18,35   9,810            23                     23,000
2        allcitation                14,65   12,160           23                     23,000
2        numbofdoc                  20,26   19,017           23                     23,000
2        englishdoc                 10,09   12,124           23                     23,000
2        inddoc                     10,13   10,687           23                     23,000
2        highestnumbofcitatperdoc   8,26    10,136           23                     23,000
Total    age                        45,26   9,814            50                     50,000
Total    workingperiod              19,32   9,972            50                     50,000
Total    allcitation                8,96    9,949            50                     50,000
Total    numbofdoc                  16,08   16,272           50                     50,000
Total    englishdoc                 6,40    9,082            50                     50,000
Total    inddoc                     9,66    11,168           50                     50,000
Total    highestnumbofcitatperdoc   5,36    7,515            50                     50,000

In this part, the data of lecturers in the Mathematics and Sciences Faculty based on Google Scholar accounts are analysed. The important independent variables obtained by the stepwise method are the number of citations, the number of documents, and the highest number of citations per document. Among these, the number of citations was most strongly associated with discriminant function 1, which distinguishes lecturers with h-index values less than 2 from lecturers with h-index values greater than or equal to 2. After the significant variables were obtained, I construct the discriminant function based on the coefficients below:

Table 1.22 Standardized Canonical Discriminant Function Coefficients

                           Function 1
allcitation                -5,275
numbofdoc                  ,777
highestnumbofcitatperdoc   4,714
So the discriminant function can be written as the following equation:

$f(x) = -5.275\,(\text{number of citations}) + 0.777\,(\text{number of documents}) + 4.714\,(\text{highest number of citations per document})$   (4)

Based on the discriminant function, the classified cases are shown in the table below. The proportional by chance accuracy rate is computed by squaring and summing the proportion of cases in each group from the table of prior probabilities for groups: 0.540² + 0.460² = 0.5033. The SPSS output in Table 1.23 below shows that the cross-validated accuracy rate was 86.0%, which is greater than the proportional by chance accuracy criterion of 62.9% (1.25 × 50.33%). The criteria for the classification table and accuracy are therefore satisfied.
Table 1.23 Classification Results a,c

                                       Predicted Group Membership
                      hindex           1       2       Total
Original    Count     1                25      2       27
                      2                2       21      23
                      Ungrouped cases  19      36      55
            %         1                92,6    7,4     100,0
                      2                8,7     91,3    100,0
                      Ungrouped cases  34,5    65,5    100,0
Cross-validated b
            Count     1                25      2       27
                      2                5       18      23
            %         1                92,6    7,4     100,0
                      2                21,7    78,3    100,0

a. 92,0% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
c. 86,0% of cross-validated grouped cases correctly classified.

Finally, we found one statistically significant discriminant function, equation (4), making it possible to distinguish between the two groups defined by the dependent variable. Moreover, the cross-validated classification accuracy surpassed the by-chance accuracy criterion, supporting the usefulness of the model.

v. Discriminant Analysis of the ITS Lecturers data based on Scopus and Google Scholar Account
The table below shows that, on average, younger ITS lecturers were more likely to have a lower Scopus h-index, although not significantly so. The other variables, however, show significant differences between the groups: length of work, Scopus citations and documents, and Google Scholar citations, documents in English, and documents. Lecturers with a high h-index value were more likely to have higher values on these significant variables.
Table 1.24 Group Statistics

hindex_SC   Variable          Mean    Std. Deviation   Valid N (unweighted)   Valid N (weighted)
1           numbogcit_SC      5,49    6,292            61                     61,000
1           numbofcitat_GSC   34,52   87,459           61                     61,000
1           engdoc_GSC        13,07   10,136           61                     61,000
1           age               44,61   9,126            61                     61,000
1           periodofworking   18,87   9,772            61                     61,000
1           numbofdoc_SC      5,03    3,376            61                     61,000
1           numbofdoc_GSC     27,21   25,496           61                     61,000
2           numbogcit_SC      19,22   23,853           55                     55,000
2           numbofcitat_GSC   74,76   137,264          55                     55,000
2           engdoc_GSC        21,07   16,174           55                     55,000
2           age               45,95   8,603            55                     55,000
2           periodofworking   20,05   9,316            55                     55,000
2           numbofdoc_SC      8,56    5,779            55                     55,000
2           numbofdoc_GSC     44,05   48,525           55                     55,000
Total       numbogcit_SC      12,00   18,309           116                    116,000
Total       numbofcitat_GSC   53,60   115,088          116                    116,000
Total       engdoc_GSC        16,86   13,876           116                    116,000
Total       age               45,24   8,869            116                    116,000
Total       periodofworking   19,43   9,535            116                    116,000
Total       numbofdoc_SC      6,71    4,976            116                    116,000
Total       numbofdoc_GSC     35,20   38,938           116                    116,000
In this section, discriminant analysis is applied to the ITS lecturers data based on both Scopus and Google Scholar accounts. The SPSS output shows that the useful variables obtained by the stepwise method are the number of citations in Scopus and the number of documents in Scopus, and I limit the interpretation of the independent variables to these two. Of the two, the predictor most strongly associated with discriminant function 1, which distinguishes lecturers with h-index values less than 2 from lecturers with h-index values greater than or equal to 2, was the number of citations in Scopus (r = 0.697). After the significant variables were obtained, I construct the discriminant function from the coefficients, so that the equation of the discriminant function can be written in the form:

$f(x) = 0.759\,(\text{Scopus citations}) + 0.719\,(\text{Scopus documents})$   (5)
In the next part, using the obtained discriminant function, the cases can be classified. The proportional by chance accuracy rate was 0.501. Compared with the SPSS output in Table 1.26 below, the cross-validated accuracy rate was 75.9%, which is greater than the proportional by chance accuracy criterion of 62.6% (1.25 × 50.1%). The classification results are shown in the table below:
Table 1.26 Classification Results a,c

                                       Predicted Group Membership
                      hindex_SC        1       2       Total
Original    Count     1                55      6       61
                      2                22      33      55
                      Ungrouped cases  41      66      107
            %         1                90,2    9,8     100,0
                      2                40,0    60,0    100,0
                      Ungrouped cases  38,3    61,7    100,0
Cross-validated b
            Count     1                55      6       61
                      2                22      33      55
            %         1                90,2    9,8     100,0
                      2                40,0    60,0    100,0

a. 75,9% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
c. 75,9% of cross-validated grouped cases correctly classified.
In conclusion, the model is useful because the cross-validated classification accuracy surpassed the by-chance accuracy criterion. Moreover, we found one statistically significant discriminant function, equation (5), making it possible to distinguish between the two groups defined by the dependent variable.


vi. Discriminant Analysis of the Mathematics and Sciences Faculty data based on Scopus and Google
Scholar Account.
The table below indicates that the Scopus number of documents and citations and the Google Scholar number of documents and citations have a major impact on the h-index value. Lecturers of the Mathematics and Sciences Faculty with an h-index ≥ 2 were more likely to have more Scopus documents and citations as well as more Google Scholar documents and citations.

Table 1.27 Group Statistics

hindex_SC   Variable          Mean    Std. Deviation   Valid N (unweighted)   Valid N (weighted)
1           numbofdoc_SC      3,90    2,234            10                     10,000
1           numbogcit_SC      6,80    7,052            10                     10,000
1           numbofcitat_GSC   98,30   208,137          10                     10,000
1           numbofdoc_GSC     32,30   22,691           10                     10,000
2           numbofdoc_SC      8,44    4,381            16                     16,000
2           numbogcit_SC      19,44   18,301           16                     16,000
2           numbofcitat_GSC   53,75   39,616           16                     16,000
2           numbofdoc_GSC     38,94   31,097           16                     16,000
Total       numbofdoc_SC      6,69    4,287            26                     26,000
Total       numbogcit_SC      14,58   16,068           26                     26,000
Total       numbofcitat_GSC   70,88   130,483          26                     26,000
Total       numbofdoc_GSC     36,38   27,864           26                     26,000

In the first stage I apply the data of lecturers in the Mathematics and Sciences Faculty, based on both Scopus and Google Scholar accounts, to find the discriminant function. Following steps similar to those for the other data sets, the analysis selects the number of citations in Scopus and the number of documents in Scopus as the useful independent variables under the stepwise method, and I limit the interpretation of the independent variables to these two. I also found that the predictor most strongly associated with discriminant function 1, which distinguishes lecturers with h-index values less than 2 from lecturers with h-index values greater than or equal to 2, was the number of documents in Scopus (r = 0.744). After the significant variables were obtained, I construct the discriminant function from the coefficients of the discriminant function, which can be seen in the following table:
Table 1.28 Standardized Canonical Discriminant Function Coefficients

                    Function 1
numbofdoc_SC           0,877
numbogcit_SC           0,682
From the table above, the function can be written as follows:

f(x) = 0,877·document + 0,682·citation        (6)


In the second stage, I classify the cases. The independent variables can be characterized as useful predictors of membership in the groups defined by the dependent variable if the cross-validated classification accuracy rate is significantly higher than the accuracy attainable by chance alone, that is, at least 25% higher than the proportional by chance accuracy rate. The proportional by chance accuracy rate was computed by squaring and summing the proportions of cases in each group from the table of prior probabilities for groups (0,385² + 0,615² = 0,526). The SPSS output in the table below shows that the cross-validated accuracy rate was 73,1%, which is greater than the proportional by chance accuracy criterion of 65,8% (1,25 x 52,6% = 65,8%). The criteria for the classification table and accuracy are satisfied.

Table 1.29 Classification Results(a,c)

                                      Predicted Group Membership
                      hindex_SC            1          2      Total
Original           Count       1           8          2         10
                               2           4         12         16
                   Ungrouped cases         6         15         21
                   %           1        80,0       20,0      100,0
                               2        25,0       75,0      100,0
                   Ungrouped cases      28,6       71,4      100,0
Cross-validated(b) Count       1           7          3         10
                               2           4         12         16
                   %           1        70,0       30,0      100,0
                               2        25,0       75,0      100,0

a. 76,9% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
c. 73,1% of cross-validated grouped cases correctly classified.
In conclusion, the model is significant: the cross-validated classification accuracy surpassed the by chance accuracy criterion. Besides, we already found one statistically significant discriminant function, equation (6), making it possible to distinguish between the two groups defined by the dependent variable.
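As a hedged sketch of the decision rule used throughout this section, the snippet below computes the proportional by chance accuracy, the 1,25x criterion, and a leave-one-out (cross-validated) accuracy with scikit-learn; X and y are placeholders for any of the data sets above.

# Sketch of the usefulness criterion applied in this section: compare a leave-one-out
# (cross-validated) accuracy with 1.25 times the sum of squared group proportions.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

def by_chance_criterion(y):
    """1.25 x sum of squared group proportions, e.g. 1.25 x 0.526 = 0.658."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.25 * np.sum(p ** 2)

def cross_validated_accuracy(X, y):
    """Leave-one-out accuracy, analogous to the SPSS cross-validated classification."""
    return cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut()).mean()

# The model is judged useful when
#     cross_validated_accuracy(X, y) >= by_chance_criterion(y);
# for the Mathematics and Sciences data the text reports 0,731 against 0,658.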

Logistic Regression
In this part, I use the same data to find the logistic regression model. Before starting the analysis, I assume that there is no problem with missing data, outliers, or influential cases, and that the validation analysis will confirm the generalizability of the results. A significance level of 0.05 is used for evaluating the statistical relationships.
i. Logistic Regression of ITS Lecturers data based on Scopus
According to the SPSS output, I construct the logistic regression model below. The stepwise method is used to obtain the most useful predictors: the important independent variables for distinguishing between the groups defined by the h-index are the number of documents, the number of citations, citations per document, and the highest number of citations per document.

π(x) = exp(-3,878 + 0,321x1 + 1,600x2 + 0,245x3 + 1,278x4) / (1 + exp(-3,878 + 0,321x1 + 1,600x2 + 0,245x3 + 1,278x4))        (7)
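As a minimal illustration of how equation (7) is used, the snippet below evaluates the fitted logistic function for one hypothetical lecturer; the coefficients follow the reconstruction above and the input values are invented purely for the example.

# Evaluate a logistic model of the form pi(x) = exp(g(x)) / (1 + exp(g(x))).
import math

def logistic_probability(intercept, coefs, x):
    g = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return math.exp(g) / (1.0 + math.exp(g))

# Hypothetical lecturer: 10 documents, 12 citations, 1,2 citations per document,
# and a highest-cited document with 3 citations (illustrative values only).
p = logistic_probability(-3.878, [0.321, 1.600, 0.245, 1.278], [10, 12, 1.2, 3])
print(p)   # estimated probability of belonging to the higher h-index group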

To characterize the model as useful, I compare the overall percentage accuracy rate produced by SPSS at the last step in which variables are entered with a value 25% higher than the proportional by chance accuracy rate.

Table 1.30 Classification Table(a)

                                    Predicted hindexnew
         Observed                  1,00     2,00    Percentage Correct
Step 4   hindexnew        1,00      147       10            93,6
                          2,00       22      139            86,3
         Overall Percentage                                  89,9
a. The cut value is ,500
SPSS reports the overall accuracy rate in the footnotes to the table "Classification Table." The overall
accuracy rate computed by SPSS was 89,9%.
Table 1.31 Classification Table(a,b)

                                    Predicted hindexnew
         Observed                  1,00     2,00    Percentage Correct
Step 0   hindexnew        1,00        0      157              ,0
                          2,00        0      161           100,0
         Overall Percentage                                  50,6
a. Constant is included in the model.
b. The cut value is ,500
The proportional by chance accuracy rate was computed by calculating the proportion of cases in each group from the classification table at Step 0 (157/318 = 0,493; 161/318 = 0,506), and then squaring and summing these proportions (0,493² + 0,506² = 0,500).


The proportional by chance accuracy criterion is therefore 62,5% (1,25 x 50% = 62,5%). The accuracy rate computed by SPSS was 89,9%, which is greater than this criterion, so the criterion for classification accuracy is satisfied.

In conclusion, the model is significant based on the proportional by chance criterion and the accuracy rate computed by SPSS. Besides, we found a statistically significant overall relationship between the combination of independent variables and the dependent variable, and there was no evidence of numerical problems in the solution.
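A hedged sketch of how a model like equation (7) and the classification table at the 0,5 cut value could be reproduced outside SPSS is given below; the column names and the data file are hypothetical stand-ins for the stepwise-selected predictors.

# Illustrative sketch: fit a binary logit and rebuild the classification table at
# cut value 0,5. Column and file names are assumed, not taken from the SPSS run.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("its_scopus.csv")                     # assumed data file
predictors = ["documents", "citations", "citation_per_doc", "highest_citation_per_doc"]
X = sm.add_constant(df[predictors])
y = (df["hindex"] >= 2).astype(int)                    # 1 = h-index >= 2

model = sm.Logit(y, X).fit()
print(model.params)                                    # intercept and slopes, cf. eq. (7)
print(model.bse)                                       # standard errors (numerical-problem check)

predicted = (model.predict(X) >= 0.5).astype(int)
print(pd.crosstab(y, predicted, rownames=["observed"], colnames=["predicted"]))
print("overall accuracy:", float((predicted == y).mean()))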
ii. Logistic Regression of Mathematics and Sciences Faculty Lecturers data based on Scopus
Based upon the stepwise method, the most useful independent variables for distinguishing between the groups defined by the h-index are length of work, number of documents, and number of citations. From these independent variables, the logistic regression model becomes:

π(x) = exp(-1,043 + 2,48x1 + 0,775x2 + 0,257x3) / (1 + exp(-1,043 + 2,48x1 + 0,775x2 + 0,257x3))        (8)
None of the independent variables has a standard error greater than 2, so there is no indication of a multicollinearity problem. To characterize the model as useful, consider the table below, as in (i):
Table 1.32 Classification Table(a)

                                    Predicted hindexnew
         Observed                  1,00     2,00    Percentage Correct
Step 3   hindexnew        1,00       23        2            92,0
                          2,00        2       35            94,6
         Overall Percentage                                  93,5
a. The cut value is ,500
SPSS reports the overall accuracy rate in the classification table; it was 93,5%. The proportional by chance accuracy rate was computed by calculating the proportion of cases in each group from the classification table at Step 0 and then squaring and summing these proportions (0,403² + 0,596² = 0,518). The proportional by chance accuracy criterion is 64,8% (1,25 x 51,8% = 64,8%). The accuracy rate computed by SPSS, 93,5%, is greater than this criterion, so the criterion for classification accuracy is satisfied.
Finally, we found a statistically significant overall relationship between the independent variables and the dependent variable. Moreover, the classification accuracy surpassed the proportional by chance accuracy criterion, supporting the utility of the model.
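The numerical-problem check mentioned above (no standard error greater than 2) can be expressed as a small helper on a fitted statsmodels result such as the one from the previous sketch; the threshold of 2 follows the text.

# Flag coefficients whose standard error exceeds 2, the screening rule used in the text.
def check_standard_errors(result, threshold=2.0):
    suspicious = result.bse[result.bse > threshold]
    if suspicious.empty:
        print("No standard error exceeds", threshold, "- no sign of numerical problems.")
    else:
        print("Predictors with large standard errors:")
        print(suspicious)
    return suspicious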

iii. Logistic Regression of the ITS Lecturers data based on Google Scholar Account
Based on the SPSS output, the most important independent variables for distinguishing between the groups defined by the h-index are the number of citations, the number of Indonesian documents, and the highest number of citations per document. Theoretically, the more a lecturer contributes at the academic level, the higher the index value. Using these important independent variables, I construct the logistic regression model below.

π(x) = exp(-2,97 + 1,055x1 + 0,047x2 + 1,046x3) / (1 + exp(-2,97 + 1,055x1 + 0,047x2 + 1,046x3))        (9)

SPSS reports the overall accuracy rate in the footnotes to the table "Classification Table." The overall
accuracy rate computed by SPSS was 94,3%.

Table 1.33 Classification Table(a)

                                    Predicted hindexnew
         Observed                  1,00     2,00    Percentage Correct
Step 5   hindexnew        1,00      195        8            96,1
                          2,00       19      249            92,9
         Overall Percentage                                  94,3
a. The cut value is ,500

The proportional by chance accuracy rate was computed by calculating the proportion of cases in each group from the classification table at Step 0 and then squaring and summing these proportions (0,430² + 0,569² = 0,509). The proportional by chance accuracy criterion is 63,6% (1,25 x 50,9% = 63,6%). The accuracy rate computed by SPSS, 94,3%, is greater than this criterion, so the criterion for classification accuracy is satisfied.
In conclusion, the model is significant based on the proportional by chance criterion and the accuracy rate computed by SPSS. Besides, we found a statistically significant overall relationship between the combination of independent variables and the dependent variable.

iv. Logistic regression of the Mathematics and Sciences Faculty Lecturers data based on Google Scholar Account.
Based on the SPSS output, the most useful independent variable according to the stepwise method is the English-document variable. Therefore, the logistic model can be written as:

π(x) = exp(-2,127 + 0,403x1) / (1 + exp(-2,127 + 0,403x1))        (10)

SPSS reports the overall accuracy rate in the footnotes to the table "Classification Table." The overall accuracy rate computed by SPSS was 81,9%. The proportional by chance accuracy criterion is 63,4% (1,25 x 50,7% = 63,4%). The accuracy rate of 81,9% is greater than the proportional by chance criterion, so the criterion for classification accuracy is satisfied.
In conclusion, the model is significant based on the proportional by chance criterion and the accuracy rate computed by SPSS.

v. Logistic regression of the ITS Lecturers data based on Scopus and Google Scholar Account.
Based on the SPSS output, the most useful independent variables according to the stepwise method are the number of documents in Scopus, the number of citations in Scopus, and citations per document in Scopus. Therefore, the logistic model can be written as:

π(x) = exp(-3,664 + 0,257x1 + 1,535x2 + 1,365x3) / (1 + exp(-3,664 + 0,257x1 + 1,535x2 + 1,365x3))        (11)

SPSS reports the overall accuracy rate in the footnotes to the table "Classification Table." The overall accuracy rate computed by SPSS was 87,9%. The proportional by chance accuracy criterion is 63,0% (1,25 x 50,4% = 63,0%). The accuracy rate of 87,9% is greater than the proportional by chance criterion, so the criterion for classification accuracy is satisfied.
In conclusion, the model is significant based on the proportional by chance criterion and the accuracy rate computed by SPSS.

vi. Logistic regression of the Mathematics and Sciences Faculty data based on Scopus and Google Scholar Account.
Based on the SPSS output, the most useful independent variables were selected by the stepwise method; the resulting logistic model can be written as:

π(x) = exp(-49,717 + 17,445x1 + 18,025x2 + 26,64x3 + 12,022x4 + 9,825x5 + 1,876x6) / (1 + exp(-49,717 + 17,445x1 + 18,025x2 + 26,64x3 + 12,022x4 + 9,825x5 + 1,876x6))        (12)

SPSS reports the overall accuracy rate in the footnotes to the table "Classification Table." The overall accuracy rate computed by SPSS was 100%. The proportional by chance accuracy criterion is 68,8% (1,25 x 55,09% = 68,8%). The accuracy rate of 100% is greater than the proportional by chance criterion, so the criterion for classification accuracy is satisfied.
In conclusion, the model is significant based on the proportional by chance criterion and the accuracy rate computed by SPSS.

Cluster Analysis
In this session, I implement cluster analysis on each of the data sets. The objective of cluster analysis is to group objects based on the characteristics they possess. Considering several variables, I intend to classify the lecturers based on the similarity of their functional position and education level. The variables used to obtain the result are age, length of work, number of citations, number of documents, citations per document, highest number of citations per document, English documents, and Indonesian documents. I assume that there is no problem with missing data, violation of assumptions, or outliers.
Nonhierarchical clustering methods are preferred because the number of clusters is known. Besides, these methods are generally less susceptible to outliers.

i. Cluster analysis of ITS Lecturers data based on Scopus

By setting the number of clusters to 4, I group the data based on functional position, and each cluster is identified by the percentage of each functional position it contains. After running the data using SPSS, the output can be summarized in the following table:

Table 1.34 Percentage of functional position in each cluster

Functional Position      1 (%)     2 (%)     3 (%)     4 (%)
Professor                25        19,85     0         42,85
Associate Professor      33,33     25,53     0         14,285
Lector                   29,16     35,46     100       21,42
Expert Assistant         12,5      0,001     0         21,42
According to the table above, the percentages of the functional positions tend to spread across the clusters, so it is not possible to assign a title to each cluster. In detail, even though the associate professor position is dominant in cluster 1, it is not markedly different from the other functional positions, and the same holds for the other clusters. In addition, even though cluster 3 contains only one position, the figure below shows that only one case is included in that cluster.

Figure 1.10 Bar chart of cluster number of case based on functional position

When I separate the expert assistant cases from the lecturers who do not have a functional position and reanalyse with the number of clusters set to 5, the result can be seen in figure 1.10; in detail, the percentages are presented in the table below:
Functional Position                 1 (%)     2 (%)     3 (%)        4 (%)        5 (%)
Professor                           25        100       30,43478     19,40299     0
Associate Professor                 50        0         30,43478     24,62687     0
Lector                              25        0         28,26087     35,8209      50
Expert Assistant                    0         0         10,86957     19,40299     50
Do not have functional position     0         0         0


For the education-level cases, I set the number of clusters to 2. After computing the data, cluster 1 represents the lecturers who hold a doctoral degree and cluster 2 the lecturers who do not. The result is shown below:

Level of Education    Percentage of Education Level in each cluster
                      1 (%)         2 (%)
Doctor                90            1,030928
Not Doctor            10            98,96907

The lecturers who hold a doctoral degree are dominant in the first cluster, while the second cluster mostly contains the other lecturers. The difference between the cases is large: in cluster 1 the percentage of lecturers with a doctoral degree was 90%, far higher than the 10% of lecturers without a doctoral degree, while almost 99% of the lecturers without a doctoral degree fall into the second cluster. The table above can be illustrated in the following bar chart:

Figure 1.11 Bar chart of cluster number of case based on education level
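The percentage-within-cluster tables used throughout this cluster analysis can be obtained directly from the cluster labels. The sketch below assumes a data frame like the one in the previous sketch, with a hypothetical education-level column and a reduced, hypothetical feature list, and uses two clusters for the education-level case.

# Percentage of each education level inside each cluster (columns sum to 100%),
# mirroring the tables in this section. Column and file names are hypothetical.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("its_scopus.csv")                     # assumed data file
X = StandardScaler().fit_transform(df[["citations", "documents", "citation_per_doc"]])
df["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X) + 1

pct = pd.crosstab(df["education_level"], df["cluster"], normalize="columns") * 100
print(pct.round(2))     # e.g. rows: Doctor / Not Doctor, columns: cluster 1, 2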

ii. Cluster analysis of Mathematics and Sciences Faculty Lecturers data based on Scopus
Similar to (i), I run the analysis on this data set using SPSS. Setting the number of clusters to 4, I group the data based on functional position, and each cluster is identified by the percentage of each functional position it contains. The output can be summarized in the following table:


Functional Position      1 (%)        2 (%)        3 (%)        4 (%)
Professor                25           44,44444     0            66,66667
Associate Professor      20,83333     37,5         33,33333     0
Lector                   54,16667     25           22,22222     33,33333
Expert Assistant         12,5         0            25           0
Overall, the proportions of a given position do not differ greatly between clusters. The lector position is the largest group in cluster 1 (54,17%), although not decisively, while the professor position takes 44,44% of cluster 2 and 66,67% of cluster 4. The smallest shares come from the expert assistant position, which takes only 12,5% of cluster 1 and 25% of cluster 3. To illustrate the table above, I built the bar charts below (fig. 1.12):

Figure 1.12 Bar chart of cluster number of case based on functional position
Figure 1.13 Bar chart of cluster number of case based on education level

The table below shows that the lecturers who hold a doctoral degree make up all (100%) of cluster 1. In contrast, cluster 2 contains mostly lecturers who do not hold a doctoral degree (78%).
Level of Education    Percentage of Education Level in each cluster
                      1 (%)       2 (%)
Doctor                100         21,56863
Not doctor            0           78,43137

To illustrate the table above, we can look at figure 1.13. It can be seen clearly that each cluster groups cases that are similar to each other, so we can conclude that cluster 1 is formed by the lecturers who hold a doctoral degree, while cluster 2 is formed by the lecturers who do not.
iii. Cluster Analysis of the ITS Lecturers data based on Google Scholar Account.
By clustering the functional-position cases on some variables of the GSC account, I obtain the table below. The table shows that the professor position takes the largest share of clusters 2 and 3, and that the professor and expert assistant positions share the same percentage in cluster 4, while cluster 1 contains a mix of the remaining positions.
Functional Position      1 (%)       2 (%)      3 (%)     4 (%)
Professor                10,539      44,444     50        50
Associate Professor      24,824      33,333     29,4      0
Lector                   34,426      22,222     8,82      0
Expert Assistant         30,211      0          11,8      50
The illustration of the table above can be seen clearly in the following bar charts:

Figure 1.14 Bar chart of cluster number of case based on functional position
Figure 1.15 Bar chart of cluster number of case based on functional position

Based on the table and figure 1.14, no cluster can be considered to represent a single functional position, because the cases contained in each cluster differ from each other. Figure 1.15 illustrates that when the lecturers who do not have a position in the university are added, those cases fall into clusters 3 and 4.
The table below shows that both education levels fall into a single cluster.


Level of Education    Percentage of Education Level in each cluster
                      1 (%)
Doctor                48,94068
Not doctor            51,05932

The bar chart below illustrates the table:

Figure 1.16 Bar chart of cluster number of case based on education level

According to this, the clusters cannot be considered to group the lecturers by the similarity of their education level.
iv. Cluster Analysis of the Mathematics and Sciences Faculty Lecturers data based on Google Scholar Account.
By clustering the functional-position cases on some variables of the GSC account, I obtain the table below. The table shows that cluster 1 is represented by the professor position, while the other clusters still contain a variety of cases.
Functional Position      1 (%)    2 (%)        3 (%)        4 (%)
Professor                100      3,278689     57,14286     22,22222
Associate Professor      0        24,59016     14,28571     50
Lector                   0        40,98361     28,57143     25
Expert Assistant         0        32,78689     0            2,777778

The illustration of the table above can be seen clearly in the following bar charts (figures 1.17 and 1.18):

Figure 1.17 Bar chart of cluster number of case based on functional position
Figure 1.18 Bar chart of cluster number of case based on functional position

Figure 1.18 illustrates that when the lecturers who do not have a position in the university are added, those cases fall into cluster 5. For the education-level cases, I set the number of clusters to 2. After computing the data, cluster 1 mostly represents the lecturers who do not hold a doctoral degree, while cluster 2 consists entirely of lecturers with a doctoral degree. The result is shown as follows:
Level of Education    Percentage of Education Level in each cluster
                      1 (%)        2 (%)
Doctoral              49,03846     100
Not doctoral          50,96154     0

The bar chart below illustrates the table:

Figure 1.19 Bar chart of cluster number of case based on education level

v. Cluster Analysis of the ITS Lecturers data based on Scopus and Google Scholar Account.
After running the data using SPSS, the output can be summarized in the following table. Overall, the table shows that the lector position is represented by cluster 2, while clusters 1, 3 and 4 still contain a mix of cases.
Functional Position      1 (%)    2 (%)    3 (%)      4 (%)
Professor                22,3     0        58,333     22,222
Associate Professor      28       0        16,667     38,889
Lector                   36,3     100      16,667     22,222
Expert Assistant         16,6     0        8,3333     16,667
The illustration of the table above can be seen clearly in the following bar charts (figures 1.20 and 1.21):

Figure 1.20 Bar chart of cluster number of case based on functional position
Figure 1.21 Bar chart of cluster number of case based on functional position

Figure 1.21 illustrates that when the lecturers who do not have a position in the university are added, those cases fall into cluster 2.
For the education-level cases, I set the number of clusters to 2. After computing the data, cluster 1 represents the lecturers who hold a doctoral degree, with 100% of its cases, and cluster 2 also consists mostly of lecturers with a doctoral degree, at about 70%.


Level of Education    Percentage of Education Level in each cluster
                      1 (%)    2 (%)
Doctor                100      70,61611
Not doctor            0        29,38389
The bar chart below illustrates the table:

Figure 1.22 Bar chart of cluster number of case based on education level
vi. Cluster Analysis of the Mathematics and Sciences Faculty data based on Scopus and Google Scholar Account.
Similar to the previous analyses, the output can be summarized in the table below. Overall, the table shows that cluster 3 represents the professor position with 100% of its cases. In contrast, the other clusters still lack similarity within the cluster.
Functional Position      1 (%)     2 (%)        3 (%)    4 (%)
Professor                15,625    57,14286     100      28,57143
Associate Professor      34,375    14,28571     0        28,57
Lector                   40,625    28,57143     0        28,57
Expert Assistant         9,375     0            0        14,28571
The illustration of the table above can be seen clearly in the following bar charts (figures 1.23 and 1.24):

Figure 1.23 Bar chart of cluster number of case based on functional position
Figure 1.24 Bar chart of cluster number of case based on education level

The table below describes the education-level clusters: cluster 1 consists entirely (100%) of lecturers with a doctoral degree, and cluster 2 also contains about 87% lecturers with a doctoral degree.

Level of Education    Percentage of Education Level in each cluster
                      1 (%)    2 (%)
Doctor                100      86,95652
Not doctor            0        13,04348

The illustration of the table above can be seen in figure 1.24.


Conclusion
To sum up, implementing factor analysis on the data and following all the requirements shows that each variable loads on only one component, which means the variables exhibit simple structure. Independent variables that load on the same component were found to be associated with each other, while independent variables from different components are not related. For example, age and period of work are constantly in the same component, and when I examined the descriptive statistics using a matrix plot, age and period of work showed a positive correlation in every data set. In contrast, there is no pattern of correlation between independent variables from different components. In this analysis, none of the components is constructed by a single variable.
Based on the descriptive analysis of the data, interestingly, age and period of work are not consistently significant for the h-index. This means that being older or having worked longer cannot guarantee that a lecturer has a higher h-index. According to the descriptive analysis output, the most useful variables for measuring the academic contribution of a lecturer are the number of documents, the number of citations, citations per document, English documents, and the other indicators determined by Scopus and Google Scholar.
Discriminant analysis was used to classify the h-index into two groups. Before classifying the response variable, a significant discriminant function model has to be obtained. The model is useful when the cross-validated classification accuracy is greater than or equal to the proportional by chance accuracy criterion. The stepwise method was implemented in this analysis since the problem calls for identifying the best predictors.
Besides discriminant analysis, I also developed another classification method. The main purpose of cluster analysis is to group the cases based on the similarity of the objects. The result of the data analysis shows that the functional positions, which consist of Professor, Associate Professor, Lector, and Expert Assistant, cannot be built into clusters, because the cases of each position tend to spread among the clusters; consequently, no cluster is loaded by a dominant position. In contrast, the education-level cases can be constructed into clusters: mostly, the lecturers who hold a doctoral degree can be separated from the lecturers who do not. The complete data analysis is summarized in session 3.

Acknowledgement
I realize that this paper needs a lot of improvement, since there was not much time available to write it. Therefore, advice and corrections are strongly expected.

