Abstract
In this work I apply Factor Analysis, Discriminant Analysis, Logistic Regression and Cluster Analysis to Scopus and Google Scholar data. For factor analysis, I use Principal Component Analysis (PCA). This method produces different components in different data sets. Variables that load on the same component are most likely associated with one another, whereas variables from different components appear to be unrelated. For example, age and period of work consistently load on one component in every data set, and the matrix plots show that age and period of work are positively correlated. The complete correlations among the variables can be seen clearly in the matrix plots in section 3, which were constructed to support the results of the factor analysis. I also briefly present a descriptive analysis in the discriminant analysis part; it shows that age and length of work are not consistently significant for the value of the h-index. In the discriminant analysis part, using the stepwise method, I identify the most useful predictors for distinguishing between groups on the response, where the aim is to separate the lecturers with an h-index less than 2 from the lecturers with an h-index greater than or equal to 2. To classify the response variable, the discriminant function must be determined first. The model is considered useful when the cross-validated classification accuracy is greater than or equal to the proportional by chance accuracy; the discriminant models obtained for each data set satisfy this criterion. The other classification method described in this paper is cluster analysis. Here I fix the number of clusters in advance and therefore use a nonhierarchical method. I examine whether the obtained clusters correspond to functional position, which consists of professor, assistant professor, lector and expert assistant. Unfortunately, functional position cannot be recovered as clusters, because the cases of each position tend to spread across clusters, so no cluster is dominated by one position. Level of education, on the other hand, can be recovered: for the most part, lecturers who hold a doctoral degree are separated from lecturers who do not. The complete data analysis is summarized in section 3.
Keywords: Google Scholar, Scopus, h-index, Citation, PCA, Factor Analysis, Discriminant Analysis, Logistic Regression, Cluster Analysis
1. Introduction
Scopus is the largest abstract and citation database of peer-reviewed literature: scientific journals, books and conference proceedings. With over 20,500 titles from more than 5,000 international publishers, Scopus offers researchers an accurate, easy and comprehensive tool to support their research needs in the scientific, technical, medical, social sciences, and arts and humanities fields. Besides Scopus, there is Google Scholar, which provides a simple way to broadly search for scholarly literature. From one place, people can search across many disciplines and sources: articles, theses, books, abstracts and court opinions, from academic publishers, professional societies, online repositories, universities and other web sites. Google Scholar helps to find relevant work across the world of scholarly research. Google Scholar aims to rank documents the way researchers do, weighing the full text of each document, where it was published, who it was written by, as well as how often and how recently it has been cited in other scholarly literature.
The aim of this paper is to apply Factor Analysis, Discriminant Analysis, Logistic Regression and Cluster Analysis to the Scopus and Google Scholar data of the lecturers at Sepuluh Nopember Institute of Technology (ITS). Several data sets are used: these analyses are carried out on the data of ITS lecturers who have a Scopus account, those who have a Google Scholar account, and those who have both. Besides analysing ITS lecturers as a whole, I analyse in particular the data of lecturers in the Mathematics and Sciences Faculty, again split by Scopus account, Google Scholar account, and both. In section I, I analyse the data using factor analysis in order to obtain factors that contain variables with simple structure. There are some requirements that must be met to obtain simple-structure variables, and these requirements are described briefly in this paper. In addition, a descriptive analysis is presented to support the factor analysis results. In section II, I apply discriminant analysis. Since the main purpose of discriminant analysis is classification, I use it to classify the ITS lecturers based on their h-index value. In the next section, logistic regression is used to obtain a significant logistic regression model, which is then compared with the discriminant analysis. In addition, I include some nonmetric independent variables in the logistic regression, from which I expect to extract useful information through odds ratios. The last analysis is cluster analysis: in section III, I determine the number of clusters and examine whether those clusters correspond to the variables chosen beforehand. Each analysis requires its own assumptions.
II. Literature Review
Scopus
Scopus is the largest abstract and citation database of peer-reviewed literature: scientific journals,
books and conference proceedings. Delivering a comprehensive overview of the world's research output in
the fields of science, technology, medicine, social sciences, and arts and humanities, Scopus features smart
tools to track, analyze and visualize research. As research becomes increasingly global, interdisciplinary
and collaborative, you can make sure that critical research from around the world is not missed when you
choose Scopus. Scopus has twice as many titles and over 50% more publishers listed than any other A&I
database, with interdisciplinary content that covers the research spectrum. Timely updates from thousands
of peer-reviewed journals, preliminary findings from millions of conference papers, and the thorough
analysis in an expanding collection of books ensure you have the most up-to-date and highest quality
interdisciplinary content available. Scopus is designed to serve the research information needs of
researchers, educators, administrators, students and librarians across the entire academic community.
Whether searching for specific information or browsing topics, authors, journals or books, Scopus
provides precise entry points to peer-reviewed literature in the fields of science, technology, medicine,
social sciences, and arts and humanities.
Google Scholar
Google Scholar is a freely accessible web search engine that indexes the full text or metadata
of scholarly literature across an array of publishing formats and disciplines. Released in beta in November
2004, the Google Scholar index includes most peer-reviewed online journals of Europe and America's
largest scholarly publishers, plus scholarly books and other non-peer reviewed journals.
\[
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_m \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1p} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2p} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3p} \\
\vdots & \vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mp}
\end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ \vdots \\ X_p \end{bmatrix},
\quad m \le p \qquad (5)
\]
where:
Y1 = the first principal component, the component with the highest variance;
Y2 = the second principal component, the component with the second highest variance;
Ym = the mth principal component, the component with the mth highest variance;
X1 = the first original variable;
X2 = the second original variable.
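As an illustration of how the components Y in equation (5) are obtained, the sketch below extracts principal components from the correlation matrix of a small data set. It assumes numpy; the data values are hypothetical, not the lecturer data used in this paper.

```python
import numpy as np

# Illustrative data: 5 cases x 3 variables (hypothetical values).
X = np.array([
    [ 1.2, -0.5,  0.3],
    [-0.7,  0.8, -1.1],
    [ 0.3,  0.1,  0.9],
    [-1.4,  1.2, -0.6],
    [ 0.6, -1.6,  0.5],
])

# Standardize, then eigendecompose the correlation matrix R.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)       # ascending order
order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Y = A X : each principal component is a linear combination of the variables.
Y = Z @ eigvecs
print(eigvals)   # variance explained by Y1 >= Y2 >= Y3
```

The eigenvalues sum to the number of variables (the trace of R), which is why "proportion of variance explained" is computed as eigenvalue divided by the variable count.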
\[
X_1 - \mu_1 = l_{11}F_1 + l_{12}F_2 + \cdots + l_{1m}F_m + \varepsilon_1
\]
\[
\vdots
\]
\[
X_p - \mu_p = l_{p1}F_1 + l_{p2}F_2 + \cdots + l_{pm}F_m + \varepsilon_p
\]
The main purpose of factor analysis is to describe the structure of the relationships among many variables in the form of factors, also called latent variables or constructs. The factors that are formed are random quantities that cannot be observed, measured, or determined directly.
In addition to this main purpose, factor analysis has other objectives:
1) The first objective is to reduce the original set of variables to a smaller number of new variables; each new variable is called a factor, latent variable, or construct.
2) The second objective is to identify the relationship between the constituent variables and the factors or dimensions that are formed, by testing the correlation coefficients between each factor and its constituent components. This kind of analysis is called confirmatory factor analysis.
3) The third objective is to test the validity and reliability of an instrument with confirmatory factor analysis.
4) The fourth objective, data validation, is to determine whether the results of the factor analysis can be generalized to the population, so that after the factors are formed the researcher has a new hypothesis based on the results of the factor analysis.
Discriminant Analysis
Discriminant analysis is used to analyze relationships between a non-metric dependent variable and
metric or dichotomous independent variables. This analysis attempts to use the independent variables to
distinguish among the groups or categories of the dependent variable. The usefulness of a discriminant
model is based upon its accuracy rate, or ability to predict the known group memberships in the categories
of the dependent variable.
Discriminant analysis works by creating a new variable called the discriminant function score which
is used to predict to which group a case belongs. Discriminant function scores are computed similarly to
factor scores, i.e. using eigenvalues. The computations find the coefficients for the independent variables
that maximize the measure of distance between the groups defined by the dependent variable. The
discriminant function is similar to a regression equation in which the independent variables are multiplied
by coefficients and summed to produce a score.
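The computation just described, coefficients that maximize the distance between groups and a score formed as a weighted sum, can be sketched for the two-group case using Fisher's classical approach. This assumes numpy and uses hypothetical data; it illustrates the idea, not the exact estimation SPSS performs.

```python
import numpy as np

# Hypothetical two-group data (rows = cases, columns = metric predictors).
g1 = np.array([[1.0, 2.0], [1.5, 1.8], [0.8, 2.4], [1.2, 2.1]])
g2 = np.array([[3.0, 0.9], [2.7, 1.2], [3.4, 0.7], [2.9, 1.0]])

m1, m2 = g1.mean(axis=0), g2.mean(axis=0)

# Pooled within-group covariance matrix.
S = (np.cov(g1, rowvar=False) * (len(g1) - 1) +
     np.cov(g2, rowvar=False) * (len(g2) - 1)) / (len(g1) + len(g2) - 2)

# Fisher's coefficients: they maximize between-group separation of the scores.
w = np.linalg.solve(S, m1 - m2)

# The discriminant score is a weighted sum, like a regression equation.
score = lambda x: x @ w
cut = (score(m1) + score(m2)) / 2   # midpoint cutoff for classifying cases
print(score(m1) > cut, score(m2) < cut)
```

A case is assigned to group 1 when its score falls on group 1's side of the cutoff, which is the classification stage described below.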
In many cases, the dependent variable consists of two groups or classifications, for example, male versus female, or high versus low. In other instances, more than two groups are involved, such as low, medium, and high classifications. Discriminant analysis is capable of handling either two groups or multiple (three or more) groups. When two classifications are involved, the technique is referred to as two-group discriminant analysis. When three or more classifications are identified, the technique is referred to as multiple discriminant analysis (MDA). Logistic regression is limited in its basic form to two groups, although other formulations can handle more groups.
As with all multivariate techniques, discriminant analysis is based on a number of assumptions. These assumptions relate both to the statistical processes involved in the estimation and classification procedures and to issues affecting the interpretation of the results. The assumptions of discriminant analysis consist of normality of the independent variables, linearity of relationships, lack of multicollinearity among the independent variables, and equal dispersion matrices.
Logistic Regression
Logistic regression, along with discriminant analysis, is the appropriate statistical technique when the dependent variable is categorical (nominal or nonmetric) and the independent variables are metric or nonmetric. Compared with discriminant analysis, logistic regression is limited in its basic form to two groups for the dependent variable, although other formulations can handle more groups. It does have the advantage, however, of easily incorporating nonmetric variables as independent variables, much like multiple regression.
In a practical sense, logistic regression may be preferred for two reasons. First, discriminant analysis relies on strictly meeting the assumptions of multivariate normality and equal variance-covariance matrices across groups, assumptions that are not met in many situations. Logistic regression does not face these strict assumptions and is much more robust when they are not met, making its application appropriate in many situations. Second, even if the assumptions are met, many researchers prefer logistic regression because it is similar to multiple regression.
Sample size considerations for logistic regression are primarily focused on the size of each group, which should have 10 times the number of estimated model coefficients. Sample size requirements should be met in both the analysis and holdout samples. Model significance tests are made with a chi-square test on the difference in the log likelihood values (-2LL) between two models. Coefficients are expressed in two forms, original and exponentiated, to assist in interpretation.
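The relation between the two coefficient forms can be checked numerically: exponentiating a coefficient gives the multiplicative change in the odds for a one-unit increase in the predictor. The coefficient value below is hypothetical, chosen only for illustration.

```python
import math

# Hypothetical fitted coefficient (original form: change in log-odds
# per one-unit increase in the predictor).
b = 0.9
odds_ratio = math.exp(b)   # exponentiated form, used for interpretation

def prob(logit):
    """Logistic link: convert a logit into a probability."""
    return 1 / (1 + math.exp(-logit))

p0 = prob(-0.2)        # probability at some baseline logit
p1 = prob(-0.2 + b)    # probability after a one-unit increase
odds0, odds1 = p0 / (1 - p0), p1 / (1 - p1)

# The ratio of odds equals exp(b), whatever the baseline is.
print(round(odds1 / odds0, 6), round(odds_ratio, 6))
```

This is why the exponentiated form is reported alongside the original one: it reads directly as "the odds are multiplied by exp(b)".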
Cluster Analysis
Cluster analysis is a group of multivariate techniques whose primary purpose is to group objects based on the characteristics they possess. Cluster analysis is comparable to factor analysis in its objective of assessing structure. Cluster analysis differs from factor analysis, however, in that cluster analysis groups objects, whereas factor analysis is primarily concerned with grouping variables. The cluster variate is the mathematical representation of the selected set of variables on which the objects' similarities are compared. The variate in cluster analysis is determined quite differently from other multivariate techniques: cluster analysis is the only multivariate technique that does not estimate the variate empirically but instead uses the variate as specified by the researcher. The focus of cluster analysis is on the comparison of objects based on the variate, not on the estimation of the variate itself.
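Since the number of clusters is fixed in advance in this paper, a nonhierarchical method applies; the idea can be sketched as a minimal k-means. This assumes numpy, and the objects below are hypothetical, not the lecturer data.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means: the researcher fixes k (and the variate) up front."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each object to its nearest center (similarity on the variate).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned objects.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated hypothetical groups of objects.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
labels, centers = kmeans(X, k=2)
print(labels)
```

Whether the recovered clusters correspond to an external grouping (functional position, level of education) is then checked by cross-tabulating the labels against that grouping, as done later in this paper.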
Table 1.1 Rotated Component Matrix(a)

Variable             Component 1   Component 2   Component 3
umur                               ,982
lamakerja                          ,983
numberofcit          ,949
citationbydoc        ,968
firstauthorbydate                                ,737
firstauthorbycitat                               ,877
highestnumbercit     ,921

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 4 iterations.
According to table 1.1 above, it can be seen that of the 8 variables I used at the beginning, the variable number of documents had to be removed from the analysis because its communality is less than 0,5. Communality represents the proportion of the variance in an original variable that is accounted for by the factor solution. The factor solution should explain at least half of each original variable's variance, so the communality value for each variable should be 0,50 or higher. After removing that variable, I repeated the analysis.
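The communality screening rule just described can be computed directly from a loadings matrix: each variable's communality is the sum of its squared loadings across the retained components. The loadings below are hypothetical (numpy assumed), not the values from table 1.1.

```python
import numpy as np

# Hypothetical rotated loadings: rows = variables, columns = components.
loadings = np.array([
    [0.95, 0.10, 0.05],   # e.g. a citation-type variable
    [0.08, 0.98, 0.03],   # e.g. an age-type variable
    [0.40, 0.30, 0.35],   # a weakly explained variable
])

# Communality: proportion of each variable's variance that the
# factor solution accounts for.
communality = (loadings ** 2).sum(axis=1)
keep = communality >= 0.50      # the 0,50 screening rule used above
print(np.round(communality, 3), keep)
```

Here the third variable would be removed and the analysis repeated, exactly as done with the number of documents variable above.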
After all the requirements are satisfied, the data yield 3 factors: the first factor contains the variables number of citations, citations by document, and highest number of citations per document; the second factor contains age and working period; and the third factor contains number of documents as first author sorted by date and sorted by citation. All the variables now have simple structure. The matrix plot below supports the output of the factor analysis:
Figure 1.1 Matrix plot of age, working period, number of citations, citations by document, documents as first author sorted by date, documents as first author sorted by citation, and highest number of citations per document.
ii.
Table 1.2 Rotated Component Matrix(a)

Variable             Component
highestnumbercit     ,898

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
According to table 1.2 above, it can be seen that of the 8 variables I used at the beginning, the variable number of documents had to be removed because its communality is less than 0,5. After all the requirements are satisfied, the data yield 2 factors: the first factor contains age and working period, while number of citations, citations by document, highest number of citations per document, number of documents as first author sorted by date, and number of documents sorted by citation are included in the second factor. All the variables now have simple structure. The matrix plot below supports the output of the factor analysis:
Figure 1.2 Matrix plot of age, period of working, number of citations, citations by document, first author by date, and first author by citation.
Figure 1.3 Matrix plot of age, period, number of citations, citations by document, documents sorted by date, and documents sorted by citation.
iii.
Table 1.3 Rotated Component Matrix(a)

Variable                 Component 1   Component 2
age                      ,984
periodofworking          ,981
allcitation                            ,902
numberofdoc                            ,695
englisdoc                              ,849
firstauthordoc                         ,732
highestnumofcitperdoc                  ,718

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
Based on table 1.3 above, it can be seen that of the 8 variables I used at the beginning, the variable documents in Indonesian had to be excluded from the analysis because its measure of sampling adequacy (MSA) in the anti-image correlation matrix is less than 0,5; an MSA value below 0,5 is considered unacceptable. After all the requirements are satisfied, the data yield 2 factors: the first factor includes the age and working period variables, and the second factor consists of number of citations, number of documents, documents in English, documents as first author, and highest number of citations per document. All the variables now have simple structure. The matrix plot below supports the output of the factor analysis:
Figure 1.4 Matrix plot of variables with simple structure according to Google Scholar data of ITS lecturers.
iv.
Factor Analysis in Google Scholar data of the Mathematics and Sciences Faculty
[Matrix plot of age, working period, number of documents, and documents in Indonesian.]
[Matrix plot of age, working period, number of documents, and documents in Indonesian for the second data set.]
v.
Table 1.5 shows that there are 10 variables left, contained in 2 factors. At the beginning I had 14 variables. During the analysis, I removed the documents in Indonesian variable, followed by the working period variable, because the value of the measure of sampling adequacy (MSA) for those variables was less than 0,5. After repeating the analysis, the age and the number of documents as first author sorted by citation variables had communality values less than 0,5. I removed the number of documents as first author sorted by citation variable first, since it had the smaller value. After repeating the analysis, all the communality values of the remaining variables were above 0,5. Once any variables with communalities less than 0,50 have been removed from the analysis, the pattern of factor loadings should be examined to identify variables that have complex structure. Complex structure occurs when one variable has high loadings or correlations (0,40 or greater) on more than one component.
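The complex-structure check can be applied mechanically to a loadings matrix: flag any variable whose loading is 0,40 or higher on more than one component. The loadings below are hypothetical (numpy assumed), not the values from table 1.5.

```python
import numpy as np

# Hypothetical rotated loadings: rows = variables, columns = components.
loadings = np.array([
    [0.85, 0.12],   # simple structure: high on one component only
    [0.10, 0.90],   # simple structure
    [0.55, 0.48],   # complex structure: >= 0.40 on both components
])

# A variable is complex if it loads 0.40 or higher on more than one component.
complex_structure = (np.abs(loadings) >= 0.40).sum(axis=1) > 1
print(complex_structure)
```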
If a variable has complex structure, it should be removed from the analysis. The age variable was found to have complex structure, so it was removed and the principal component analysis was repeated. After all the requirements are satisfied, the data yield 2 factors: Scopus number of citations, Scopus citations by document, highest number of citations in Scopus, and number of citations in GSC form the first factor, while the second factor contains number of documents in Scopus, number of documents in Google Scholar, English documents in Google Scholar, and documents as first author in Google Scholar. All the variables now have simple structure. The matrix plot below supports the output of the factor analysis:
Figure 1.7 Matrix plot of variables with simple structure based on ITS lecturers data in Scopus and Google Scholar accounts.
vi.
Factor Analysis of the Mathematics and Sciences Faculty data based on Scopus and
Google Scholar Account.
Table 1.6 Rotated Component Matrix(a)

Variable                 Component 1   Component 2   Component 3
age                                                  ,963
periodofworking                                      ,964
numbofdoc_SC                           ,745
numbogcit_SC             ,960
citbydoc_SC              ,964
firstauthorbydate_SC     ,677
firstauthorbycitat_SC    ,766
highestnumbercit_SC      ,922
numbofdoc_GSC                          ,929
engdoc_GSC                             ,854
firstauthordoc_GSC                     ,839

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 5 iterations.
Table 1.6 above shows that there are 11 variables left, contained in 3 factors. At the beginning I had 14 variables. During the analysis, I removed the documents in Indonesian, number of citations in GSC, and highest number of citations per document in GSC variables because the value of the measure of sampling adequacy (MSA) for those variables was less than 0,5. After all the requirements are satisfied, the data yield 3 factors: the first factor contains the Scopus citation variables; the second factor includes number of documents in Scopus, number of documents in GSC, English documents in GSC, and documents as first author in GSC; and age and working period form the third factor. All the variables now have simple structure. The matrix plot below supports the output of the factor analysis:
Figure 1.8 Matrix plot of variables with simple structure according to lecturers in the Mathematics and Sciences Faculty.
Figure 1.9 Matrix plot of variables with simple structure according to the departments in the Mathematics and Sciences Faculty.
Discriminant Analysis
In this section, I briefly examine the performance of ITS lecturers based on their h-index value. As we know, the more a lecturer contributes at the academic level, the higher the h-index he or she obtains. Therefore, in this part, the h-index value is classified into two groups: an h-index less than 2, represented by the number 1, and an h-index greater than or equal to 2, represented by the number 2. I assume that there are no problems with missing data, violations of assumptions, or outliers. I use a level of significance of 0.05 for evaluating the statistical relationships and run the data through SPSS. In order to determine the most important independent variables, the stepwise method is used for variable selection.
i.
Table 1.8 Group Statistics

hindex   Variable           Mean    Std. Deviation   Unweighted N   Weighted N
1        umur               45,55   9,892            102            102,000
1        lamakerja          19,48   10,288           102            102,000
1        numberofdoc        4,44    2,448            102            102,000
1        numberofcit        5,47    9,894            102            102,000
1        citationbydoc      5,39    9,883            102            102,000
1        highestnumbercit   4,48    9,331            102            102,000
2        umur               45,85   9,119            79             79,000
2        lamakerja          20,03   9,201            79             79,000
2        numberofdoc        7,70    5,150            79             79,000
2        numberofcit        18,49   21,942           79             79,000
2        citationbydoc      17,28   21,846           79             79,000
2        highestnumbercit   12,39   20,009           79             79,000
Total    umur               45,68   9,537            181            181,000
Total    lamakerja          19,72   9,806            181            181,000
Total    numberofdoc        5,86    4,180            181            181,000
Total    numberofcit        11,15   17,479           181            181,000
Total    citationbydoc      10,58   17,220           181            181,000
Total    highestnumbercit   7,93    15,421           181            181,000
After this short descriptive analysis, the next stage is to apply discriminant analysis to the data. Discriminant analysis consists of two stages: in the first stage, the discriminant functions are derived; in the second stage, the discriminant functions are used to classify the cases. There are several steps in this analysis in order to find a significant discriminant model. By running the data through SPSS I can provide several outputs below, while the complete output is attached on the last page of this paper. In this analysis, there are 181 valid cases and 6 independent variables. The ratio of cases to independent variables is about 30.2 to 1, which satisfies the minimum requirement. Table 1.8 shows that the number of cases in the smallest group is 79, which is larger than the number of independent variables (6), satisfying the minimum requirement; it also satisfies the preferred minimum of 20 cases.
In this analysis there were 2 groups defined by the h-index and 6 independent variables, so the maximum possible number of discriminant functions was 1. In the table of Wilks' lambda, which tests the functions for statistical significance, the stepwise analysis identified 1 statistically significant discriminant function. The Wilks' lambda statistic for the test of function 1 (chi-square = 76.530) had a probability of less than 0.001, which was less than or equal to the level of significance of 0.05.
Wilks' Lambda

Test of Function(s)   Chi-square   df   Sig.
1                     76,530       4    ,000
The significance of the maximum possible number of discriminant functions supports the interpretation of a solution using 1 discriminant function. In order to specify the role that each independent variable plays in predicting group membership on the dependent variable, we must link together the relationship between the discriminant function and the groups defined by the dependent variable, the role of the significant independent variables in the discriminant function, and the differences in group means for each of the variables.
Table 1.9 Functions at Group Centroids

hindex   Function 1
1        -,644
2        ,831

Unstandardized canonical discriminant functions evaluated at group means.
Table 1.9 above shows that the function divides the cases into two subgroups by assigning negative values to one subgroup and positive values to the other. Function 1 separates the h-index group with values less than 2 (-,644) from the h-index group with values greater than or equal to 2 (,831).
Table 1.10 below shows the most important predictor variables selected by the stepwise method. This method identified 4 variables that satisfied the level of significance of 0.05. The most important predictors of the groups based on the h-index value were number of documents, number of citations, citations by document, and highest number of citations per document.
Table 1.10 Variables Entered(a,b,c,d)

Step   Entered            Sig.
1      numberofdoc        7,232E-008
2      numberofcit        8,674E-013
3      citationbydoc      3,871E-015
4      highestnumbercit   9,535E-016

At step 4, the minimum D squared statistic was 2,175 (between groups 1 and 2), with exact F = 23,800, df1 = 4, df2 = 176,000. At each step, the variable that maximizes the Mahalanobis distance between the two closest groups is entered.
a. Maximum number of steps is 12.
b. Maximum significance of F to enter is .05.
c. Minimum significance of F to remove is .10.
d. F level, tolerance, or VIN insufficient for further computation.
Based on the structure matrix in table 1.11 below, the predictor variables most strongly associated with discriminant function 1, which distinguished between lecturers with h-index values less than 2 and lecturers with h-index values greater than or equal to 2, were number of documents (r = 0,571) and number of citations (r = 0,542).
(1)
After obtaining the discriminant function, in the next step I use the function to classify the cases. Table 1.13 below shows that the independent variables could be characterized as useful predictors of membership in the groups defined by the dependent variable if the cross-validated classification accuracy rate was significantly higher than the accuracy attainable by chance alone. Operationally, the cross-validated classification accuracy rate should be 25% or more higher than the proportional by chance accuracy rate. The proportional by chance accuracy rate was computed by squaring and summing the proportion of cases in each group from the table of prior probabilities for groups (0,564² + 0,436² = 0,508).
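This by-chance computation, using the prior probabilities from the table of priors, can be reproduced directly:

```python
# Proportional-by-chance accuracy: square and sum the group proportions
# (priors taken from the prior probabilities table in the text).
priors = [0.564, 0.436]
chance = sum(p ** 2 for p in priors)

# Common rule of thumb: the model is useful if the cross-validated
# accuracy reaches 1.25 times the by-chance rate.
criterion = 1.25 * chance
print(round(chance, 4), round(criterion, 4))   # about 0.508 and 0.635
```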
Table 1.13 Prior Probabilities for Groups

hindex   Prior   Cases Used in Analysis (Unweighted)   Cases Used in Analysis (Weighted)
1        ,564    102                                   102,000
2        ,436    79                                    79,000
Total    1,000   181                                   181,000
Table 1.14 shows that the cross-validated accuracy rate computed by SPSS was 78,5%, which was greater than the proportional by chance accuracy criterion of 63,5% (1,25 x 50,8% = 63,5%). The criteria for the classification table and accuracy are satisfied.
In conclusion, we found one statistically significant discriminant function, equation (1), making it possible to distinguish between the two groups defined by the dependent variable. Moreover, the cross-validated classification accuracy surpassed the by-chance accuracy criterion, supporting the utility of the model.
ii.
Group Statistics

hindex   Variable           Mean    Std. Deviation   Unweighted N   Weighted N
1        age                50,00   7,237            17             17,000
1        periodofworking    25,41   6,718            17             17,000
1        numberofdoc        3,65    1,801            17             17,000
1        numberofcit        5,71    5,621            17             17,000
1        citationbydoc      5,65    5,645            17             17,000
1        highestnumbercit   4,76    4,070            17             17,000
2        age                42,65   7,506            20             20,000
2        periodofworking    18,00   7,712            20             20,000
2        numberofdoc        8,00    4,193            20             20,000
2        numberofcit        18,50   16,728           20             20,000
2        citationbydoc      17,45   16,513           20             20,000
2        highestnumbercit   10,95   10,758           20             20,000
Total    age                46,03   8,173            37             37,000
Total    periodofworking    21,41   8,091            37             37,000
Total    numberofdoc        6,00    3,944            37             37,000
Total    numberofcit        12,62   14,266           37             37,000
Total    citationbydoc      12,03   13,915           37             37,000
Total    highestnumbercit   8,11    8,844            37             37,000
In this first stage I apply the h-index data of lecturers in the Mathematics and Sciences Faculty to find the discriminant function. Following steps similar to (i), the analysis selects the number of citations, the number of documents, and length of work as the most important independent variables, while the remaining variables are removed by the stepwise method. I also found that the predictor variable most strongly associated with discriminant function 1, which distinguished between lecturers with h-index values less than 2 and lecturers with h-index values greater than or equal to 2, was number of documents (r = 0,546). The coefficients of the discriminant function can be seen in the following table:
Based on the table above, I can construct the discriminant function below:

\[
f(x) = 0,665\,(\text{length of work}) + 8,10\,(\text{document}) + 0,664\,(\text{citation}) \qquad (2)
\]
After obtaining the discriminant function, in the next stage I use the function to classify the cases. The proportional by chance accuracy rate was computed by squaring and summing the proportion of cases in each group from the table of prior probabilities for groups (0,459² + 0,541² = 0,503). The SPSS output in the table below shows that the cross-validated accuracy rate was 91,9%, which was greater than the proportional by chance accuracy criterion of 62,9% (1,25 x 50,3% = 62,9%). The criteria for the classification table and accuracy are satisfied.
Total
17
20
25
100,0
100,0
100,0
17
20
100,0
100,0
24
b. Cross validation is done only for those cases in the analysis. In cross validation, each case is
classified by the functions derived from all cases other than that case.
c. 91,9% of cross-validated grouped cases correctly classified.
To sum up, we found one statistically significant discriminant function, equation (2), making it possible to distinguish between the two groups defined by the dependent variable. Moreover, the cross-validated classification accuracy surpassed the by-chance accuracy criterion, supporting the utility of the model.
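The leave-one-out scheme described in footnote (b) can be sketched with scikit-learn in place of SPSS. The data below are synthetic stand-ins: group sizes and rough scales are borrowed from the group statistics, not the real lecturer data.

```python
# Leave-one-out cross-validation: each case is classified by a discriminant
# function derived from all cases other than that case.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
# Two synthetic groups; columns mimic length of work, documents, citations.
g1 = rng.normal([25.4, 3.7, 5.7], [6.7, 1.8, 5.6], size=(17, 3))    # h-index < 2
g2 = rng.normal([18.0, 8.0, 18.5], [7.7, 4.2, 16.7], size=(20, 3))  # h-index >= 2
X = np.vstack([g1, g2])
y = np.array([1] * 17 + [2] * 20)

lda = LinearDiscriminantAnalysis()
acc = cross_val_score(lda, X, y, cv=LeaveOneOut()).mean()
print(f"leave-one-out accuracy: {acc:.3f}")
```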
iii.
Discriminant Analysis of the ITS Lecturers data based on Google Scholar Account.
Based on Table 1.17 below, overall, age and length of work are not significantly different between groups. On the other hand, the remaining variables have quite different means between groups. On average, ITS lecturers with a higher number of citations, number of documents, English documents, Indonesian documents, and highest number of citations per document in their Google Scholar account were more likely to have a higher h-index value.
Table 1.17 Group Statistics

all_hindex   Variable                Mean    Std. Deviation   Valid N (listwise)
                                                              Unweighted   Weighted
1            age                     43,91   10,862           111          111,000
1            periodofworking         17,12   10,817           111          111,000
1            allcitation             6,85    23,183           111          111,000
1            numberofdoc             12,33   9,932            111          111,000
1            englisdoc               4,55    4,094            111          111,000
1            indodoc                 7,78    8,630            111          111,000
1            highestnumofcitperdoc   5,72    23,218           111          111,000
2            age                     43,82   9,652            96           96,000
2            periodofworking         17,75   10,198           96           96,000
2            allcitation             14,56   11,767           96           96,000
2            numberofdoc             22,56   23,841           96           96,000
2            englisdoc               9,46    7,891            96           96,000
2            indodoc                 13,09   21,470           96           96,000
2            highestnumofcitperdoc   8,21    9,839            96           96,000
Total        age                     43,87   10,294           207          207,000
Total        periodofworking         17,41   10,514           207          207,000
Total        allcitation             10,43   19,124           207          207,000
Total        numberofdoc             17,08   18,465           207          207,000
Total        englisdoc               6,83    6,610            207          207,000
Total        indodoc                 10,25   16,106           207          207,000
Total        highestnumofcitperdoc   6,87    18,277           207          207,000
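A group-statistics table of this kind can be reproduced outside SPSS with a simple groupby. The small data frame below is illustrative, not the real data:

```python
# Mean, standard deviation, and count per h-index group, as in Table 1.17.
import pandas as pd

df = pd.DataFrame({
    "all_hindex":  [1, 1, 1, 2, 2, 2],
    "allcitation": [3, 7, 10, 12, 15, 17],
    "numberofdoc": [8, 12, 17, 20, 23, 25],
})

# One row of statistics per group, one column block per variable.
stats = df.groupby("all_hindex").agg(["mean", "std", "count"])
print(stats)
```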
By applying the data of ITS lecturers based on their Google Scholar accounts and using similar steps of analysis as before, I obtain the most important independent variables, defined by the dependent variable, that met the statistical test for inclusion: the number of citations and the highest number of citations per document. Of these, the number of citations was most strongly associated with discriminant function 1, which distinguishes lecturers with h-index values less than 2 from lecturers with h-index greater than or equal to 2. The coefficients of the discriminant function are shown in the table below:
Table 1.18 Standardized Canonical Discriminant Function Coefficients

                         Function 1
allcitation              6,735
highestnumofcitperdoc    -6,588
Based on the table above, the discriminant function can be written as:

f(x) = 6,735·(allcitation) − 6,588·(highestnumofcitperdoc)   (3)

According to the obtained discriminant function, the classified cases can be seen in the following table:
Table 1.19 Classification Results (a, c)

                                       Predicted Group Membership
                    all_hindex         1       2        Total
Original    Count   1                  103     8        111
                    2                  23      73       96
                    Ungrouped cases    93      171      264
            %       1                  92,8    7,2      100,0
                    2                  24,0    76,0     100,0
                    Ungrouped cases    35,2    64,8     100,0
Cross-      Count   1                  103     8        111
validated           2                  24      72       96
(b)         %       1                  92,8    7,2      100,0
                    2                  25,0    75,0     100,0

a. 85,0% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
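The accuracy figures in Table 1.19 follow directly from its classification counts; a minimal check:

```python
# Reproducing the accuracy in Table 1.19 from its counts.
# Rows: actual group; columns: predicted group (grouped cases only).
original = [[103, 8],
            [23, 73]]

total = sum(sum(row) for row in original)    # 207 grouped cases
correct = original[0][0] + original[1][1]    # diagonal of the table
accuracy = correct / total
print(f"{accuracy:.1%}")  # 85.0%, matching footnote (a)
```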
iv.
Discriminant Analysis in Google Scholar data of Mathematics and Sciences Faculty Lecturers
The table below shows that, in the Google Scholar account data, lecturers who are younger and have a shorter length of work than the other lecturers fall into group 2, although not significantly. On the other hand, on average, Mathematics and Sciences Faculty lecturers with a lower number of citations, number of documents, English documents, Indonesian documents, and highest number of citations per document in their Google Scholar account were more likely to fall into group 1.
Table 1.21 Group Statistics

hindex   Variable                   Mean    Std. Deviation   Valid N (listwise)
                                                             Unweighted   Weighted
1        age                        45,70   10,254           27           27,000
1        workingperiod              20,15   10,220           27           27,000
1        allcitation                4,11    2,887            27           27,000
1        numbofdoc                  12,52   12,819           27           27,000
1        englishdoc                 3,26    2,969            27           27,000
1        inddoc                     9,26    11,749           27           27,000
1        highestnumbofcitatperdoc   2,89    2,391            27           27,000
2        age                        44,74   9,473            23           23,000
2        workingperiod              18,35   9,810            23           23,000
2        allcitation                14,65   12,160           23           23,000
2        numbofdoc                  20,26   19,017           23           23,000
2        englishdoc                 10,09   12,124           23           23,000
2        inddoc                     10,13   10,687           23           23,000
2        highestnumbofcitatperdoc   8,26    10,136           23           23,000
Total    age                        45,26   9,814            50           50,000
Total    workingperiod              19,32   9,972            50           50,000
Total    allcitation                8,96    9,949            50           50,000
Total    numbofdoc                  16,08   16,272           50           50,000
Total    englishdoc                 6,40    9,082            50           50,000
Total    inddoc                     9,66    11,168           50           50,000
Total    highestnumbofcitatperdoc   5,36    7,515            50           50,000
In this part, the data of lecturers in the Mathematics and Sciences Faculty based on Google Scholar accounts is applied to the analysis. The important independent variables obtained by the stepwise method are the number of citations, number of documents, and highest number of citations per document. Among these, the number of citations was most strongly associated with discriminant function 1, which distinguishes lecturers with h-index values less than 2 from lecturers with h-index greater than or equal to 2. After obtaining the significant variables, I construct the discriminant function from the coefficients of the discriminant function below:
(4)
Based on the discriminant function, the classified cases are shown in the table below. The proportional by chance accuracy rate was computed by squaring and summing the proportion of cases in each group from the table of prior probabilities for groups (0,540² + 0,460² = 0,5033). The cross-validated accuracy rate computed by SPSS was 86,0%, which is greater than the proportional by chance accuracy criterion of 62,9% (1,25 × 50,33% = 62,9%), so the criteria for the classification table and accuracy are satisfied.
Table 1.23 Classification Results (a, c)

                                       Predicted Group Membership
                    hindex             1       2        Total
Original    Count   1                  25      2        27
                    2                  2       21       23
                    Ungrouped cases    19      36       55
            %       1                  92,6    7,4      100,0
                    2                  8,7     91,3     100,0
                    Ungrouped cases    34,5    65,5     100,0
Cross-      Count   1                  25      2        27
validated           2                  5       18       23
(b)         %       1                  92,6    7,4      100,0
                    2                  21,7    78,3     100,0

a. 92,0% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
c. 86,0% of cross-validated grouped cases correctly classified.
Finally, we found one statistically significant discriminant function, equation (4), making it possible to distinguish between the two groups defined by the dependent variable. Moreover, the cross-validated classification accuracy surpassed the by-chance accuracy criterion, supporting the utility of the model.
v.
Discriminant Analysis of the ITS Lecturers data based on Scopus and Google Scholar Account.
The table below shows that, on average, younger ITS lecturers were more likely to have a lower Scopus h-index, although not significantly. The other variables, however, show significant differences between groups: length of work, number of Scopus citations and documents, Google Scholar number of citations, English documents, and number of documents. Lecturers with a high h-index value were more likely to have higher values of these significant variables.
Table 1.24 Group Statistics

hindex_SC   Variable          Mean    Std. Deviation   Valid N (listwise)
                                                       Unweighted   Weighted
1           numbogcit_SC      5,49    6,292            61           61,000
1           numbofcitat_GSC   34,52   87,459           61           61,000
1           engdoc_GSC        13,07   10,136           61           61,000
1           age               44,61   9,126            61           61,000
1           periodofworking   18,87   9,772            61           61,000
1           numbofdoc_SC      5,03    3,376            61           61,000
1           numbofdoc_GSC     27,21   25,496           61           61,000
2           numbogcit_SC      19,22   23,853           55           55,000
2           numbofcitat_GSC   74,76   137,264          55           55,000
2           engdoc_GSC        21,07   16,174           55           55,000
2           age               45,95   8,603            55           55,000
2           periodofworking   20,05   9,316            55           55,000
2           numbofdoc_SC      8,56    5,779            55           55,000
2           numbofdoc_GSC     44,05   48,525           55           55,000
Total       numbogcit_SC      12,00   18,309           116          116,000
Total       numbofcitat_GSC   53,60   115,088          116          116,000
Total       engdoc_GSC        16,86   13,876           116          116,000
Total       age               45,24   8,869            116          116,000
Total       periodofworking   19,43   9,535            116          116,000
Total       numbofdoc_SC      6,71    4,976            116          116,000
Total       numbofdoc_GSC     35,20   38,938           116          116,000
In this session, discriminant analysis is applied to the data of ITS lecturers based on Scopus and Google Scholar accounts. The SPSS output shows that the useful variables obtained by the stepwise method are the number of citations in Scopus and the number of documents in Scopus, and I limit the interpretation of independent variables to these. Of the two, the predictor most strongly associated with discriminant function 1, which distinguishes lecturers with h-index values less than 2 from lecturers with h-index greater than or equal to 2, was the number of citations in Scopus (r = 0,697). After obtaining the significant variables, I construct the discriminant function from the coefficients of the discriminant function.
vi. Discriminant Analysis of the Mathematics and Sciences Faculty data based on Scopus and Google
Scholar Account.
The table below concludes that scopus number of document and citation, Google Scholar
number of document and citation give a major impact to the value of h-index. Lecturers of Mathematical
and Sciences Faculty who has h-index 2 were more likely to have higher number of scopus document
and citation and also Google Scholar document and citation.
Group Statistics (the Mean column was not recoverable from the source)

hindex_SC   Variable          Std. Deviation   Valid N (listwise)
                                               Unweighted   Weighted
1           numbofdoc_SC      2,234            10           10,000
1           numbogcit_SC      7,052            10           10,000
1           numbofcitat_GSC   208,137          10           10,000
1           numbofdoc_GSC     22,691           10           10,000
2           numbofdoc_SC      4,381            16           16,000
2           numbogcit_SC      18,301           16           16,000
2           numbofcitat_GSC   39,616           16           16,000
2           numbofdoc_GSC     31,097           16           16,000
Total       numbofdoc_SC      4,287            26           26,000
Total       numbogcit_SC      16,068           26           26,000
Total       numbofcitat_GSC   130,483          26           26,000
Total       numbofdoc_GSC     27,864           26           26,000
In this first stage I apply the data of Mathematics and Sciences Faculty lecturers based on Scopus and Google Scholar accounts to find the discriminant function. Following steps similar to the other data, the analysis retains the number of citations in Scopus and the number of documents in Scopus as useful independent variables under the stepwise method, and I limit the interpretation of independent variables to these two. I also found that the predictor most strongly associated with discriminant function 1, which distinguishes lecturers with h-index values less than 2 from lecturers with h-index greater than or equal to 2, was the number of documents in Scopus (r = 0,744). After obtaining the significant variables, I construct the discriminant function from the coefficients of the discriminant function, which can be seen in the following table:
Table 1.28 Standardized Canonical Discriminant Function Coefficients

                 Function 1
numbofdoc_SC     ,877
numbogcit_SC     ,682
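For reference, SPSS's standardized canonical coefficients are the raw (unstandardized) coefficients scaled by the pooled within-group standard deviations. The sketch below illustrates only that relationship; the raw coefficient and SD values are hypothetical, since the text does not give them:

```python
# Standardized coefficient = raw coefficient * pooled within-group SD.
# Values below are hypothetical placeholders for illustration.
raw_coef = {"numbofdoc_SC": 0.20, "numbogcit_SC": 0.04}   # hypothetical
pooled_sd = {"numbofdoc_SC": 4.4, "numbogcit_SC": 17.0}   # hypothetical

standardized = {v: raw_coef[v] * pooled_sd[v] for v in raw_coef}
print(standardized)
```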
In the second stage, I classify the cases. Independent variables can be characterized as useful predictors of group membership if the cross-validated classification accuracy rate is significantly higher than the accuracy attainable by chance alone, that is, at least 25% higher than the proportional by chance accuracy rate. The proportional by chance accuracy rate was computed by squaring and summing the proportion of cases in each group from the table of prior probabilities for groups (0,385² + 0,615² = 0,526). The SPSS output shows that the cross-validated accuracy rate was 73,1%, which is greater than the proportional by chance accuracy criterion of 65,8% (1,25 × 52,6% = 65,8%). The criteria for the classification table and accuracy are satisfied.
Logistic Regression
In this part, I use the same data to find the logistic regression model. Before starting the analysis, assume that there is no problem with missing data, outliers, or influential cases, and that the validation analysis will confirm the generalizability of the results. A significance level of 0.05 is used for evaluating the statistical relationships.
i.
Logistic Regression of ITS Lecturers data based on Scopus
According to the SPSS output, I construct the logistic regression model below. Using the stepwise method, the most important independent variables for distinguishing between the groups, that is, lecturers with higher versus lower h-index, are the number of documents, number of citations, citations by document, and highest number of citations per document.
To characterize the model as useful, I compare the overall percentage accuracy rate produced by SPSS at the last step in which variables are entered to 25% more than the proportional by chance accuracy. The proportional by chance accuracy criterion is 62,5% (1,25 × 50% = 62,5%). The accuracy rate computed by SPSS was 89,9%, which is greater than 62,5%, so the criterion for classification accuracy is satisfied.
In conclusion, the model is significant based on the proportional by chance criterion and the accuracy rate computed by SPSS. Besides, we found a statistically significant overall relationship between the combination of independent variables and the dependent variable. There was no evidence of numerical problems in the solution.
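The logistic-regression classification described above can be sketched with scikit-learn in place of SPSS. The two predictors and the data below are illustrative stand-ins, not the real lecturer data:

```python
# Logistic regression classifying lecturers into two h-index groups.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Synthetic predictors mimicking number of documents and citations.
X1 = rng.normal([4, 6], [2, 5], size=(40, 2))    # group: h-index < 2
X2 = rng.normal([8, 18], [4, 16], size=(40, 2))  # group: h-index >= 2
X = np.vstack([X1, X2])
y = np.array([0] * 40 + [1] * 40)

model = LogisticRegression().fit(X, y)
accuracy = model.score(X, y)   # overall classification accuracy
print(f"classification accuracy: {accuracy:.1%}")
```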
ii.
(8)
None of the independent variables has a standard error greater than 2, so there is no indication of multicollinearity. To characterize the model as useful, similar to (i), consider the table below:
Table 1.32 Classification Table (a)

                           Predicted hindexnew      Percentage
Observed (Step 3)          1,00       2,00          Correct
hindexnew     1,00         23         2             92,0
              2,00         2          35            94,6
Overall Percentage                                  93,5

a. The cut value is ,500
SPSS reports the overall accuracy rate in the footnotes to the "Classification Table." The overall accuracy rate computed by SPSS was 93,5%. The proportional by chance accuracy rate was computed by calculating the proportion of cases in each group from the classification table at Step 0, then squaring and summing those proportions (0,403² + 0,596² = 0,518). The proportional by chance accuracy criterion is therefore 64,8% (1,25 × 51,8% = 64,8%). Since the accuracy rate of 93,5% is greater than 64,8%, the criterion for classification accuracy is satisfied.
Finally, we found a statistically significant overall relationship between the independent variables and the dependent variable. Moreover, the classification accuracy surpassed the proportional by chance accuracy criterion, supporting the utility of the model.
iii.
Logistic Regression of the ITS Lecturers data based on Google Scholar Account
Based on the SPSS output, the most important independent variables for distinguishing between groups, that is, lecturers with higher versus lower h-index, are the number of citations, Indonesian documents, and highest number of citations per document. Theoretically, the more a lecturer contributes academically, the higher the h-index value. Having obtained the important independent variables, I construct the logistic regression model below.
(8)
SPSS reports the overall accuracy rate in the footnotes to the "Classification Table." The overall accuracy rate computed by SPSS was 94,3%. The proportional by chance accuracy rate was computed by calculating the proportion of cases in each group from the classification table at Step 0, then squaring and summing those proportions (0,430² + 0,569² = 0,509). The proportional by chance accuracy criterion is 63,6% (1,25 × 50,9% = 63,6%). Since 94,3% is greater than this criterion, the criterion for classification accuracy is satisfied.
In conclusion, the model is significant based on the proportional by chance criterion and the accuracy rate computed by SPSS. Besides, we found a statistically significant overall relationship between the combination of independent variables and the dependent variable.
iv.
Logistic regression of the Mathematics and Sciences Faculty Lecturers data based on Google Scholar Account
Based on the SPSS output, the most useful independent variable under the stepwise method is the English documents variable. Therefore, the logistic model can be written as:
f(x) = e^(−2,127 + 0,403·x₁) / (1 + e^(−2,127 + 0,403·x₁))   (9)
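A logistic model of the form in equation (9) can be evaluated directly. The sketch below assumes intercept −2,127 and slope +0,403: the magnitudes appear in the equation, but the signs are an assumption to be checked against the SPSS output.

```python
# Probability of the higher h-index group as a function of the
# number of English documents x1, per the logistic form of equation (9).
import math

def predicted_probability(x1, b0=-2.127, b1=0.403):
    """Logistic response: exp(b0 + b1*x1) / (1 + exp(b0 + b1*x1)).
    The coefficient signs here are an assumption, not from the source."""
    z = b0 + b1 * x1
    return math.exp(z) / (1 + math.exp(z))

for x1 in (0, 5, 10):
    print(x1, round(predicted_probability(x1), 3))
```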
SPSS reports the overall accuracy rate in the footnotes to the "Classification Table." The overall accuracy rate computed by SPSS was 81,9%. The proportional by chance accuracy criterion is 63,4% (1,25 × 50,7% = 63,4%). Since 81,9% is greater than this criterion, the criterion for classification accuracy is satisfied.
In conclusion, the model is significant based on the proportional by chance criterion and the accuracy rate computed by SPSS.
v.
Logistic regression of the ITS Lecturers data based on Scopus and Google Scholar Account
Based on the SPSS output, the most useful independent variables under the stepwise method are the number of documents in Scopus, number of citations in Scopus, and citations by document in Scopus. Therefore, the logistic model can be written as:
(10)
SPSS reports the overall accuracy rate in the footnotes to the "Classification Table." The overall accuracy rate computed by SPSS was 87,9%. The proportional by chance accuracy criterion is 63,0% (1,25 × 50,4% = 63,0%). Since 87,9% is greater than this criterion, the criterion for classification accuracy is satisfied.
In conclusion, the model is significant based on the proportional by chance criterion and the accuracy rate computed by SPSS.
vi.
Logistic regression of the Mathematics and Sciences Faculty data based on Scopus and Google Scholar Account
Based on the SPSS output, the most useful independent variable under the stepwise method is the English documents variable. Therefore, the logistic model can be written as:
(11)
SPSS reports the overall accuracy rate in the footnotes to the "Classification Table." The overall accuracy rate computed by SPSS was 100%. The proportional by chance accuracy criterion is 68,8% (1,25 × 55,09% = 68,8%). Since 100% is greater than this criterion, the criterion for classification accuracy is satisfied.
In conclusion, the model is significant based on the proportional by chance criterion and the accuracy rate computed by SPSS.
Cluster analysis
In this session, I implement cluster analysis on each of the data sets. The objective of cluster analysis is to group objects based on the characteristics they possess. By considering several variables, I intend to classify the lecturers based on the similarity of their functional position and education level. The variables used are age, length of work, number of citations, number of documents, citations by document, highest number of citations per document, English documents, and Indonesian documents. I assume that there is no problem with missing data, violation of assumptions, or outliers.
Functional Position     Percentage of functional position in each cluster
                        1 (%)    2 (%)    3 (%)    4 (%)
Professor               25       19,85    0        42,85
Associate Professor     33,33    25,53    0        14,285
Lector                  29,16    35,46    100      21,42
Expert Assistant        12,5     0,001    0        21,42
Nonhierarchical clustering methods are preferred because the number of clusters is known. Besides, these methods are generally less susceptible to outliers.
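The nonhierarchical method with a known number of clusters corresponds to k-means. A sketch with scikit-learn on synthetic data (the variable scales loosely mimic the group statistics, not the real lecturer data):

```python
# K-means clustering with a fixed number of clusters (k = 4,
# one per functional position), as a stand-in for SPSS's procedure.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Synthetic stand-ins for age, length of work, citations, documents.
X = rng.normal(size=(100, 4)) * [10, 10, 15, 10] + [45, 19, 12, 17]

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(np.bincount(kmeans.labels_))  # number of cases per cluster
```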
i.
Cluster analysis of ITS Lecturers data based on Scopus
By setting the number of clusters to 4, I group the data according to functional position, and each cluster is examined in terms of the percentage of each functional position it contains. After running the data through SPSS, the output is summarized in Table 1.32 above. According to the table, the percentages of functional positions tend to spread across the clusters, so no cluster can be given a clear label. In detail, even though the Associate Professor position is dominant in cluster 1, it is not significantly different from the other functional positions, and the same holds for the other clusters. In addition, even though cluster 3 contains only one position, the figure below shows that only one case is included in that cluster. When I separate the expert assistants from the lecturers who do not hold a functional position and reanalyse with 5 clusters, the result can be seen in figure 1.10; in detail, the percentages are presented in the table below:
Functional Position               Percentage of functional position in each cluster
                                  1 (%)   2 (%)   3 (%)      4 (%)      5 (%)
Professor                         25      100     30,43478   19,40299   0
Associate Professor               50      0       30,43478   24,62687   0
Lector                            25      0       28,26087   35,8209    50
Expert Assistant                  0       0       10,86957   19,40299   50
Do not have functional position   0       0       0          —          —
In the case of education level, I set the number of clusters to 2. After computing the data, cluster 1 represents lecturers who hold a doctoral degree and cluster 2 those who do not. The result is shown below:

Level of Education   Percentage of education level in each cluster
                     1 (%)   2 (%)
Doctor               90      1,030928
Not Doctor           10      98,96907
The lecturers holding a doctoral degree were dominant in the first cluster, while the second cluster mostly contained the other lecturers. The difference between the cases is substantial: in cluster 1, 90% of the lecturers hold a doctoral degree, far higher than the 10% who do not, while 98% of the lecturers without a doctoral degree fall in the second cluster. The table above can be illustrated in the following bar chart:
ii.
Cluster analysis of Mathematics and Sciences Faculty Lecturers data based on Scopus
Similar to (i), I run the analysis in SPSS on this data set. Setting the number of clusters to 4, I group the data according to functional position, and each cluster is examined in terms of the percentage of each functional position. The output is summarized in the following table:
(Functional position table: only the row labels Professor and Associate Professor survive in the source; the cluster percentages were not recovered.)
The table below shows that the lecturers holding a doctoral degree make up all of cluster 1 (100%). In contrast, cluster 2 contains mostly lecturers who do not hold a doctoral degree (78%). To illustrate this, we can look at figure 1.13: each cluster is clearly grouped by the similarity of its cases, so we can conclude that cluster 1 consists of lecturers who hold a doctoral degree, while cluster 2 consists of lecturers who do not.
iii.
Cluster Analysis of the ITS Lecturers data based on Google Scholar Account
By clustering the functional position cases on several variables from the Google Scholar accounts, I obtain the table below. It shows that the Professor position takes the largest share of clusters 2 and 3, Associate Professor of cluster 1, and Professor and Expert Assistant share the same percentage in cluster 4.
(Functional position table: rows Professor, Associate Professor, Lector, and Expert Assistant; the cluster percentages are only partially recoverable from the source: 24,824; 34,426; 33,333; 22,222; 29,4; 8,82; 50; 0; 30,211; 11,8; 50.)
The illustration of the table above can be seen clearly in the following bar chart. Based on the table and figure 1.14, no cluster can be considered to represent a functional position, because the cases contained in each cluster differ from one another. Figure 1.15 illustrates that when the lecturers who do not hold a position in the university are added, those cases fall into clusters 3 and 4.
The table below shows that both education levels are contained in one cluster:

Level of Education   Percentage of education level in cluster 1 (%)
Doctor               48,94068
Not Doctor           51,05932

Accordingly, this clustering cannot be considered to group the lecturers by the similarity of their education level.
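The cluster-composition percentages used throughout this section amount to a cross-tabulation of cluster membership against a categorical label, normalized within each cluster. A sketch on illustrative data:

```python
# Percentage of each education level within each cluster, as in the tables above.
import pandas as pd

df = pd.DataFrame({
    "cluster":   [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "education": ["doctor", "doctor", "doctor", "not doctor",
                  "not doctor", "not doctor", "doctor", "not doctor", "not doctor"],
})

# normalize="columns" makes each cluster's column sum to 100%.
pct = pd.crosstab(df["education"], df["cluster"], normalize="columns") * 100
print(pct.round(1))
```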
iv.
Cluster Analysis of the Mathematics and Sciences Faculty Lecturers data based on Google Scholar Account
By clustering the functional position cases on several variables from the Google Scholar accounts, I obtain the table below. It shows that cluster 1 is represented by the Professor position, while the other clusters still contain a mix of cases.
(Functional position table: rows Professor, Associate Professor, Lector, and Expert Assistant; the cluster percentages are only partially recoverable from the source: 24,59016; 40,98361; 14,28571; 28,57143; 50; 25; 32,78689; 2,777778.)
The illustration of the table above can be seen clearly in the following bar chart (figure 1.17). Figure 1.18 illustrates that when the lecturers who do not hold a position in the university are added, those cases fall into cluster 5. For education level, I set the number of clusters to 2. After computing the data, cluster 1 represents lecturers who hold a doctoral degree and cluster 2 those who do not. The result is shown as follows:
Level of Education   1 (%)
Doctoral             100
Not doctoral         0
v.
Cluster Analysis of the ITS Lecturers data based on Scopus and Google Scholar Account
After running the data through SPSS, the output is summarized in the following table. Overall, the table shows that the Lector functional position is represented by cluster 2, while clusters 1, 3 and 4 still contain a mix of cases.
(Functional position table: rows Professor, Associate Professor, Lector, and Expert Assistant; the cluster percentages are only partially recoverable from the source: 0; 100; 16,667; 16,667; 38,889; 22,222; 16,6; 8,3333; 16,667.)
The illustration of the table above can be seen clearly in the following bar chart (figure 1.20). Figure 1.21 illustrates that when the lecturers who do not hold a position in the university are added, those cases fall into cluster 2. For education level, I set the number of clusters to 2. After computing the data, cluster 1 represents the lecturers who hold a doctoral degree (100% of its cases), and cluster 2 also mostly consists of lecturers with a doctoral degree (about 70%).
Level of Education   1 (%)   2 (%)
Doctor               100     70,61611
Not doctor           0       29,38389
vi.
Cluster Analysis of the Mathematics and Sciences Faculty data based on Scopus and Google Scholar Account
Similar to the previous analyses, the output is summarized in the table below. Overall, it shows that cluster 3 represents the Professor position with 100% of its cases, while the other clusters still lack similarity within them.
(Functional position table: rows Professor, Associate Professor, Lector, and Expert Assistant; the cluster percentages are only partially recoverable from the source: 14,28571; 28,57143; 0; 0; 28,57; 28,57; 9,375; 14,28571.)
The ilustration of the table above can be seen clearly in the following bar chart (figure 1.23):
Table below shows that all the education level contained in one cluster. table describes that cluster 1
represent lecturers with doctoral degree with 100% percentage. Similarly, cluster 2 also contain about
86% of lecturers with doctoral degreee.
Level of Education   2 (%)
Doctor               86,95652
Not doctor           13,04348
Acknowledgement
I realize that this paper needs a lot of improvement, as it was written under considerable time pressure. Advice and corrections are therefore strongly welcomed.
Reference
[1] Hair, J., Black, W., Babin, B., Anderson, R. Multivariate Data Analysis, Seventh Edition.
[2] Hirsch, Buela-Casal. 2014. The meaning of the h-index.
[3] Sharma, S. Applied Multivariate Techniques. USA, 1996.