BACKGROUND
[...] Bank, consisting of sub-national government data on output program expenditure from the central and sub-national (provinces, districts) governments. The data comes from publicly available data sources managed by the Government of Indonesia (GoI) and, unless indicated otherwise, is audited realized expenditure data.

DATA
The dataset contains 545 (511 + 34) cases and 77 variables. After data manipulation (eliminating missing values, standardization, and removing duplicates), the dataset consists of only 28 cases and 65 variables, which were then used in the analysis.

METHOD
Clustering is one of the important data mining methods for discovering knowledge in multivariate data sets. The goal is to identify groups (i.e. clusters) of similar objects within a data set of interest. Briefly, the two most common clustering strategies are:
• Hierarchical clustering: used for identifying groups of similar observations in a data set.
• Partitioning clustering, such as the k-means algorithm: used for splitting a data set into a specified number of groups.

RESULT &
[Figure 3.1: Percentage of explained variance]
[Figure 3.2: Dendrogram based on PCA (panel title: Cluster Dendrogram; y-axis: Height; leaves: cases 1 to 28)]
[Figure 3.3: Cluster Plot]

Figure 3.1 shows that the first 4 principal components together explain at least 75% of the variance in the dataset.

Figure 3.2 shows which district belongs to which cluster. The 4th cluster has only 1 member, the 3rd cluster has 6 members, the 2nd cluster has 2 members, and the remaining 19 districts join the first cluster.

Based on these results, the districts in cluster 1 are characterized by their coordinates on axes 1 and 2. The districts in cluster 2 are characterized by their coordinates on axis 2, and those in cluster 3 by their coordinates on axis 1. The district in cluster 4 is characterized by its coordinates on axes 1 and 4. Here, a dimension is kept for a cluster only when its v-test is higher than 2.
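The three DATA steps (eliminating missing values, standardization, removing duplicates) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline:
- The helper names are hypothetical.
- Missing values are assumed to be encoded as `None` or NaN.
- Any row with a missing entry is dropped.
- Columns are z-scored with the population standard deviation.

The poster does not say how the variables went from 77 to 65, so no column filtering is attempted.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumed encoding)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(rows):
    """Keep only rows (cases) with no missing entries."""
    return [list(r) for r in rows if not any(is_missing(v) for v in r)]


def standardize(rows):
    """Z-score every column using the population SD (divide by n).
    Zero-variance columns are mapped to 0.0 (an assumption)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, means, sds)]
        for r in rows
    ]


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def preprocess(rows):
    """Apply the three steps in the order the poster lists them."""
    return drop_duplicates(standardize(drop_missing(rows)))


raw = [[1.0, 2.0], [1.0, 2.0], [3.0, None], [5.0, 6.0]]
print(preprocess(raw))  # two rows remain: one duplicate and one incomplete case are gone
```

Because standardization maps identical rows to identical rows, removing duplicates after it (as listed on the poster) finds the same duplicates as removing them before. The means and SDs, however, are computed with the duplicates still present.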
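The hierarchical strategy named in METHOD can be illustrated with a small agglomerative sketch. The poster does not name a linkage, so Ward linkage is assumed here (a common choice for dendrograms built on PCA coordinates). It is implemented with the Lance-Williams update on squared Euclidean distances. Merging stops once `k` clusters remain, which corresponds to cutting the dendrogram at the height that leaves `k` branches. The function names `ward_clusters` and `sq_dist` are hypothetical.

```python
import itertools


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_clusters(points, k):
    """Agglomerative clustering with Ward linkage, stopped at k clusters.

    Distances start as squared Euclidean distances and are updated with the
    Lance-Williams formula for Ward's method. Returns one label per point."""
    clusters = {i: [i] for i in range(len(points))}
    dist = {(i, j): sq_dist(points[i], points[j])
            for i, j in itertools.combinations(range(len(points)), 2)}
    next_id = len(points)
    while len(clusters) > k:
        # Merge the closest pair; keys are always stored as (smaller id, larger id).
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        n_i, n_j = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        # Lance-Williams update: distance from every remaining cluster m to the merged one.
        updated = {}
        for m, members in clusters.items():
            n_m = len(members)
            d_im = dist[(min(i, m), max(i, m))]
            d_jm = dist[(min(j, m), max(j, m))]
            updated[m] = ((n_i + n_m) * d_im + (n_j + n_m) * d_jm - n_m * d_ij) / (n_i + n_j + n_m)
        dist = {key: d for key, d in dist.items() if i not in key and j not in key}
        for m, d in updated.items():
            dist[(m, next_id)] = d
        clusters[next_id] = merged
        next_id += 1
    labels = [0] * len(points)
    for label, members in enumerate(clusters.values()):
        for p in members:
            labels[p] = label
    return labels


points = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
print(ward_clusters(points, 3))  # [1, 1, 2, 2, 0]
```

The label numbers themselves are arbitrary. Only the grouping matters: {0, 1}, {2, 3}, and {4}.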
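For the partitioning strategy, a minimal version of k-means (Lloyd's algorithm) is sketched below. It is an illustration only, with these choices:
- Initial centroids are drawn at random from the data with a fixed seed.
- An empty cluster keeps its previous centroid.
- Iteration stops when the assignments no longer change.

The names `kmeans` and `sq_dist` are illustrative.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, 2)
print(labels, centroids)
```

Unlike the hierarchical sketch, k-means needs the number of groups `k` up front. Its result can also depend on the initial centroids.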
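The claim that 4 principal components explain at least 75% of the variance rests on the eigenvalues λ_1 ≥ λ_2 ≥ ... of the covariance matrix of the standardized data. Component k explains the share λ_k / (λ_1 + λ_2 + ...) of the total variance. The number of components retained is the smallest k whose cumulative share reaches the threshold.

The plain-Python sketch below computes this. It finds the eigenvalues with cyclic Jacobi rotations, a standard method for symmetric matrices chosen here only to avoid dependencies. The function names are hypothetical, and this is not the authors' code.

```python
import math


def covariance_matrix(rows):
    """Population covariance matrix (divide by n) of a list of rows.
    For z-scored data this equals the correlation matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n for j in range(p)]
        for i in range(p)
    ]


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [list(row) for row in matrix]
    p = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Rotation angle chosen so that the (i, j) entry becomes zero.
                theta = (a[j][j] - a[i][i]) / (2.0 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(p):  # columns i and j: A <- A P
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(p):  # rows i and j: A <- P^T A
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def components_for_threshold(eigenvalues, threshold=0.75):
    """Smallest k whose first k eigenvalues reach `threshold` of the total variance."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        cumulative += lam
        if cumulative / total >= threshold:
            return k, cumulative / total
    return len(eigenvalues), 1.0


# Illustrative eigenvalues only (not the poster's):
print(components_for_threshold([4.0, 3.0, 1.0]))  # (2, 0.875)
```

Fed with the covariance matrix of the preprocessed 28 × 65 data, `components_for_threshold` would report the k read off Figure 3.1.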
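The v-test rule ("a dimension is kept only when the v-test is higher than 2") can be made concrete with the usual test-value formula for a quantitative variable, here the coordinates on one principal axis. Let:
- N be the number of individuals, n_q of them in cluster q;
- x̄ be the overall mean and x̄_q the cluster mean;
- s² be the overall (population) variance.

Then v = (x̄_q − x̄) / sqrt((s² / n_q) · (N − n_q) / (N − 1)).

Two points here are assumptions about how the poster's v-tests were computed: the formula itself, and comparing |v| rather than v with 2 (so that an axis on which a cluster sits far below the mean also counts). The function names are hypothetical.

```python
import math


def v_test(values, in_cluster):
    """Test value comparing the mean of `values` inside a cluster with the overall mean.
    `in_cluster` is a list of booleans, one per individual."""
    n = len(values)
    cluster_values = [v for v, flag in zip(values, in_cluster) if flag]
    nq = len(cluster_values)
    if not 0 < nq < n:
        raise ValueError("cluster must contain some, but not all, individuals")
    mean_all = sum(values) / n
    mean_q = sum(cluster_values) / nq
    var_all = sum((v - mean_all) ** 2 for v in values) / n
    std_error = math.sqrt(var_all / nq * (n - nq) / (n - 1))
    if std_error == 0.0:
        return 0.0  # constant variable: no evidence that the cluster differs
    return (mean_q - mean_all) / std_error


def characterizing_axes(coords, labels, cluster, threshold=2.0):
    """1-based numbers of the axes whose |v-test| for `cluster` exceeds `threshold`.
    `coords[i][j]` is individual i's coordinate on axis j + 1."""
    flags = [label == cluster for label in labels]
    kept = []
    for j in range(len(coords[0])):
        v = v_test([row[j] for row in coords], flags)
        if abs(v) > threshold:
            kept.append(j + 1)  # axes numbered from 1, as on the poster
    return kept


coords = [[x, y] for x, y in zip(range(1, 11), [1, -1] * 5)]
labels = [0] * 7 + [1] * 3
print(characterizing_axes(coords, labels, cluster=1))  # [1]
```

In this toy example, cluster 1's v-test is about 2.39 on axis 1 and about −0.65 on axis 2, so only axis 1 characterizes it. Statements such as "cluster 4 has coordinates on axes 1 and 4" summarize the poster's results in the same way.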