
I have a dataset with 23 columns (dimensions).

I applied k-means in R to that dataset. I would like to know in which dimension k-means was performed. Can you explain?
K-Means Package in R
K-means is among the most widely used data-science tools for grouping a dataset into clusters. The clusters are found purely by calculating how similar each observation is to the others. In R it is provided by the kmeans() function in the built-in stats package.
In this dataset there is an assortment of different values. Given a set of observations (x1, x2, ..., xn), where each observation is a d-dimensional real vector (here d = 23), k-means clustering aims to partition the n observations into k (k ≤ n) sets S = {S1, S2, ..., Sk} so as to minimize the WCSS (within-cluster sum of squares):

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$

where $\mu_i$ is the mean of the points in $S_i$. Although the k-means algorithm is unsupervised, it does need to know how many clusters the user expects to find in the dataset.
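To connect the WCSS formula with what R reports, here is a minimal sketch that recomputes the objective by hand from a kmeans() fit and compares it with the fit's tot.withinss component. The data are a hypothetical stand-in (random numbers with 23 columns), not your dataset, and k = 3 is an arbitrary choice for illustration.

```r
# Hypothetical stand-in for a 23-column dataset: 200 rows of random numbers.
set.seed(42)
x <- matrix(rnorm(200 * 23), nrow = 200, ncol = 23)

k <- 3  # arbitrary number of clusters for the illustration
km <- kmeans(x, centers = k, nstart = 10)

# WCSS by hand: for each cluster, sum the squared Euclidean distances from
# its members to its centre. Note that this uses all 23 columns.
wcss_by_hand <- sum(sapply(seq_len(k), function(i) {
  members <- x[km$cluster == i, , drop = FALSE]
  sum(sweep(members, 2, km$centers[i, ])^2)
}))

# Should agree with the value kmeans() reports (up to floating-point tolerance).
all.equal(wcss_by_hand, km$tot.withinss)
```

The centres in km$centers have one column per input column, which is the formula's $\mu_i$ living in the same d-dimensional space as the observations.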
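If the data come with known class labels, k is usually chosen to be close to the number of distinct labels. Otherwise it is hard to pick the best number without trying out a few different values. The kmeans() function in R works by separating the data into k clusters. It calculates the centre-most point (mean) of each group, giving k means, and each observation is then arranged according to its distance to all the cluster centres: the nearest centre is considered the most comparable and therefore the best fit.
This also answers the question: kmeans() uses every column you pass to it, so on a 23-column dataset the clustering is performed in the full 23-dimensional space. Each cluster centre is a vector of 23 means, and distances are computed across all 23 columns at once, not in any single dimension.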
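The sketch below illustrates both points on hypothetical random data with 23 columns. First, it tries several values of k and records the total WCSS for each, which is one common way to compare candidate k values. Second, it shows that the fitted centres are 23-dimensional and assigns new rows to the nearest centre using distance over all 23 columns. The helper nearest_centre() is illustrative only and is not part of the stats package. The range 2:7 and the choice k = 3 are arbitrary.

```r
# Hypothetical stand-in for the 23-column dataset.
set.seed(1)
x <- matrix(rnorm(300 * 23), ncol = 23)
colnames(x) <- paste0("V", 1:23)

# Try several k and record total within-cluster sum of squares.
# A common heuristic is to look for the k after which WCSS stops dropping sharply.
ks <- 2:7
wcss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)
print(data.frame(k = ks, tot.withinss = wcss))

km <- kmeans(x, centers = 3, nstart = 10)
dim(km$centers)  # 3 rows (clusters) x 23 columns (dimensions)

# Illustrative helper: assign each row of newdata to the centre with the
# smallest squared Euclidean distance, computed over all columns.
nearest_centre <- function(newdata, centers) {
  apply(newdata, 1, function(row) {
    which.min(colSums((t(centers) - row)^2))
  })
}

new_obs <- matrix(rnorm(5 * 23), ncol = 23)
nearest_centre(new_obs, km$centers)
```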
One or more of the 23 columns may hold values on very different scales, and this does make a difference to the k-means algorithm. Because k-means relies on Euclidean distance, columns with larger ranges dominate the clustering. If so, we should first standardize the columns of the dataset so that each column carries equal weight.
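A minimal sketch of standardizing before clustering, using base R's scale(). It assumes hypothetical random data in which one column (V1) has been inflated by a factor of 1000 to mimic a column on a different scale.

```r
# Hypothetical data: 23 columns, with V1 on a much larger scale than the rest.
set.seed(7)
x <- matrix(rnorm(100 * 23), ncol = 23)
colnames(x) <- paste0("V", 1:23)
x[, "V1"] <- x[, "V1"] * 1000

# Before scaling, V1's spread dwarfs the other columns and would dominate
# the Euclidean distances that kmeans() uses.
round(apply(x, 2, sd)[1:3], 2)

# scale() centres each column to mean 0 and divides by its standard deviation.
x_std <- scale(x)
round(colMeans(x_std)[1:3], 10)
apply(x_std, 2, sd)[1:3]

# Cluster on the standardized columns instead of the raw ones.
km <- kmeans(x_std, centers = 3, nstart = 10)
```

After scaling, every column has standard deviation 1, so no single column outweighs the others in the distance calculation.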
The next step involves assigning labels to the derived clusters, which is a two-stage process. First, every cluster should have a label. For every cluster, I'll look down its column and pick the label that accounts for the highest percentage of the observations assigned to it. Note that this can result in two or more clusters having the same label.
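The majority-label step can be sketched with a contingency table. Because the original dataset is not available, this uses R's built-in iris data as an assumed stand-in (4 numeric columns rather than 23, with known Species labels). Clusters are placed in the columns so that each cluster's label is read by looking down its column.

```r
# Stand-in data: iris has 4 numeric columns and a known Species label.
set.seed(123)
x <- scale(iris[, 1:4])
km <- kmeans(x, centers = 3, nstart = 25)

# Rows = known labels, columns = clusters.
tab <- table(label = iris$Species, cluster = km$cluster)
tab

# Share of each label within each cluster (each column sums to 1).
shares <- prop.table(tab, margin = 2)

# For each cluster (column), take the label with the highest share.
# Two clusters can end up with the same label.
cluster_label <- rownames(shares)[apply(shares, 2, which.max)]
names(cluster_label) <- colnames(shares)
cluster_label
```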