

What is K-medoids?
K-means vs. K-medoids
Algorithm
Example
Weaknesses
Complexity
PAM
CLARA
Conclusions

K-medoids is a partitional clustering algorithm. It breaks the dataset into K clusters based on the principle of minimizing the sum of the dissimilarities between each object and its corresponding reference point.

Medoid - the point in the dataset whose average dissimilarity to all the objects in its cluster is minimal, i.e. the most centrally located point in the cluster.

Kaufman and Rousseeuw, 1987
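
The definition above translates directly into code. A minimal sketch, assuming Euclidean distance as the dissimilarity (the helper name medoid is illustrative, not from the slides):

```python
import numpy as np

def medoid(points):
    """Return the medoid of a cluster given as an (n, d) array."""
    # Pairwise Euclidean distances between all points in the cluster.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # The medoid minimizes the total (equivalently, the average)
    # dissimilarity to all the other points.
    return points[dists.sum(axis=1).argmin()]
```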


[Figure: centroid (mean) vs. medoid]

Improvement brought by K-medoids:


The K-means algorithm is sensitive to outliers, since an object with an extremely large value can substantially distort the distribution of the data. K-medoids is more robust to outliers than K-means and therefore produces higher-quality clusterings.
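
A small numeric illustration of this robustness (values are made up for the example):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100.0 is an outlier
print(x.mean())  # 22.0 -- the mean is dragged toward the outlier
# Medoid: the member of x with the smallest total distance to the others.
print(x[np.abs(x[:, None] - x[None, :]).sum(axis=1).argmin()])  # 3.0
```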

When to use K-medoids over K-means? In scenarios where an imaginary point such as a mean (centroid) cannot be defined, e.g.:
- 3-D trajectories
- gene expression data

1. Randomly select K points as the initial medoids.
2. Assign every point to its closest medoid.
3. Check whether any other point would make a better medoid (i.e. one with a lower average distance to all the other points).
   Update step: for each medoid m and each data point o assigned to m, swap m and o and compute the total cost of the resulting configuration (that is, the average dissimilarity of o to all the data points assigned to m). Select as the new medoid the point o giving the lowest configuration cost.
4. Repeat steps 2 and 3 until the medoids don't change.
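
These four steps can be sketched in a few lines of Python. An illustrative implementation assuming Euclidean dissimilarity (function and variable names are my own, not from the slides):

```python
import random
import numpy as np

def k_medoids(X, k, max_iter=100, seed=0):
    """Naive K-medoids following steps 1-4 above (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(X)
    # Precompute all pairwise Euclidean distances.
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)

    def total_cost(meds):
        # Each point contributes its distance to its closest medoid.
        return dist[:, meds].min(axis=1).sum()

    medoids = rng.sample(range(n), k)        # step 1: random initial medoids
    for _ in range(max_iter):
        best, best_cost = list(medoids), total_cost(medoids)
        for i in range(k):                   # loop over the K medoids
            for o in range(n):               # loop over the N-K non-medoids
                if o in medoids:
                    continue
                cand = list(medoids)
                cand[i] = o                  # trial swap: medoid i <-> point o
                c = total_cost(cand)         # cost evaluation touches all points
                if c < best_cost:
                    best, best_cost = cand, c
        if best == medoids:                  # step 4: medoids didn't change
            break
        medoids = best
    labels = dist[:, medoids].argmin(axis=1) # step 2: final assignment
    return medoids, labels
```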

` `

Weaknesses:
- K must be known in advance.
- Since the initial medoids are chosen randomly, results may vary from run to run.
- The algorithm handles non-convex data sets poorly.
- Finding a better medoid involves comparing all pairs of medoid and non-medoid points, which is relatively inefficient.

Complexity: O(K * (N-K) * (N-K)) = O(K(N-K)²) => O(N²) per iteration, for fixed K.

- The first loop iterates over the K medoids.
- The second loop iterates over the N-K objects in the non-medoid list.
- The third loop iterates over the non-medoid objects to compute the cost of swapping the medoid and the non-medoid.


PAM (Partitioning Around Medoids, 1987)
Differs from standard K-medoids in the update step:
- Randomly select a non-medoid object o_random.
- Compute the total cost of the configuration obtained by swapping o_medoid with o_random.
- If newCost - oldCost < 0 (i.e. the swap lowers the cost), swap o_medoid with o_random.


Complexity: O(K(N-K)²) per iteration
Weakness: PAM works effectively for small data sets, but does not scale well to large data sets.
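
A sketch of the randomized swap step described above, reusing the precomputed distance matrix dist from the earlier k_medoids sketch (names are illustrative; this follows the slide's description rather than the full original PAM procedure):

```python
import random

def pam_swap_step(dist, medoids, rng):
    """One update step: trial-swap a random medoid with a random
    non-medoid and keep the swap only if the total cost decreases."""
    def total_cost(meds):
        return dist[:, meds].min(axis=1).sum()

    n = dist.shape[0]
    old_cost = total_cost(medoids)
    o_random = rng.choice([i for i in range(n) if i not in medoids])
    i = rng.randrange(len(medoids))      # pick the o_medoid to trial-swap
    candidate = list(medoids)
    candidate[i] = o_random              # swap o_medoid with o_random
    new_cost = total_cost(candidate)
    return candidate if new_cost - old_cost < 0 else medoids

# Usage: rng = random.Random(0); medoids = pam_swap_step(dist, medoids, rng)
```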


CLARA (Clustering Large Applications, 1990)
Draws multiple samples of the data set, applies PAM to each sample, and returns the best resulting clustering.
Strength: deals with larger data sets than PAM.
Weaknesses:
- Efficiency depends on the sample size.
- A good clustering based on samples will not necessarily represent a good clustering of the whole data set if the samples are biased.
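
A sketch of this sampling scheme, reusing the k_medoids function above in place of a full PAM implementation (the 40 + 2K sample size is a commonly cited heuristic for CLARA; all names are illustrative):

```python
import random
import numpy as np

def clara(X, k, n_samples=5, sample_size=None, seed=0):
    """Cluster X by running K-medoids on random samples and keeping
    the medoids that score best on the FULL data set."""
    rng = random.Random(seed)
    n = len(X)
    sample_size = sample_size or min(n, 40 + 2 * k)
    best_meds, best_cost = None, float("inf")
    for _ in range(n_samples):
        idx = rng.sample(range(n), sample_size)
        sub_meds, _ = k_medoids(X[idx], k)    # PAM-style step on the sample
        meds = [idx[m] for m in sub_meds]     # map back to full-data indices
        # Evaluate the candidate medoids against the whole data set.
        cost = np.linalg.norm(X[:, None] - X[meds][None, :],
                              axis=-1).min(axis=1).sum()
        if cost < best_cost:
            best_meds, best_cost = meds, cost
    return best_meds
```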


Conclusions
Each cluster center is represented by a centrally located point (the medoid) rather than a prototype point.
Pros:
- More robust to noise and outliers than K-means.
- Independent of data order, unlike standard K-means clustering.
- Provides better class separation than K-means.

Cons:
- Computationally costlier than K-means: O(N²) vs. O(N) per iteration.
