Enhancing K-Means using class labels

Billy Peralta, Pablo Espinace, Alvaro Soto

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Clustering is a relevant problem in machine learning where the main goal is to locate meaningful partitions of unlabeled data. In the case of labeled data, a related problem is supervised clustering, where the objective is to locate class-uniform clusters. Most current approaches to supervised clustering optimize a score related to cluster purity with respect to class labels. In this paper, we present Labeled K-Means (LK-Means), an algorithm for supervised clustering based on a variant of K-Means that incorporates information about class labels. LK-Means replaces the classical cost function of K-Means with a convex combination of the costs associated with: (i) a discriminative score based on class labels, and (ii) a generative score based on a traditional metric for unsupervised clustering. We test the performance of LK-Means using standard real-world datasets and an application for object recognition. We also compare its performance against classical K-Means and a popular K-Medoids-based supervised clustering method. Our experiments show that, in most cases, LK-Means outperforms the alternative techniques by a considerable margin. Furthermore, LK-Means runs considerably faster than the alternative supervised clustering method under evaluation.
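
The idea of a convex combination of a discriminative and a generative cost can be illustrated with a minimal sketch. This is not the cost function defined in the paper, whose exact form the abstract does not give. It assumes, purely for illustration, that the discriminative score is cluster impurity with respect to the majority class and that the generative score is the mean squared Euclidean distance to the assigned centroid. The names `generative_cost`, `discriminative_cost`, `combined_cost`, and the mixing weight `alpha` are hypothetical.

```python
from collections import Counter


def generative_cost(points, assignments, centroids):
    """Assumed generative score: mean squared Euclidean distance
    from each point to the centroid of its assigned cluster."""
    total = 0.0
    for point, k in zip(points, assignments):
        total += sum((p - c) ** 2 for p, c in zip(point, centroids[k]))
    return total / len(points)


def discriminative_cost(labels, assignments):
    """Assumed discriminative score: impurity, i.e. the fraction of points
    whose class label differs from the majority label of their cluster."""
    counts_by_cluster = {}
    for label, k in zip(labels, assignments):
        counts_by_cluster.setdefault(k, Counter())[label] += 1
    majority = sum(max(c.values()) for c in counts_by_cluster.values())
    return 1.0 - majority / len(labels)


def combined_cost(points, labels, assignments, centroids, alpha):
    """Convex combination of the two scores; alpha in [0, 1] is a
    hypothetical mixing weight (alpha = 0 gives the purely generative cost)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * discriminative_cost(labels, assignments)
            + (1.0 - alpha) * generative_cost(points, assignments, centroids))


# Toy data: two tight clusters, each containing one point of each class.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
labels = ["a", "b", "b", "a"]
assignments = [0, 0, 1, 1]
centroids = [(0.0, 0.5), (4.0, 0.5)]

cost = combined_cost(points, labels, assignments, centroids, alpha=0.5)
```

In this sketch the two terms are on different scales, so in practice the generative term would probably need normalizing before the weight `alpha` becomes meaningful; how the paper handles this is not stated in the abstract.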

Original language: English
Pages (from-to): 1023-1039
Number of pages: 17
Journal: Intelligent Data Analysis
Volume: 17
Issue number: 6
Publication status: Published - 12 Dec 2013

Keywords

  • K-Means
  • K-Medoids
  • Supervised clustering

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
