Enhancing K-Means using class labels

Billy Peralta, Pablo Espinace, Alvaro Soto

Research output: Article

6 Citations (Scopus)

Abstract

Clustering is a relevant problem in machine learning where the main goal is to locate meaningful partitions of unlabeled data. In the case of labeled data, a related problem is supervised clustering, where the objective is to locate class-uniform clusters. Most current approaches to supervised clustering optimize a score related to cluster purity with respect to class labels. In this work, we present Labeled K-Means (LK-Means), an algorithm for supervised clustering based on a variant of K-Means that incorporates information about class labels. LK-Means replaces the classical cost function of K-Means with a convex combination of the joint cost associated with: (i) a discriminative score based on class labels, and (ii) a generative score based on a traditional metric for unsupervised clustering. We test the performance of LK-Means on standard real-world datasets and on an object recognition application. We also compare its performance against classical K-Means and a popular K-Medoids-based supervised clustering method. Our experiments show that, in most cases, LK-Means outperforms these alternatives by a considerable margin. Furthermore, LK-Means has considerably lower execution times than the alternative supervised clustering method under evaluation.
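The abstract describes the LK-Means cost only at a high level. The Python sketch below shows one way a convex combination of a generative and a discriminative term could drive a K-Means-style loop. It is not the authors' formulation.

Every specific here is a hypothetical assumption:

- The generative term is the squared Euclidean distance to a centroid.
- The discriminative term is a 0/1 penalty applied when a point's label differs from its cluster's majority label.
- `lam` in [0, 1] weights the two terms as `lam * ||x - mu_j||^2 + (1 - lam) * [y != label_j]`.
- The name `lk_means_sketch` and the class-seeded initialization are illustrative choices.

Like K-Means, the loop alternates an assignment step and an update step, and neither step increases this assumed cost.

```python
import numpy as np


def lk_means_sketch(X, y, k, lam=0.5, n_iter=100, seed=0):
    """Hypothetical label-aware K-Means (illustrative, not the published LK-Means).

    Assumed per-point cost of putting (x_i, y_i) in cluster j:
        lam * ||x_i - mu_j||^2 + (1 - lam) * [y_i != label_j]
    where label_j is the majority class among the points in cluster j.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1] for a convex combination")
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a flat array as n one-dimensional points
    y = np.asarray(y)
    n = X.shape[0]

    # Assumed initialization: seed cluster j with a random point of class j mod C.
    classes = np.unique(y)
    idx = [rng.choice(np.flatnonzero(y == classes[j % len(classes)])) for j in range(k)]
    mu = X[idx].copy()
    labels = y[idx].copy()

    assign = np.full(n, -1)
    for _ in range(n_iter):
        # Generative term: squared Euclidean distance to each centroid, shape (n, k).
        dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        # Discriminative term: 1 where the point's label differs from the cluster's label.
        mismatch = (y[:, None] != labels[None, :]).astype(float)
        new_assign = (lam * dist + (1.0 - lam) * mismatch).argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break  # no point changed cluster, so the assumed cost has converged
        assign = new_assign

        # Update step: the mean minimizes the distance term and the majority
        # class minimizes the mismatch count, so neither update raises the cost.
        for j in range(k):
            members = assign == j
            if members.any():  # empty clusters keep their previous centroid and label
                mu[j] = X[members].mean(axis=0)
                cls, counts = np.unique(y[members], return_counts=True)
                labels[j] = cls[counts.argmax()]
    return mu, labels, assign
```

For example, take the four 1-D points 0, 0.1, 0.2, 0.3 with alternating labels 0, 1, 0, 1 and `lam=0.5`. The sketch groups them by label, giving assignments `[0, 1, 0, 1]`. The K-Means optimum for k=2 would instead split them by position into {0, 0.1} and {0.2, 0.3}.

With `lam=1` the label term vanishes and the loop reduces to Lloyd's K-Means with this initialization. Because the two terms live on different scales, features would typically need standardizing before a given `lam` is meaningful.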

Original language: English
Pages (from-to): 1023-1039
Number of pages: 17
Journal: Intelligent Data Analysis
Volume: 17
Issue: 6
DOI: 10.3233/IDA-130618
Status: Published - 12 Dec 2013

Fingerprint

K-means
Labels
Object recognition
Clustering
Cost functions
Learning systems
Clustering Methods
Unsupervised Clustering
Class
Costs
Alternatives
Convex Combination
Object Recognition
Experiments
Margin
Execution Time
Cost Function
Machine Learning
Optimise
Partition

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

Peralta, Billy; Espinace, Pablo; Soto, Alvaro. / Enhancing K-Means using class labels. In: Intelligent Data Analysis. 2013; Vol. 17, No. 6. pp. 1023-1039.
@article{b30608cc46014b4ca55c788b883773b5,
title = "Enhancing K-Means using class labels",
keywords = "K-Means, K-Medoids, Supervised clustering",
author = "Billy Peralta and Pablo Espinace and Alvaro Soto",
year = "2013",
month = "12",
day = "12",
doi = "10.3233/IDA-130618",
language = "English",
volume = "17",
pages = "1023--1039",
journal = "Intelligent Data Analysis",
issn = "1088-467X",
publisher = "IOS Press",
number = "6",

}

Enhancing K-Means using class labels. / Peralta, Billy; Espinace, Pablo; Soto, Alvaro.

In: Intelligent Data Analysis, Vol. 17, No. 6, 12.12.2013, p. 1023-1039.

TY  - JOUR
T1  - Enhancing K-Means using class labels
AU  - Peralta, Billy
AU  - Espinace, Pablo
AU  - Soto, Alvaro
PY  - 2013/12/12
Y1  - 2013/12/12
KW  - K-Means
KW  - K-Medoids
KW  - Supervised clustering
UR  - http://www.scopus.com/inward/record.url?scp=84889654224&partnerID=8YFLogxK
U2  - 10.3233/IDA-130618
DO  - 10.3233/IDA-130618
M3  - Article
VL  - 17
SP  - 1023
EP  - 1039
JO  - Intelligent Data Analysis
JF  - Intelligent Data Analysis
SN  - 1088-467X
IS  - 6
ER  -