Exploring effective features for recognizing the user intent behind web queries

Research output: Article

17 Citations (Scopus)

Abstract

Automatically identifying the user intent behind web queries has started to catch the attention of the research community, since it allows search engines to enhance user experience by adapting results to that goal. It is broadly agreed that there are three archetypal intentions behind search queries: navigational, resource/transactional and informational. Thus, as a natural consequence, this task has been interpreted as a multi-class classification problem. By and large, recent works have focused on comparing several machine learning methods built with words as features. Conversely, this paper examines the influence of assorted properties on three classification approaches. In particular, it focuses its attention on the contribution of linguistic-based attributes. However, most natural language processing tools are designed for documents, not web queries. Therefore, as a means of bridging this linguistic gap, we benefited from caseless models, which are trained with traditionally labeled data in which all terms are converted to lowercase before the models are generated. Overall, the tested attributes proved effective, improving on word-based classifiers by up to 8.347% (accuracy) and outperforming a baseline by up to 6.17%. Most notably, linguistic-oriented features from caseless models are shown to be instrumental in narrowing the linguistic gap between queries and documents.
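The caseless idea mentioned above (lowercasing conventionally labeled training data before building a model, so that it can process lowercase web queries) can be illustrated with a minimal sketch. It assumes a toy unigram (most-frequent-tag) part-of-speech tagger, not the NLP tools actually used in the paper. The corpus, tag set, and the functions `train_unigram_tagger` and `tag` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (token, tag) pairs with conventional casing,
# standing in for "traditionally labeled data" such as a POS-tagged text corpus.
TRAIN = [
    [("Book", "VB"), ("a", "DT"), ("flight", "NN"), ("to", "TO"), ("Paris", "NNP")],
    [("The", "DT"), ("weather", "NN"), ("in", "IN"), ("Paris", "NNP")],
    [("Download", "VB"), ("Firefox", "NNP"), ("for", "IN"), ("Windows", "NNP")],
]


def train_unigram_tagger(corpus, caseless):
    """Map each token to its most frequent tag; lowercase tokens first if caseless."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for token, tag_label in sentence:
            key = token.lower() if caseless else token
            counts[key][tag_label] += 1
    return {token: tags.most_common(1)[0][0] for token, tags in counts.items()}


def tag(model, query, caseless, unknown="UNK"):
    """Tag a whitespace-tokenized query, lowercasing lookups for a caseless model."""
    return [
        (tok, model.get(tok.lower() if caseless else tok, unknown))
        for tok in query.split()
    ]


if __name__ == "__main__":
    cased_model = train_unigram_tagger(TRAIN, caseless=False)
    caseless_model = train_unigram_tagger(TRAIN, caseless=True)
    query = "download firefox"  # web queries are often typed in lowercase
    print(tag(cased_model, query, caseless=False))    # both tokens unknown
    print(tag(caseless_model, query, caseless=True))  # tags recovered
```

In this toy setting, the cased model treats "download" and "firefox" as unseen words because it learned only "Download" and "Firefox". The caseless model recovers their tags. This mirrors, in a very simplified form, the query/document gap that the abstract describes.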

Original language: English
Pages (from-to): 162-169
Number of pages: 8
Journal: Computers in Industry
Volume: 68
DOI: 10.1016/j.compind.2015.01.005
Status: Published - 1 Jan 2015

Fingerprint

Linguistics
Search engines
Learning systems
Classifiers
Processing

ASJC Scopus subject areas

  • Computer Science (all)
  • Engineering (all)

Cite this

@article{6f2aefadb5284fb78a67d725c56594c7,
title = "Exploring effective features for recognizing the user intent behind web queries",
abstract = "Automatically identifying the user intent behind web queries has started to catch the attention of the research community, since it allows search engines to enhance user experience by adapting results to that goal. It is broadly agreed that there are three archetypal intentions behind search queries: navigational, resource/transactional and informational. Thus, as a natural consequence, this task has been interpreted as a multi-class classification problem. By and large, recent works have focused on comparing several machine learning methods built with words as features. Conversely, this paper examines the influence of assorted properties on three classification approaches. In particular, it focuses its attention on the contribution of linguistic-based attributes. However, most natural language processing tools are designed for documents, not web queries. Therefore, as a means of bridging this linguistic gap, we benefited from caseless models, which are trained with traditionally labeled data in which all terms are converted to lowercase before the models are generated. Overall, the tested attributes proved effective, improving on word-based classifiers by up to 8.347{\%} (accuracy) and outperforming a baseline by up to 6.17{\%}. Most notably, linguistic-oriented features from caseless models are shown to be instrumental in narrowing the linguistic gap between queries and documents.",
keywords = "Feature analysis, Query analysis, Query classification, Search query understanding, User experience, User intent",
author = "Alejandro Figueroa",
year = "2015",
month = "1",
day = "1",
doi = "10.1016/j.compind.2015.01.005",
language = "English",
volume = "68",
pages = "162--169",
journal = "Computers in Industry",
issn = "0166-3615",
publisher = "Elsevier",

}

Exploring effective features for recognizing the user intent behind web queries. / Figueroa, Alejandro.

In: Computers in Industry, Vol. 68, 01.01.2015, p. 162-169.


TY - JOUR

T1 - Exploring effective features for recognizing the user intent behind web queries

AU - Figueroa, Alejandro

PY - 2015/1/1

Y1 - 2015/1/1

N2 - Automatically identifying the user intent behind web queries has started to catch the attention of the research community, since it allows search engines to enhance user experience by adapting results to that goal. It is broadly agreed that there are three archetypal intentions behind search queries: navigational, resource/transactional and informational. Thus, as a natural consequence, this task has been interpreted as a multi-class classification problem. By and large, recent works have focused on comparing several machine learning methods built with words as features. Conversely, this paper examines the influence of assorted properties on three classification approaches. In particular, it focuses its attention on the contribution of linguistic-based attributes. However, most natural language processing tools are designed for documents, not web queries. Therefore, as a means of bridging this linguistic gap, we benefited from caseless models, which are trained with traditionally labeled data in which all terms are converted to lowercase before the models are generated. Overall, the tested attributes proved effective, improving on word-based classifiers by up to 8.347% (accuracy) and outperforming a baseline by up to 6.17%. Most notably, linguistic-oriented features from caseless models are shown to be instrumental in narrowing the linguistic gap between queries and documents.

AB - Automatically identifying the user intent behind web queries has started to catch the attention of the research community, since it allows search engines to enhance user experience by adapting results to that goal. It is broadly agreed that there are three archetypal intentions behind search queries: navigational, resource/transactional and informational. Thus, as a natural consequence, this task has been interpreted as a multi-class classification problem. By and large, recent works have focused on comparing several machine learning methods built with words as features. Conversely, this paper examines the influence of assorted properties on three classification approaches. In particular, it focuses its attention on the contribution of linguistic-based attributes. However, most natural language processing tools are designed for documents, not web queries. Therefore, as a means of bridging this linguistic gap, we benefited from caseless models, which are trained with traditionally labeled data in which all terms are converted to lowercase before the models are generated. Overall, the tested attributes proved effective, improving on word-based classifiers by up to 8.347% (accuracy) and outperforming a baseline by up to 6.17%. Most notably, linguistic-oriented features from caseless models are shown to be instrumental in narrowing the linguistic gap between queries and documents.

KW - Feature analysis

KW - Query analysis

KW - Query classification

KW - Search query understanding

KW - User experience

KW - User intent

UR - http://www.scopus.com/inward/record.url?scp=84923636099&partnerID=8YFLogxK

U2 - 10.1016/j.compind.2015.01.005

DO - 10.1016/j.compind.2015.01.005

M3 - Article

AN - SCOPUS:84923636099

VL - 68

SP - 162

EP - 169

JO - Computers in Industry

JF - Computers in Industry

SN - 0166-3615

ER -