Answering definition questions: Dealing with data sparseness in lexicalised dependency trees-based language models

Alejandro Figueroa, John Atkinson

Research output: Conference contribution

Abstract

A crucial step in the answering process of definition questions, such as "Who is Gordon Brown?", is the ranking of answer candidates. In definition Question Answering (QA), sentences are normally interpreted as potential answers, and one of the most promising ranking strategies predicates upon Language Models (LMs). However, one of the factors that makes LMs less attractive is the fact that they can suffer from data sparseness, when the training material is insufficient or candidate sentences are too long. This paper analyses two methods, different in nature, for tackling data sparseness head-on: (1) combining LMs learnt from different, but overlapping, training corpora, and (2) selective substitutions grounded upon part-of-speech (POS) taggings. Results show that the first method improves the Mean Average Precision (MAP) of the top-ranked answers, while at the same time, it diminishes the average F-score of the final output. Conversely, the impact of the second approach depends on the test corpus.
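
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how Mean Average Precision is commonly computed over ranked answer lists. It is not the authors' evaluation code: the function names and the boolean relevance encoding are illustrative assumptions, and average precision is normalised here by the number of correct answers found in the list, which is one of several conventions.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision of one ranked list.

    relevance[i] is True when the candidate at rank i + 1 is a correct answer.
    Assumption: we normalise by the number of correct answers in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at correct answers.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-question average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)
```

A ranking that places correct answers earlier scores higher, so a gain in MAP says that the top of the list improved; it says nothing by itself about the F-score of the full output.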

Original language: English
Title of host publication: Web Information Systems and Technologies - 5th International Conference, WEBIST 2009, Revised Selected Papers
Publisher: Springer Verlag
Pages: 297-310
Number of pages: 14
ISBN (print): 3642124356, 9783642124358
DOI: https://doi.org/10.1007/978-3-642-12436-5-22
Publication status: Published - 1 Jan 2010
Event: 5th International Conference on Web Information Systems and Technologies, WEBIST 2009 - Lisbon, Portugal
Duration: 23 Mar 2009 to 26 Mar 2009

Publication series

Name: Lecture Notes in Business Information Processing
Volume: 45 LNBIP
ISSN (print): 1865-1348

Other

Other: 5th International Conference on Web Information Systems and Technologies, WEBIST 2009
Country: Portugal
City: Lisbon
Period: 23/03/09 to 26/03/09

Fingerprint

Language Model
Ranking
Question Answering
Tagging
Predicate
Overlapping
Substitution
Substitution reactions
Output
Language model
Corpus
Training

ASJC Scopus subject areas

  • Management Information Systems
  • Control and Systems Engineering
  • Business and International Management
  • Information Systems
  • Modelling and Simulation
  • Information Systems and Management

Cite this

Figueroa, A., & Atkinson, J. (2010). Answering definition questions: Dealing with data sparseness in lexicalised dependency trees-based language models. In Web Information Systems and Technologies - 5th International Conference, WEBIST 2009, Revised Selected Papers (pp. 297-310). (Lecture Notes in Business Information Processing; Vol. 45 LNBIP). Springer Verlag. https://doi.org/10.1007/978-3-642-12436-5-22
Figueroa, Alejandro ; Atkinson, John. / Answering definition questions : Dealing with data sparseness in lexicalised dependency trees-based language models. Web Information Systems and Technologies - 5th International Conference, WEBIST 2009, Revised Selected Papers. Springer Verlag, 2010. pp. 297-310 (Lecture Notes in Business Information Processing).
@inproceedings{096712bd64284a399be45b080908023f,
title = "Answering definition questions: Dealing with data sparseness in lexicalised dependency trees-based language models",
keywords = "Data sparseness, Definition question answering, Definition questions, Definition search, Lexical dependency paths, n-gram language models, Web question answering",
author = "Alejandro Figueroa and John Atkinson",
year = "2010",
month = "1",
day = "1",
doi = "10.1007/978-3-642-12436-5-22",
language = "English",
isbn = "3642124356",
series = "Lecture Notes in Business Information Processing",
publisher = "Springer Verlag",
pages = "297--310",
booktitle = "Web Information Systems and Technologies - 5th International Conference, WEBIST 2009, Revised Selected Papers",

}

Figueroa, A & Atkinson, J 2010, Answering definition questions: Dealing with data sparseness in lexicalised dependency trees-based language models. In Web Information Systems and Technologies - 5th International Conference, WEBIST 2009, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 45 LNBIP, Springer Verlag, pp. 297-310, 5th International Conference on Web Information Systems and Technologies, WEBIST 2009, Lisbon, Portugal, 23/03/09. https://doi.org/10.1007/978-3-642-12436-5-22

Answering definition questions : Dealing with data sparseness in lexicalised dependency trees-based language models. / Figueroa, Alejandro; Atkinson, John.

Web Information Systems and Technologies - 5th International Conference, WEBIST 2009, Revised Selected Papers. Springer Verlag, 2010. p. 297-310 (Lecture Notes in Business Information Processing; Vol. 45 LNBIP).

TY - GEN

T1 - Answering definition questions

T2 - Dealing with data sparseness in lexicalised dependency trees-based language models

AU - Figueroa, Alejandro

AU - Atkinson, John

PY - 2010/1/1

Y1 - 2010/1/1

KW - Data sparseness

KW - Definition question answering

KW - Definition questions

KW - Definition search

KW - Lexical dependency paths

KW - n-gram language models

KW - Web question answering

UR - http://www.scopus.com/inward/record.url?scp=77952781169&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-12436-5-22

DO - 10.1007/978-3-642-12436-5-22

M3 - Conference contribution

AN - SCOPUS:77952781169

SN - 3642124356

SN - 9783642124358

T3 - Lecture Notes in Business Information Processing

SP - 297

EP - 310

BT - Web Information Systems and Technologies - 5th International Conference, WEBIST 2009, Revised Selected Papers

PB - Springer Verlag

ER -

Figueroa A, Atkinson J. Answering definition questions: Dealing with data sparseness in lexicalised dependency trees-based language models. In Web Information Systems and Technologies - 5th International Conference, WEBIST 2009, Revised Selected Papers. Springer Verlag. 2010. p. 297-310. (Lecture Notes in Business Information Processing). https://doi.org/10.1007/978-3-642-12436-5-22