Answering definition questions: Dealing with data sparseness in lexicalised dependency trees-based language models

Alejandro Figueroa, John Atkinson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A crucial step in answering definition questions, such as "Who is Gordon Brown?", is ranking the answer candidates. In definition Question Answering (QA), sentences are normally treated as potential answers, and one of the most promising ranking strategies relies on Language Models (LMs). However, one factor that makes LMs less attractive is that they can suffer from data sparseness when the training material is insufficient or the candidate sentences are too long. This paper analyses two methods, different in nature, for tackling data sparseness head-on: (1) combining LMs learnt from different, but overlapping, training corpora, and (2) selective substitutions grounded in part-of-speech (POS) tags. Results show that the first method improves the Mean Average Precision (MAP) of the top-ranked answers, but at the same time lowers the average F-score of the final output. Conversely, the impact of the second approach depends on the test corpus.
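The abstract does not spell out how the LMs are built or combined, so the following is only a minimal sketch of the general idea of method (1), under stated assumptions: plain word-bigram models with add-one smoothing, a fixed linear interpolation of two models trained on different corpora, and length-normalised log-probability as the ranking score. The paper's models are built over lexicalised dependency paths, and its actual combination scheme may differ. The names `BigramLM`, `interpolated_score` and `rank_candidates`, the toy corpora and the 0.5/0.5 weights are all hypothetical.

```python
import math
from collections import Counter


class BigramLM:
    """Word-bigram language model with add-one (Laplace) smoothing.

    Hypothetical helper for illustration only; the paper's LMs are built
    over lexicalised dependency paths, not plain word sequences.
    """

    def __init__(self, sentences):
        self.history = Counter()  # how often each token is followed by another
        self.bigrams = Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.history.update(padded[:-1])
            self.bigrams.update(zip(padded[:-1], padded[1:]))
        # Possible successors: every training token plus the end marker.
        self.vocab = {w for tokens in sentences for w in tokens} | {"</s>"}

    def prob(self, prev, word):
        # Add-one smoothing gives unseen bigrams a small non-zero probability,
        # one simple way of coping with sparse training data.
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + len(self.vocab))


def interpolated_score(tokens, lms, weights):
    """Length-normalised log-probability under a linear mixture of LMs (assumed scheme)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, word in zip(padded[:-1], padded[1:]):
        p = sum(w * lm.prob(prev, word) for lm, w in zip(lms, weights))
        total += math.log(p)
    # Dividing by the number of predictions keeps long sentences from being
    # penalised merely for their length.
    return total / (len(padded) - 1)


def rank_candidates(candidates, lms, weights):
    """Sort candidate answer sentences from most to least likely."""
    return sorted(
        candidates,
        key=lambda s: interpolated_score(s.split(), lms, weights),
        reverse=True,
    )


if __name__ == "__main__":
    # Two small, overlapping toy corpora (hypothetical data).
    corpus_a = [s.split() for s in ["gordon brown is a british politician",
                                    "tony blair is a british politician"]]
    corpus_b = [s.split() for s in ["gordon brown was the prime minister",
                                    "the prime minister is a politician"]]
    lms = [BigramLM(corpus_a), BigramLM(corpus_b)]
    for sentence in rank_candidates(
        ["the weather is nice today", "gordon brown is a politician"], lms, [0.5, 0.5]
    ):
        print(sentence)
```

Run as a script, this prints the definitional candidate first, because most of its bigrams occur in at least one of the two training corpora while the other sentence relies almost entirely on smoothed estimates for unseen bigrams. In practice the interpolation weights would more likely be tuned on held-out data than fixed by hand.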

Original language: English
Title of host publication: Web Information Systems and Technologies - 5th International Conference, WEBIST 2009, Revised Selected Papers
Publisher: Springer Verlag
Pages: 297-310
Number of pages: 14
ISBN (Print): 3642124356, 9783642124358
Publication status: Published - 1 Jan 2010
Event: 5th International Conference on Web Information Systems and Technologies, WEBIST 2009 - Lisbon, Portugal
Duration: 23 Mar 2009 to 26 Mar 2009

Publication series

Name: Lecture Notes in Business Information Processing
Volume: 45 LNBIP
ISSN (Print): 1865-1348

Other

Other: 5th International Conference on Web Information Systems and Technologies, WEBIST 2009
Country/Territory: Portugal
City: Lisbon
Period: 23/03/09 to 26/03/09

Keywords

  • Data sparseness
  • Definition question answering
  • Definition questions
  • Definition search
  • Lexical dependency paths
  • n-gram language models
  • Web question answering

ASJC Scopus subject areas

  • Management Information Systems
  • Control and Systems Engineering
  • Business and International Management
  • Information Systems
  • Modelling and Simulation
  • Information Systems and Management
