Integrating heterogeneous sources for predicting question temporal anchors across Yahoo! Answers

Research output: Article

Abstract

Modern Community Question Answering (CQA) web forums let users browse their archives with question-like search queries, as in Information Retrieval (IR) systems. Although these traditional IR methods have become very successful at fetching semantically related questions, they typically leave the temporal relations among questions unconsidered. That is to say, a group of questions may be asked more often during specific recurring timelines despite being semantically unrelated. In fact, predicting temporal aspects would not only help these platforms widen the semantic diversity of their search results, but also help them re-state questions whose answers need refreshing and produce more dynamic, especially temporally anchored, displays. In this paper, we devise a new set of time-frame-specific categories for CQA questions, obtained by fusing two distinct earlier taxonomies (i.e., [29] and [50]). These new categories are then used in a large crowdsourcing-based human annotation effort. Accordingly, we present a systematic analysis of its results in terms of complexity and degree of difficulty as it relates to the different question topics. Incidentally, through a large number of experiments, we investigate the effectiveness of a wider variety of linguistic features than has been explored in previous work. We additionally mix evidence/features distilled directly and indirectly from questions by capitalizing on their related web search results. We finally investigate the impact and effectiveness of multi-view learning to boost a large variety of multi-class supervised learners by optimizing a latent layer built on top of two views: one composed of features harvested from questions, and the other from CQA metadata and evidence extracted from web resources (i.e., snippets and Internet archives).
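The two-view setup mentioned at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' method: instead of optimizing a latent layer, it swaps in a much simpler late fusion of per-view nearest-centroid distances, only to show how evidence from a question view and a web/metadata view might be combined. The feature values, the class labels `periodic` and `permanent`, and the function names are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def centroids(vectors, labels):
    """Average the feature vectors of each class (one centroid per label)."""
    sums, counts = {}, defaultdict(int)
    for vec, lab in zip(vectors, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict(question_vec, web_vec, question_centroids, web_centroids, weight=0.5):
    """Late fusion: choose the label with the smallest weighted sum of
    per-view distances. `weight` balances the two views (assumed value)."""
    scores = {
        lab: weight * distance(question_vec, question_centroids[lab])
        + (1 - weight) * distance(web_vec, web_centroids[lab])
        for lab in question_centroids
    }
    return min(scores, key=scores.get)


# Toy, made-up data. View 1: features taken from the question text.
# View 2: features taken from metadata and web search snippets.
question_view = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
web_view = [[0.2, 0.8], [0.1, 0.9], [0.9, 0.1], [0.8, 0.2]]
labels = ["periodic", "periodic", "permanent", "permanent"]  # hypothetical classes

q_cent = centroids(question_view, labels)
w_cent = centroids(web_view, labels)
prediction = predict([0.8, 0.2], [0.3, 0.7], q_cent, w_cent)
print(prediction)
```

Keeping the views separate until the scoring step makes it easy to see how much each source contributes; a latent-layer approach such as the one the abstract describes would instead learn a shared representation of both views before classification.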

Language: English
Pages: 112-125
Number of pages: 14
Journal: Information Fusion
Volume: 50
DOI: 10.1016/j.inffus.2018.10.006
Status: Published - 1 Oct 2019

Fingerprint

Information retrieval systems
Taxonomies
Metadata
Anchors
Information retrieval
Linguistics
Semantics
Display devices
Internet
Experiments

Keywords

    ASJC Scopus subject areas

    • Software
    • Signal Processing
    • Information Systems
    • Hardware and Architecture

    Cite this

    @article{7d05a3d776a34e85b3a3dfc2c2ceacc7,
    title = "Integrating heterogeneous sources for predicting question temporal anchors across Yahoo! Answers",
    abstract = "Modern Community Question Answering (CQA) web forums let users browse their archives with question-like search queries, as in Information Retrieval (IR) systems. Although these traditional IR methods have become very successful at fetching semantically related questions, they typically leave the temporal relations among questions unconsidered. That is to say, a group of questions may be asked more often during specific recurring timelines despite being semantically unrelated. In fact, predicting temporal aspects would not only help these platforms widen the semantic diversity of their search results, but also help them re-state questions whose answers need refreshing and produce more dynamic, especially temporally anchored, displays. In this paper, we devise a new set of time-frame-specific categories for CQA questions, obtained by fusing two distinct earlier taxonomies (i.e., [29] and [50]). These new categories are then used in a large crowdsourcing-based human annotation effort. Accordingly, we present a systematic analysis of its results in terms of complexity and degree of difficulty as it relates to the different question topics. Incidentally, through a large number of experiments, we investigate the effectiveness of a wider variety of linguistic features than has been explored in previous work. We additionally mix evidence/features distilled directly and indirectly from questions by capitalizing on their related web search results. We finally investigate the impact and effectiveness of multi-view learning to boost a large variety of multi-class supervised learners by optimizing a latent layer built on top of two views: one composed of features harvested from questions, and the other from CQA metadata and evidence extracted from web resources (i.e., snippets and Internet archives).",
    keywords = "Intelligent information retrieval, Multi-view learning, Natural language processing, Question classification, Transfer learning, Web mining",
    author = "Alejandro Figueroa and Carlos G{\'o}mez-Pantoja and G{\"u}nter Neumann",
    year = "2019",
    month = "10",
    day = "1",
    doi = "10.1016/j.inffus.2018.10.006",
    language = "English",
    volume = "50",
    pages = "112--125",
    journal = "Information Fusion",
    issn = "1566-2535",
    publisher = "Elsevier",

    }

    TY - JOUR

    T1 - Integrating heterogeneous sources for predicting question temporal anchors across Yahoo! Answers

    AU - Figueroa, Alejandro

    AU - Gómez-Pantoja, Carlos

    AU - Neumann, Günter

    PY - 2019/10/1

    Y1 - 2019/10/1

    AB - Modern Community Question Answering (CQA) web forums let users browse their archives with question-like search queries, as in Information Retrieval (IR) systems. Although these traditional IR methods have become very successful at fetching semantically related questions, they typically leave the temporal relations among questions unconsidered. That is to say, a group of questions may be asked more often during specific recurring timelines despite being semantically unrelated. In fact, predicting temporal aspects would not only help these platforms widen the semantic diversity of their search results, but also help them re-state questions whose answers need refreshing and produce more dynamic, especially temporally anchored, displays. In this paper, we devise a new set of time-frame-specific categories for CQA questions, obtained by fusing two distinct earlier taxonomies (i.e., [29] and [50]). These new categories are then used in a large crowdsourcing-based human annotation effort. Accordingly, we present a systematic analysis of its results in terms of complexity and degree of difficulty as it relates to the different question topics. Incidentally, through a large number of experiments, we investigate the effectiveness of a wider variety of linguistic features than has been explored in previous work. We additionally mix evidence/features distilled directly and indirectly from questions by capitalizing on their related web search results. We finally investigate the impact and effectiveness of multi-view learning to boost a large variety of multi-class supervised learners by optimizing a latent layer built on top of two views: one composed of features harvested from questions, and the other from CQA metadata and evidence extracted from web resources (i.e., snippets and Internet archives).

    KW - Intelligent information retrieval

    KW - Multi-view learning

    KW - Natural language processing

    KW - Question classification

    KW - Transfer learning

    KW - Web mining

    UR - http://www.scopus.com/inward/record.url?scp=85055179488&partnerID=8YFLogxK

    U2 - 10.1016/j.inffus.2018.10.006

    DO - 10.1016/j.inffus.2018.10.006

    M3 - Article

    VL - 50

    SP - 112

    EP - 125

    JO - Information Fusion

    T2 - Information Fusion

    JF - Information Fusion

    SN - 1566-2535

    ER -