Refining fine-tuned transformers with hand-crafted features for gender screening on question-answering communities

Alejandro Figueroa

Research output: Contribution to journalArticlepeer-review

7 Citations (Scopus)

Abstract

Machine learning and demographic analysis are a cornerstone for making community Question Answering (cQA) platforms more egalitarian and vibrant, safer as well. For instance, the two cooperate on successfully detecting suspicious/malicious activity and on stirring up the interest of community fellows to learn by exploring new topics. In this sense, both research fields play a vital role in reducing gender disparity across categories, when promoting unresolved questions to potential answerers. Current state-of-the-art artificial intelligence architectures, such as pre-trained transformers, train complex goals and million of parameters as a means of inferring and encoding knowledge from massive corpora. Fine-tuning is the process that allows later to transfer this encrypted information to a downstream task (e.g., gender classification). Needless to say, these pre-trained encoders also suffer from multiple disadvantages. To give an example, they are sensitive to irrelevant and misleading words, bringing about overfitting, usually on small datasets. This work offers a fresh look at this kind of technique by introducing PTM-SFFS, a novel approach that effectively pairs frontier transformers with linguistic properties via the use of traditional classifiers. Based on a feature wrapper (SFFS), PTM-SFFS refines the scores produced by a fine-tuned model via seeking for an array of mostly linguistic features to build a conventional statistical classifier (e.g., Bayes and MaxEnt). And as a result, this new discriminant function enhances the overall prediction rate by optimizing the synergy between both sorts of strategies. When applied to automatic gender recognition on cQA sites, PTM-SFFS increased the accuracy of seven fine-tuned state-of-the-art encoders up to 10% (XLNet). Thanks to its interpretability, we discover that it capitalizes on dependency parsing and metadata for improving the transference of lexicalized information to the target domain.

Original languageEnglish
Pages (from-to)256-267
Number of pages12
JournalInformation Fusion
Volume92
DOIs
Publication statusPublished - Apr 2023

Keywords

  • Community question answering
  • Gender recognition
  • Natural language processing
  • Pre-trained models
  • Statistical classifiers
  • User analysis

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'Refining fine-tuned transformers with hand-crafted features for gender screening on question-answering communities'. Together they form a unique fingerprint.

Cite this