Refining fine-tuned transformers with hand-crafted features for gender screening on question-answering communities

Alejandro Figueroa

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Machine learning and demographic analysis are cornerstones for making community Question Answering (cQA) platforms more egalitarian, vibrant, and safe. For instance, the two cooperate in detecting suspicious or malicious activity and in stirring up the interest of community members to learn by exploring new topics. In this sense, both research fields play a vital role in reducing gender disparity across categories when promoting unresolved questions to potential answerers. Current state-of-the-art artificial intelligence architectures, such as pre-trained transformers, are trained on complex objectives with millions of parameters as a means of inferring and encoding knowledge from massive corpora. Fine-tuning is the process that later transfers this encoded knowledge to a downstream task (e.g., gender classification). These pre-trained encoders nonetheless suffer from several disadvantages. For example, they are sensitive to irrelevant and misleading words, which leads to overfitting, usually on small datasets. This work offers a fresh look at this kind of technique by introducing PTM-SFFS, a novel approach that pairs frontier transformers with linguistic properties through traditional classifiers. Based on a feature wrapper (SFFS), PTM-SFFS refines the scores produced by a fine-tuned model by searching for an array of mostly linguistic features with which to build a conventional statistical classifier (e.g., Bayes and MaxEnt). As a result, this new discriminant function enhances the overall prediction rate by optimizing the synergy between both sorts of strategies. When applied to automatic gender recognition on cQA sites, PTM-SFFS increased the accuracy of seven fine-tuned state-of-the-art encoders by up to 10% (XLNet). Thanks to its interpretability, we discover that it capitalizes on dependency parsing and metadata to improve the transfer of lexicalized information to the target domain.
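The wrapper search named in the abstract can be illustrated with a minimal sketch of sequential floating forward selection (SFFS). This is not the paper's implementation. It assumes that the fine-tuned model's score is kept as a fixed feature (`ptm_score`) and that `evaluate` stands in for the cross-validated accuracy of a traditional classifier trained on the chosen features. The feature names and the integer scores in `toy_evaluate` are hypothetical placeholders, not results from the paper.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Subset = FrozenSet[str]


def sffs(candidates: Iterable[str],
         evaluate: Callable[[Subset], int],
         seed: Iterable[str] = ()) -> Tuple[int, Subset]:
    """Sequential floating forward selection (higher score is better).

    Features in `seed` are always kept; here that is the fine-tuned
    model's score, which is an assumption of this sketch.
    """
    pool = list(candidates)
    fixed = frozenset(seed)
    selected: Subset = frozenset()
    # Best (score, subset) seen so far for each subset size.
    best: Dict[int, Tuple[int, Subset]] = {0: (evaluate(fixed), selected)}

    while len(selected) < len(pool):
        # Forward step: add the candidate that yields the highest score.
        score, feat = max(
            ((evaluate(fixed | selected | {f}), f)
             for f in pool if f not in selected),
            key=lambda pair: pair[0])
        selected = selected | {feat}
        if len(selected) not in best or score > best[len(selected)][0]:
            best[len(selected)] = (score, selected)

        # Floating step: drop features while the smaller subset strictly
        # beats the best subset of that size found so far.
        while len(selected) > 1:
            score, feat = max(
                ((evaluate(fixed | (selected - {f})), f)
                 for f in sorted(selected)),
                key=lambda pair: pair[0])
            if score <= best[len(selected) - 1][0]:
                break
            selected = selected - {feat}
            best[len(selected)] = (score, selected)

    score, subset = max(best.values(), key=lambda pair: pair[0])
    return score, fixed | subset


# Hypothetical toy evaluator: accuracy in per mille, made up for illustration.
BASE = 700  # placeholder accuracy of the fine-tuned model's score alone
GAIN = {"word_length": 60, "dependency": 50, "metadata": 40}
PAIR = {  # placeholder redundancy (negative) or synergy (positive)
    frozenset({"word_length", "dependency"}): -40,
    frozenset({"word_length", "metadata"}): -35,
    frozenset({"dependency", "metadata"}): 10,
}


def toy_evaluate(subset: Subset) -> int:
    score = BASE + sum(GAIN.get(f, 0) for f in subset)
    score += sum(PAIR.get(frozenset(p), 0)
                 for p in combinations(sorted(subset), 2))
    return score


best_score, best_set = sffs(GAIN, toy_evaluate, seed=["ptm_score"])
print(best_score, sorted(best_set))
```

With these placeholder scores the search first adds `word_length`, then drops it in a floating step once `dependency` and `metadata` together score higher. A plain greedy forward search could not make that correction. In practice `evaluate` would train and validate a classifier such as Bayes or MaxEnt on each candidate subset.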

Original language: English
Pages (from-to): 256-267
Number of pages: 12
Journal: Information Fusion
Volume: 92
DOI
Status: Published - Apr 2023

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems
  • Hardware and Architecture
