Textual Pre-trained Models for Gender Identification across Community Question-Answering Members

Pablo Schwarzenberg, Alejandro Figueroa

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Promoting engagement and participation is vital for online social networks such as community Question-Answering (cQA) sites. One way of increasing the contribution of their members is by connecting their content with the right target audience. To achieve this goal, demographic analysis is pivotal in deciphering the interest of each community fellow. Indeed, demographic factors such as gender are fundamental in reducing the gender disparity across distinct topics. This work assesses the classification rate of assorted state-of-the-art transformer-based models (e.g., BERT and FNET) on the task of gender identification across cQA fellows. For this purpose, it benefited from a massive text-oriented corpus encompassing 548,375 member profiles, including their respective full-questions, answers and self-descriptions. This assisted in conducting large-scale experiments considering distinct combinations of encoders and sources. Contrary to our initial intuition, self-descriptions were detrimental on average due to their sparseness. In effect, the best transformer models (i.e., DeBERTa and MobileBERT) achieved an AUC of 0.92 by taking full-questions and answers into account. Our qualitative results reveal that fine-tuning on user-generated content is affected by pre-training on clean corpora, and that this adverse effect can be mitigated by correcting the case of words.

Original language: English
Pages (from-to): 3983-3995
Number of pages: 13
Journal: IEEE Access
Volume: 11
Publication status: Published - 2023

Keywords

  • Gender identification
  • community question-answering sites
  • engagement and participation in online communities
  • transformers

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
  • Electrical and Electronic Engineering
