Abstract
Modern community Question Answering (cQA) platforms encourage their members to post questions of any kind, which can later receive numerous answers from other community members. However, there is an intrinsic delay between the moment a question is posted and the arrival of acceptable or diverse responses. cQA platforms therefore have a pressing need to promote unresolved questions to potential answerers while also, for example, reducing gender disparity across their topics. Demographic analysis plays a crucial role in meeting these challenges. Nonetheless, only a handful of studies have examined automatic gender recognition of cQA members. To the best of our knowledge, this work is the first to tease out how the different kinds of textual input contained in user profiles (i.e., question titles and bodies, answers and self-descriptions) contribute to this task. To this end, we compare three types of machine learning approaches under several combinations of these four input signals: traditional neural networks (e.g., RCNN and CNN), fine-tuned pre-trained transformers (e.g., BERT and RoBERTa) and statistical methods enriched with hand-crafted linguistic features (e.g., Bayes and MaxEnt). In short, our results show that pre-trained transformers perform best on full questions, traditional neural networks when mixing diverse text signals, and statistical methods when the dataset consists mostly of noisy user-generated content, namely answers. In addition, our in-depth analysis reveals that dependency parsing is instrumental in designing hand-crafted features capable of modelling topic information, and that both genders are clearly characterized by specific topic distributions.
Original language | English |
---|---|
Article number | 119405 |
Journal | Expert Systems with Applications |
Volume | 215 |
Publication status | Published - 1 Apr 2023 |
Keywords
- Community question answering
- Deep neural networks
- Expert systems
- Gender recognition
- Pre-trained models
- Statistical methods
ASJC Scopus subject areas
- General Engineering
- Computer Science Applications
- Artificial Intelligence