Abstract
Machine learning and demographic analysis are a cornerstone for making community Question Answering (cQA) platforms more egalitarian and vibrant, safer as well. For instance, the two cooperate on successfully detecting suspicious/malicious activity and on stirring up the interest of community fellows to learn by exploring new topics. In this sense, both research fields play a vital role in reducing gender disparity across categories, when promoting unresolved questions to potential answerers. Current state-of-the-art artificial intelligence architectures, such as pre-trained transformers, train complex goals and million of parameters as a means of inferring and encoding knowledge from massive corpora. Fine-tuning is the process that allows later to transfer this encrypted information to a downstream task (e.g., gender classification). Needless to say, these pre-trained encoders also suffer from multiple disadvantages. To give an example, they are sensitive to irrelevant and misleading words, bringing about overfitting, usually on small datasets. This work offers a fresh look at this kind of technique by introducing PTM-SFFS, a novel approach that effectively pairs frontier transformers with linguistic properties via the use of traditional classifiers. Based on a feature wrapper (SFFS), PTM-SFFS refines the scores produced by a fine-tuned model via seeking for an array of mostly linguistic features to build a conventional statistical classifier (e.g., Bayes and MaxEnt). And as a result, this new discriminant function enhances the overall prediction rate by optimizing the synergy between both sorts of strategies. When applied to automatic gender recognition on cQA sites, PTM-SFFS increased the accuracy of seven fine-tuned state-of-the-art encoders up to 10% (XLNet). Thanks to its interpretability, we discover that it capitalizes on dependency parsing and metadata for improving the transference of lexicalized information to the target domain.
Original language | English |
---|---|
Pages (from-to) | 256-267 |
Number of pages | 12 |
Journal | Information Fusion |
Volume | 92 |
DOIs | |
Publication status | Published - Apr 2023 |
Keywords
- Community question answering
- Gender recognition
- Natural language processing
- Pre-trained models
- Statistical classifiers
- User analysis
ASJC Scopus subject areas
- Software
- Signal Processing
- Information Systems
- Hardware and Architecture