TY - JOUR
T1 - What identifies different age cohorts in Yahoo! Answers?
AU - Figueroa, Alejandro
AU - Timilsina, Mohan
N1 - Publisher Copyright: © 2021 Elsevier B.V.
PY - 2021/9/27
Y1 - 2021/9/27
AB - For different kinds of online platforms, understanding demographics has been shown to be instrumental in improving user experience, especially for personalizing and contextualizing content. Accordingly, there have been a number of studies delving into demographics in online social media platforms, including Facebook and Twitter. However, only a handful of works have explored demographic factors behind community question-answering platforms, despite their massive number of members. For this reason, we undertook a study of Yahoo! Answers members, specifically with regard to age demographics. To this end, we automatically built and annotated a large-scale corpus comprising metadata and textual inputs produced by ca. 650,000 community members. We used this collection to conduct both an exploratory/statistical analysis and predictive modelling. In the former, we explored the correlation between distinct age groups and variables that, intuitively, might be expected to correlate strongly with some cohorts. Interestingly, this analysis revealed that Millennials answer questions posed by the succeeding age group (Gen Z). In the latter, we assessed the prediction rate of various traditional statistical methods and neural network classifiers coupled with numerous combinations of assorted textual and metadata features. Overall, the best classifiers achieved an MRR of up to 0.862 and were modelled by means of FastText and Maximum Entropy (MaxEnt). In terms of informative attributes, user asking/answering activity patterns and sentiment-charged words provide telltale clues about which age group a community member belongs to.
KW - Community question answering
KW - Intelligent information retrieval
KW - Natural language processing
KW - User demographic analysis
UR - http://www.scopus.com/inward/record.url?scp=85109740264&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2021.107278
DO - 10.1016/j.knosys.2021.107278
M3 - Article
AN - SCOPUS:85109740264
VL - 228
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
SN - 0950-7051
M1 - 107278
ER -