Student dropout is a complex problem that offers an opportunity to apply data mining technology and methods in higher education. The objective of this research is to profile students at risk of dropping out and, from those profiles, to generate student management plans that act on the variables explaining this risk. To this end, the study follows the CRISP-DM methodological structure, applying statistical tools and unsupervised machine learning. A cross-sectional analysis was carried out on the population of first-year daytime students at a private Chilean university. The sociodemographic and behavioural variables were selected on the basis of attrition theory and expert judgment, and the data were obtained from the institution's historical records. Correlation and principal component analyses were performed to identify the variables that most influenced dropout. Applying agglomerative hierarchical clustering and the rough sets technique produced four student profiles with their respective association rules, along with five academic variables that informed the design of a support system to reduce dropout and promote retention.
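As a rough illustration of the agglomerative hierarchical clustering step named in the abstract, the sketch below groups a handful of students into four clusters. Everything in it is an assumption made for illustration: the abstract does not state the linkage criterion, the distance measure, the variables, or the software used, so the sketch picks average linkage with Euclidean distance, uses two invented features already scaled to the range 0 to 1 (attendance rate and fraction of courses passed), and runs on made-up records. The names `euclidean`, `agglomerative`, and `students` are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering (assumed linkage, not the study's).

    Starts with one cluster per row and repeatedly merges the two clusters
    with the smallest average pairwise distance until n_clusters remain.
    Returns a list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(
                euclidean(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            dist = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        # a < b always holds, so deleting b does not shift position a.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Invented records: (attendance rate, fraction of courses passed), both in [0, 1].
students = [
    (0.95, 0.90), (0.90, 0.95),  # high attendance, high pass rate
    (0.90, 0.30), (0.85, 0.35),  # high attendance, low pass rate
    (0.30, 0.90), (0.35, 0.85),  # low attendance, high pass rate
    (0.20, 0.20), (0.25, 0.15),  # low attendance, low pass rate
]

profiles = agglomerative(students, n_clusters=4)
for label, members in enumerate(profiles):
    print(f"profile {label}: students {members}")
```

With real records, features on different scales would need to be standardized first, and the number of clusters would be chosen from the data (for example by inspecting the merge distances) rather than fixed in advance as it is here.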
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language
- Computer Science Applications