Abstract
Student dropout remains a critical challenge for higher education institutions worldwide, affecting individual academic trajectories, institutional effectiveness, and societal development. This research investigates the use of Machine Learning (ML) techniques to predict student dropout risk from behavioral patterns extracted from Learning Management System (LMS) data. The study synthesizes contemporary research findings and explores how digital traces of student engagement, combined with academic and demographic data, can support early identification of at-risk students. Four ML algorithms are examined: k-Nearest Neighbors (k-NN), Neural Networks (NN), Decision Trees (DT), and Naive Bayes (NB). The results indicate that behavioral pattern analysis improves prediction accuracy over traditional statistical methods. In empirical validation, k-NN with k=3 performs best among the evaluated classifiers, reaching 87% sensitivity, and feature correlation analysis identifies strong relationships between test performance, project completion, and final result points. LMS activity metrics, particularly access frequency, test engagement, and assignment submission behavior, serve as strong indicators of dropout risk when combined with academic performance data. ROC curve analysis further shows that ensemble approaches and optimized k-NN classifiers substantially outperform baseline methods in distinguishing dropout-prone students from persisters. The research contributes to educational data mining by providing a comprehensive framework for integrating behavioral analytics into institutional retention strategies, supporting data-driven decision-making for improved student success outcomes.
Keywords
- student dropout prediction
- machine learning
- behavioral patterns
- learning management systems
- educational data mining
- k-nearest neighbors
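
The abstract reports sensitivity for a k-NN classifier with k=3. As a minimal sketch of how such a figure could be computed, the example below implements a plain Euclidean k-NN vote and the sensitivity formula TP / (TP + FN), with dropout as the positive class. The feature names (weekly LMS logins, tests attempted, assignments submitted) and all data values are hypothetical and chosen only for illustration; they are not taken from the study, and the study's actual pipeline may differ.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def sensitivity(y_true, y_pred, positive=1):
    """Share of actual positives (dropouts) that the classifier flags: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: (weekly LMS logins, tests attempted, assignments submitted).
# Label 1 = dropped out, label 0 = persisted.
train_X = [
    (1, 0, 0), (2, 1, 0), (0, 0, 1), (3, 1, 1),    # dropouts
    (9, 4, 5), (8, 5, 4), (10, 5, 5), (7, 4, 4),   # persisters
]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

test_X = [(2, 0, 1), (1, 1, 0), (8, 4, 5), (6, 3, 3), (4, 2, 2), (6, 3, 4)]
test_y = [1, 1, 0, 0, 1, 1]

preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
score = sensitivity(test_y, preds)
print(f"predictions: {preds}")
print(f"sensitivity: {score:.2f}")
```

On this toy data the classifier misses one of the four dropouts (a student whose LMS activity resembles that of persisters), so the sensitivity is 0.75. Sensitivity is the natural metric to watch here because a missed at-risk student is usually the costlier error in a retention setting.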