College student mental health assessment: Predictive models based on machine learning and feature importance analysis
Abstract
This study assesses and forecasts the mental health of university students using machine learning methods, with particular attention to the influence of the nine psychological symptom dimensions of the Symptom Checklist-90 (SCL-90). Mental health problems are highly prevalent among college students and are a significant concern. Traditional assessment methods may lack the precision needed for early detection and intervention. Machine learning offers tools for analyzing complex data and predicting outcomes from many variables at once.

The primary objective was to construct and evaluate predictive models of college students' mental health status using several machine learning algorithms. A further aim was to optimize the models' performance and identify the most influential psychological symptom dimensions. Psychological health data from 11,943 college students were collected through an online questionnaire platform, and multiple machine learning algorithms were used to build predictive models. Hyperparameters were optimized with K-fold cross-validation and the northern goshawk optimization (NGO) algorithm. To address class imbalance, the synthetic minority over-sampling technique (SMOTE) was used to generate synthetic samples for underrepresented classes. Model performance was evaluated with metrics including accuracy, recall, and F1 score.

The light gradient boosting machine (LightGBM) performed best, with only 6 misclassifications among 2,388 test samples. Tree-based ensemble methods such as random forest and extreme gradient boosting (XGBoost) consistently outperformed non-ensemble methods such as k-nearest neighbors, multi-layer perceptron, and kernel discriminant analysis. Analysis with Shapley additive explanations (SHAP) values indicated that features such as obsessive-compulsive symptoms and anxiety were the most influential in the model's predictions.

This study underscores the efficacy and potential of machine learning in mental health assessment. The results provide a robust scientific foundation for developing early warning systems and targeted intervention strategies to improve the mental well-being of college students.
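The abstract names SMOTE oversampling and accuracy, recall and F1 as evaluation metrics without showing how they work. The following is a minimal, dependency-free Python sketch of those two steps. It is not the authors' implementation. The function names `smote` and `classification_metrics`, the default of k = 5 neighbours, Euclidean distance, and macro-averaging over classes are illustrative assumptions. SMOTE follows the usual interpolation scheme: each synthetic point lies on the segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by SMOTE-style interpolation (illustrative sketch).

    Each new point lies on the line segment between a randomly chosen
    minority-class sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # seeded so results are reproducible
    k = min(k, len(minority) - 1)  # a point has at most n - 1 neighbours
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class (Euclidean)
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged recall and F1 (macro averaging is an assumption)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recalls, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives for class c
        fn = sum(t == c and p != c for t, p in pairs)  # missed members of class c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly assigned to class c
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": accuracy,
        "macro_recall": sum(recalls) / len(labels),
        "macro_f1": sum(f1s) / len(labels),
    }


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(smote(minority, n_synthetic=3, k=2))
    print(classification_metrics([0, 0, 1, 1], [0, 1, 1, 1]))
```

In a typical pipeline, oversampling is applied only to the training folds. This keeps synthetic points out of the data used for evaluation. As a worked example of the accuracy metric, the abstract's figure of 6 misclassifications among 2,388 test samples corresponds to 2,382/2,388, or roughly 99.75% accuracy.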
Copyright (c) 2025 Author(s)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright on all articles published in this journal is retained by the author(s), who grant the publisher the right to publish the article as its original publisher.
Under this license, articles may be shared, adapted, and distributed provided that the original published version is cited.