Machine Learning for Diabetes Screening and Follow-Up Care in Urban Emergency Departments
- Daniel Neill, Ph.D., Professor, Courant Institute Department of Computer Science, Robert F. Wagner Graduate School of Public Service, & Center for Urban Science + Progress, Machine Learning for Good Laboratory
MENTOR:
- David C. Lee, M.D., Associate Professor, Ronald O. Perelman Department of Emergency Medicine, Department of Population Health, NYU Langone Health
Authors
Siyu Miao, Sizhe Pei, Colin Qu, Zhenyu Shi
Research Question
How can the likelihood of follow-up care among newly diagnosed diabetes patients be predicted?
Background
Diabetes is a growing public health challenge, particularly in urban areas with strained healthcare systems. Emergency departments are often the first point of contact for patients who have diabetes or are at risk of it. Ensuring that these patients attend outpatient follow-up care remains a significant hurdle.
Methodology
This project combined clinical, demographic, and socioeconomic data into a single analytical dataset. Before modeling, the data were cleaned by removing outliers, standardizing numerical variables, and reorganizing variables. Lasso regression was used to identify the features most relevant to follow-up behavior, and the selected features served as inputs to the predictive models. Because the outcome is binary, classification models were developed using logistic regression, random forest, support vector machine, XGBoost, and neural networks. Each model’s confusion matrix was used to compute precision, recall, and F1-score, giving a consistent basis for comparing how well the models predict follow-up outcomes and for drawing actionable insights.
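As a minimal sketch of the evaluation step described above, the following Python code derives precision, recall, and F1-score from a binary confusion matrix. The function names, the label encoding (1 = patient attended follow-up, 0 = did not), and the example labels are all hypothetical and chosen for illustration only; they are not taken from the project's data or code.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count the four cells of a binary confusion matrix.

    Assumed encoding (hypothetical): 1 = attended follow-up, 0 = did not.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["fn" if actual == 1 else "tn"] += 1
    return counts


def classification_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Compute precision, recall, and F1-score from confusion-matrix counts."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up labels for eight patients, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(counts)
print(counts)
print(metrics)
```

Applying the same two functions to each model's predictions on a shared held-out set would give directly comparable numbers. In a follow-up setting, recall on the "did not attend" class may matter more than overall accuracy, which would mean swapping which label is treated as positive.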
Deliverables
- Data Visualization with charts and confusion matrices
- Technical Report
- ArcGIS StoryMap