Publications: Department of Mathematical and Physical Sciences

Shazid, M. A. I., Rahman, M. S. (2025), “On the Development and Validation of Risk Prediction Models in Clustered Binary Data,” paper presented at the International Conference on Applied Statistics and Data Science (ICASDS) 2025, Dhaka, Bangladesh

In clustered or multi-center studies, risk prediction models are often designed to predict in-hospital mortality, assess variability in hospital performance, and identify underperforming hospitals. Rigorous development and validation of these models is therefore essential. Although many studies focus on predictive models for independent binary outcomes, little research addresses clustered binary outcomes. This study examines the development and validation of risk prediction models for clustered binary outcomes.

For clustered binary data, Generalized Linear Mixed Models (GLMMs), such as the random intercept logistic model, are commonly used. Several estimation methods are available, including Penalized Quasi-Likelihood (PQL), the Laplace approximation, and Adaptive Gaussian Quadrature (AGQ). This study compares these methods through simulation and concludes that AGQ provides more accurate results.
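For cluster i, the quantity these methods approximate is the marginal likelihood L_i = ∫ ∏_j f(y_ij | b) φ(b; 0, σ²) db, where logit P(y_ij = 1 | b) = x_ij'β + b. The Python sketch below is an illustration, not the authors' code. It shows adaptive Gauss-Hermite quadrature for a single cluster: the nodes are centred at the mode b̂ of the integrand g(b) and scaled by τ = (−d² log g / db²)^(−1/2), so that L_i ≈ √2 τ Σ_q w_q exp(z_q²) g(b̂ + √2 τ z_q). With a single node this reduces to the Laplace approximation. PQL is not shown. The function names, the bracketing-and-bisection route to the Hermite nodes, and the example data are assumptions.

```python
import math


def hermite(n, x):
    """Return (H_n(x), H_{n-1}(x)) for physicists' Hermite polynomials."""
    h_prev, h = 0.0, 1.0  # H_{-1} := 0 and H_0 = 1
    for k in range(n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev  # H_{k+1} = 2x H_k - 2k H_{k-1}
    return h, h_prev


def gauss_hermite(n):
    """Nodes/weights for the integral of exp(-z^2) f(z) dz (roots by bracketing + bisection)."""
    bound = math.sqrt(2.0 * n + 1.0)  # every root of H_n lies in (-bound, bound)
    steps = 2000 * n + 1  # odd count, so no grid point sits exactly on a root at 0
    nodes = []
    a, fa = -bound, hermite(n, -bound)[0]
    for i in range(1, steps + 1):
        b = -bound + 2.0 * bound * i / steps
        fb = hermite(n, b)[0]
        if fa * fb < 0.0:  # a sign change brackets one root; refine it by bisection
            lo, hi, flo = a, b, fa
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                fm = hermite(n, mid)[0]
                if flo * fm <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fm
            nodes.append(0.5 * (lo + hi))
        a, fa = b, fb
    if len(nodes) != n:
        raise RuntimeError("could not isolate all roots of H_n")
    const = 2.0 ** (n - 1) * math.factorial(n) * math.sqrt(math.pi) / n**2
    return nodes, [const / hermite(n, z)[1] ** 2 for z in nodes]


def log1pexp(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def log_g(b, y, eta, sigma):
    """log[f(y | b) * N(b; 0, sigma^2)] for one cluster with linear predictor eta + b."""
    loglik = sum(yi * (e + b) - log1pexp(e + b) for yi, e in zip(y, eta))
    return loglik - b * b / (2 * sigma**2) - 0.5 * math.log(2 * math.pi * sigma**2)


def mode_and_curvature(y, eta, sigma):
    """Newton-Raphson with step halving for the mode of log g; returns (b_hat, -d2 log g)."""
    b = 0.0
    for _ in range(100):
        p = [1.0 / (1.0 + math.exp(-(e + b))) for e in eta]
        grad = sum(yi - pi for yi, pi in zip(y, p)) - b / sigma**2
        info = sum(pi * (1 - pi) for pi in p) + 1 / sigma**2
        step = grad / info
        while log_g(b + step, y, eta, sigma) < log_g(b, y, eta, sigma) and abs(step) > 1e-14:
            step /= 2
        b += step
        if abs(step) < 1e-12:
            break
    p = [1.0 / (1.0 + math.exp(-(e + b))) for e in eta]
    return b, sum(pi * (1 - pi) for pi in p) + 1 / sigma**2


def agq_cluster_likelihood(y, eta, sigma, n_nodes):
    """Adaptive Gauss-Hermite approximation of one cluster's marginal likelihood."""
    b_hat, info = mode_and_curvature(y, eta, sigma)
    tau = 1.0 / math.sqrt(info)  # spread of the integrand around its mode
    z, w = gauss_hermite(n_nodes)
    s = sum(wq * math.exp(zq * zq + log_g(b_hat + math.sqrt(2) * tau * zq, y, eta, sigma))
            for zq, wq in zip(z, w))
    return math.sqrt(2) * tau * s


# Illustrative (made-up) cluster: outcomes y and fixed-effect linear predictors eta = x'beta.
y = [1, 0, 1, 1, 0]
eta = [0.2, -0.5, 1.0, 0.3, -1.2]
laplace = agq_cluster_likelihood(y, eta, sigma=1.0, n_nodes=1)  # one node = Laplace
agq = agq_cluster_likelihood(y, eta, sigma=1.0, n_nodes=15)
print(laplace, agq)
```

Increasing `n_nodes` shows how quadrature refines the Laplace value. In a real fit, the sum of log L_i over clusters would be maximised over β and σ, rather than evaluated for one cluster at fixed values.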

When predicting outcomes in clusters not included in the training data, the current literature suggests setting the random intercepts to zero. This ignores clustering effects, leads to poor predictive performance, limits the model's applicability for out-of-sample prediction, and hinders model validation. To overcome these limitations, the study proposes two novel approaches for out-of-sample prediction and model validation. The first, termed the "offset" method, re-fits the model in the new clusters, using the linear predictor from the training data as an offset. The second, called the "optim" method, optimizes a prediction accuracy metric (e.g., mean squared error or Brier score) over a range of random effect values in a new cluster.
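The two proposals can be sketched for a single new cluster, as a simplified reading of the abstract rather than the authors' implementation. The hypothetical `offset_intercept` does not refit a full GLMM across all new clusters. Instead, it keeps the training linear predictor η = x'β̂ fixed as an offset and estimates only that cluster's intercept b, shrunk toward zero by an assumed N(0, σ²) penalty. The hypothetical `optim_intercept` searches an assumed grid of b values in [−4σ, 4σ] for the one that minimises the Brier score. Both use the new cluster's observed outcomes. The function names, grid range, σ, and data are illustrative assumptions.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def log1pexp(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def penalized_loglik(b, y, eta, sigma):
    """Logistic log-likelihood with fixed offset eta, plus an (assumed) N(0, sigma^2) penalty on b."""
    ll = sum(yi * (e + b) - log1pexp(e + b) for yi, e in zip(y, eta))
    return ll - b * b / (2 * sigma**2)


def offset_intercept(y_new, eta_new, sigma, max_iter=100):
    """Hypothetical 'offset'-style fit: only the new cluster's intercept b is estimated.

    eta_new = x'beta_hat from the training model is held fixed as an offset; b is the mode
    of the penalised log-likelihood, found by Newton-Raphson with step halving.
    """
    b = 0.0
    for _ in range(max_iter):
        p = [sigmoid(e + b) for e in eta_new]
        grad = sum(yi - pi for yi, pi in zip(y_new, p)) - b / sigma**2
        info = sum(pi * (1 - pi) for pi in p) + 1 / sigma**2
        step = grad / info
        while (penalized_loglik(b + step, y_new, eta_new, sigma)
               < penalized_loglik(b, y_new, eta_new, sigma) and abs(step) > 1e-14):
            step /= 2
        b += step
        if abs(step) < 1e-12:
            break
    return b


def brier(y, p):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def optim_intercept(y_new, eta_new, sigma, n_grid=601):
    """Hypothetical 'optim'-style fit: grid-search b in [-4 sigma, 4 sigma] minimising Brier."""
    grid = [-4 * sigma + 8 * sigma * k / (n_grid - 1) for k in range(n_grid)]
    return min(grid, key=lambda b: brier(y_new, [sigmoid(e + b) for e in eta_new]))


# Illustrative (made-up) new cluster: observed outcomes and training linear predictors.
y_new = [1, 1, 0, 1, 1, 0, 1, 1]
eta_new = [0.1, -0.4, -1.0, 0.5, 0.0, -0.8, 0.3, 0.2]
sigma = 0.8  # random-intercept SD, assumed to come from the training fit
b_offset = offset_intercept(y_new, eta_new, sigma)
b_optim = optim_intercept(y_new, eta_new, sigma)
p_offset = [sigmoid(e + b_offset) for e in eta_new]
p_optim = [sigmoid(e + b_optim) for e in eta_new]
p_zero = [sigmoid(e) for e in eta_new]  # the "random intercept set to zero" baseline
print(b_offset, b_optim, brier(y_new, p_zero), brier(y_new, p_optim))
```

The grid search is deliberately crude. Any one-dimensional optimiser over b would serve, and other metrics (e.g. log loss) could replace the Brier score in the `key` function.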

The predictive performance of both methods was evaluated using the Brier score, concordance statistic, log loss, and misclassification error rate. Simulation studies demonstrated the robustness of these methods across varying cluster sizes, numbers of clusters, and degrees of clustering. The "optim" method showed slightly lower misclassification error, whereas the "offset" method performed better on the other accuracy metrics. Finally, the proposed methods were applied to develop and validate a risk prediction model for hypertension using data from the Bangladesh Demographic and Health Survey (BDHS) for the years 2017–18 and 2022.
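The four evaluation metrics named above can be computed from observed outcomes y and predicted probabilities p, as in this Python sketch. The 0.5 classification threshold, the log-loss clipping constant, and counting tied pairs as one half in the concordance statistic are common conventions assumed here, not details taken from the paper.

```python
import math


def brier_score(y, p):
    """Mean squared difference between outcomes (0/1) and predicted probabilities."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def log_loss(y, p, eps=1e-15):
    """Mean negative Bernoulli log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y)


def concordance(y, p):
    """C-statistic: share of (event, non-event) pairs where the event has higher risk."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5  # ties count as half-concordant
    return score / (len(events) * len(non_events))


def misclassification_rate(y, p, threshold=0.5):
    """Share of observations whose thresholded prediction disagrees with the outcome."""
    return sum((pi >= threshold) != (yi == 1) for yi, pi in zip(y, p)) / len(y)


y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.7]
print(brier_score(y, p), log_loss(y, p), concordance(y, p), misclassification_rate(y, p))
```

In a clustered validation, these functions would be applied to predictions whose new-cluster intercepts came from either the "offset" or the "optim" step.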

The study addresses the challenges of validating predictive models with clustered data. The proposed methods can be readily extended to other models, such as the clustered Poisson model, and to models with more complex random effect structures.

Conference Paper

Saha, S., Haque, M.F., Nahar, S., Kamal, K.M.S., Reza, A.W. (2025). COVID-19 Detection Using VGG16: Enhancing Robustness and Interpretability with Adversarial Attack Analysis and Explainable AI. In: Satu, M.S., Shamsul Arefin, M., Lio', P., Kaiser, M.S. (eds) Proceeding of the 2nd International Conference on Machine Intelligence and Emerging Technologies. MIET 2024. Lecture Notes in Networks and Systems, vol 1235. Springer, Singapore. https://doi.org/10.1007/978-981-96-2721-9_12

Book Chapter

Saha, S., Rana, S. (2025). Explainable AI in Healthcare: Clinical Decision Support and Diagnostic Systems. In Explainable AI in Critical Domains: From Theory to Trusted Applications, Chapter 7. Nova Science Publishers, Inc.