Open Access

Interpretable Machine Learning Models for Early Detection of Subclinical Mastitis using Routine Milk Composition Data

1 Harran University, Faculty of Agriculture, Department of Animal Science, Şanlıurfa

Abstract

Subclinical mastitis (SCM) imposes substantial economic losses on the global dairy industry. Conventional diagnostics are often ill-suited to rapid, large-scale screening, which highlights the need for novel diagnostic approaches. The objective of this study was to evaluate seven machine learning (ML) and deep learning (DL) models for predicting SCM (somatic cell count (SCC) > 200,000 cells mL⁻¹) from routine milk composition data. Using a dataset of 1,391 milk records, we evaluated models based on five features: fat, protein, lactose, total solids (TS), and milk urea nitrogen (MUN). The training data were balanced using the Synthetic Minority Over-sampling Technique (SMOTE) to prevent bias towards the majority class. The best model's predictions were interpreted using SHapley Additive exPlanations (SHAP) to identify key predictive factors. The Extreme Gradient Boosting (XGBoost) model delivered the highest performance, achieving 82.3% accuracy and an 87.8% F1-score on the unaltered test set. More generally, tree-based ensemble models outperformed the DL and simpler classifiers. SHAP analysis identified lactose and protein as the most decisive features; lower lactose and higher protein levels were highly predictive of SCM, which is consistent with established pathophysiology. The results indicate that an interpretable model using routine milk data offers a robust, non-invasive, and cost-effective framework for early SCM detection. This provides a valuable decision-support tool for improving udder health management and farm sustainability.
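The labelling, balancing, and scoring steps named in the abstract can be illustrated with a minimal sketch. It is not the study's code: the function names, the toy records, and the pure-Python SMOTE-style interpolation are all assumptions made for illustration, and a real pipeline would more likely rely on an established library implementation. The sketch labels records with the SCC cut-off of 200,000 cells mL⁻¹, oversamples whichever class is smaller in the training split only, and computes accuracy and F1 from predictions.

```python
import math
import random

SCC_THRESHOLD = 200_000  # cells/mL, the SCM cut-off stated in the abstract


def label_scm(scc):
    """Return 1 (SCM) when SCC exceeds the threshold, else 0."""
    return 1 if scc > SCC_THRESHOLD else 0


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    if n_new > 0 and len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen row (excluding itself)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance_training_set(X_train, y_train, rng=None):
    """Oversample the smaller class of the training split until classes are equal."""
    by_class = {0: [], 1: []}
    for row, label in zip(X_train, y_train):
        by_class[label].append(row)
    minority_label = min(by_class, key=lambda c: len(by_class[c]))
    n_new = len(by_class[1 - minority_label]) - len(by_class[minority_label])
    new_rows = smote(by_class[minority_label], n_new, rng=rng)
    return X_train + new_rows, y_train + [minority_label] * n_new


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for the positive (SCM) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


# Hypothetical toy records: [fat, protein, lactose, TS, MUN]; not study data.
X_train = [
    [3.8, 3.2, 4.8, 12.6, 14.0],
    [4.0, 3.3, 4.7, 12.9, 15.5],
    [3.6, 3.1, 4.9, 12.4, 13.2],
    [3.9, 3.2, 4.8, 12.7, 16.1],
    [4.1, 3.6, 4.3, 12.8, 17.0],
    [4.3, 3.7, 4.2, 13.0, 18.4],
]
scc_train = [90_000, 120_000, 150_000, 180_000, 450_000, 800_000]
y_train = [label_scm(scc) for scc in scc_train]

# Only the training split is balanced; a held-out test split stays unaltered.
X_bal, y_bal = balance_training_set(X_train, y_train, rng=random.Random(42))
```

Balancing only the training split matters because synthetic rows in the test set would distort the reported metrics. Reporting F1 alongside accuracy is a common choice when classes are imbalanced, since accuracy alone can be inflated by the majority class.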

How to Cite

YALÇİN, H. (2025). Interpretable Machine Learning Models for Early Detection of Subclinical Mastitis using Routine Milk Composition Data. ISPEC Journal of Agricultural Sciences, 9(3), 830–843. https://doi.org/10.5281/zenodo.16479542

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., 2016. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. Preprint, (https://arxiv.org/abs/1603.04467), (Accessed: 09.04.2025).
Aldrees, A., Javed, M.F., Taha, A.T.B., Mohamed, A.M., Jasiński, M., Gono, M., 2023. Evolutionary and ensemble machine learning predictive models for evaluation of water quality. Journal of Hydrology: Regional Studies, 46: 101331.
Bausewein, M., Mansfeld, R., Doherr, M.G., Harms, J., Sorge, U.S., 2022. Sensitivity and specificity for the detection of clinical mastitis by automatic milking systems in Bavarian dairy herds. Animals, 12: 2131.
Breiman, L., 2001. Random forests. Machine Learning, 45(1): 5-32.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16: 321-357.