Harran University, Faculty of Agriculture, Department of Animal Science, Şanlıurfa
Abstract
Subclinical mastitis (SCM) imposes substantial economic losses on the global dairy industry. Conventional diagnostics are often ill-suited to rapid, large-scale screening, which motivates new diagnostic approaches. This study evaluated seven machine learning (ML) and deep learning (DL) models for predicting SCM, defined as a somatic cell count (SCC) above 200,000 cells mL⁻¹, from routine milk composition data. Using a dataset of 1,391 milk records, the models were built on five features: fat, protein, lactose, total solids (TS), and milk urea nitrogen (MUN). The training data were balanced with the Synthetic Minority Over-sampling Technique (SMOTE) to prevent bias towards the majority class. The predictions of the best model were interpreted with SHapley Additive exPlanations (SHAP) to identify key predictive factors. The Extreme Gradient Boosting (XGBoost) model performed best, achieving 82.3% accuracy and an F1-score of 87.8% on the unaltered test set. More generally, tree-based ensemble models outperformed the DL models and the simpler classifiers. SHAP analysis identified lactose and protein as the most decisive features: lower lactose and higher protein levels were highly predictive of SCM, which is consistent with established pathophysiology. The results indicate that an interpretable model using routine milk data offers a robust, non-invasive, and cost-effective framework for early SCM detection, and a valuable decision-support tool for improving udder health management and farm sustainability.
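The balancing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the study's code, which is not shown here: the function names, the choice of k, and the toy records are all assumptions made for illustration, and the feature values are invented. The sketch interpolates synthetic rows between a minority-class record and one of its nearest minority neighbours (the core idea of SMOTE), applies this to the training split only, and computes accuracy and F1 from predictions on an untouched test split.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows."""
    if n_new > 0 and len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Up to k nearest minority neighbours of the base row (Euclidean distance).
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def balance_training_set(X_train, y_train, k=5, seed=0):
    """Oversample the smaller class of the TRAINING split until classes match."""
    pos = [x for x, y in zip(X_train, y_train) if y == 1]
    neg = [x for x, y in zip(X_train, y_train) if y == 0]
    minority, label = (pos, 1) if len(pos) < len(neg) else (neg, 0)
    n_new = abs(len(pos) - len(neg))
    new_rows = smote(minority, n_new, k=k, seed=seed)
    return X_train + new_rows, y_train + [label] * n_new


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Invented toy records: [fat %, protein %, lactose %, TS %, MUN mg/dL].
X_train = [
    [3.8, 3.2, 4.8, 12.6, 14.0], [3.6, 3.1, 4.9, 12.4, 13.0],
    [4.0, 3.3, 4.7, 12.9, 15.0], [3.7, 3.2, 4.8, 12.5, 12.5],
    [3.9, 3.1, 4.9, 12.7, 13.5], [3.5, 3.0, 4.8, 12.2, 14.5],
    [3.9, 3.6, 4.3, 12.8, 16.0], [4.1, 3.7, 4.2, 13.0, 15.5],
    [3.8, 3.5, 4.4, 12.6, 17.0],
]
y_train = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # 1 = SCM, 0 = healthy (assumed coding)

X_bal, y_bal = balance_training_set(X_train, y_train)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

Keeping the oversampling inside the training split matters because synthetic rows in the test split would inflate the reported scores; metrics such as those from `accuracy_and_f1` should be computed on original, unaltered records only.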
YALÇİN, H. (2025). Interpretable Machine Learning Models for Early Detection of Subclinical Mastitis using Routine Milk Composition Data. ISPEC Journal of Agricultural Sciences, 9(3), 830–843. https://doi.org/10.5281/zenodo.16479542