Document Type: Research Paper
Authors
1 Department of Animal Science, Faculty of Agriculture, Abhar Branch, Islamic Azad University, Zanjan, Iran. E-mail: ali.rezazadeh@iau.ac.ir.
2 Corresponding Author, Department of Animal Science, Agricultural Research, Education and Extension Organization (AREEO), Agricultural Institute of Education and Extension (IATE), Tehran, Iran. E-mail: arhb@abc.org.ir
3 Department of Animal Science, Faculty of Agriculture, Varamin Branch, Islamic Azad University, Tehran, Iran. E-mail: pahlevanafshar.k@abhariau.ac.ir
4 Department of Animal Science, Faculty of Agriculture, Abhar Branch, Islamic Azad University, Zanjan, Iran. E-mail: m.aboozari1357@iau.ac.ir
Abstract
Objective: This study aims to identify and systematically analyze the key factors affecting the longevity of Holstein dairy cows in herds, using data mining algorithms. Understanding and predicting longevity is important because it directly affects the productivity and profitability of dairy farms. Longer-lived cows tend to produce more calves and more milk, which increases the overall economic efficiency of dairy operations. Extended longevity is also associated with lower replacement costs.
Methods: In recent years, the integration of machine learning techniques into agricultural and livestock management has gained significant momentum. This study used detailed phenotypic data collected from 37,009 females, the progeny of 664 sires, in 82 separate herds, forming a dataset that spans a decade. The data included eight milk production records, along with other relevant variables: animal age, sire number, longevity (months), somatic cell count, days in milk, milk production (kg), protein and fat content, calving cycle length, milking frequency, geographic location (province), birth date, calving date, calving interval, herd code, and age at first calving. In the data preparation phase, the dataset was processed and organized in Excel 2016 to ensure data quality and consistency. Subsequent analyses were conducted in R (version 4.3.3) with specialized packages for machine learning and statistical modeling.
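Two of the variables listed above, age at first calving and calving interval, are derived from the recorded birth and calving dates. The Python sketch below illustrates one way such a derivation could be done. It is not the authors' procedure (their preparation was carried out in Excel and R); the function names, the output keys, and the average month length of 30.4375 days are assumptions made for illustration only.

```python
from datetime import date

# Assumed average month length (365.25 / 12); not taken from the paper.
DAYS_PER_MONTH = 30.4375


def months_between(start: date, end: date) -> float:
    """Elapsed time between two dates, expressed in average months."""
    return (end - start).days / DAYS_PER_MONTH


def derive_features(birth: date, calvings: list[date]) -> dict:
    """Derive age at first calving (months) and calving intervals (days).

    `birth` and `calvings` are hypothetical inputs for one cow.
    """
    if not calvings:
        raise ValueError("at least one calving date is required")
    ordered = sorted(calvings)
    # Age at first calving: time from birth to the earliest calving.
    afc = months_between(birth, ordered[0])
    # Calving interval: days between each pair of consecutive calvings.
    intervals = [(b - a).days for a, b in zip(ordered, ordered[1:])]
    return {
        "age_at_first_calving_months": afc,
        "calving_intervals_days": intervals,
    }
```

For a hypothetical cow, `derive_features(date(2015, 1, 1), [date(2017, 1, 1), date(2018, 2, 5)])` returns one calving interval and an age at first calving of roughly 24 months.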
Results: The Support Vector Machine achieved the highest accuracy (0.987). Random Forest ranked second. The accuracy of the Gradient Boosting Machine was slightly lower than that of Random Forest, but it still performed well. The Decision Tree had the lowest accuracy among these algorithms. The Decision Tree and Support Vector Machine achieved their performance with fewer input variables than the Gradient Boosting Machine and Random Forest.
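The comparison above rests on classification accuracy, the share of animals whose predicted class matches the observed class. A minimal Python sketch of that metric and of ranking models by it follows. The function names and the dictionary of predictions are hypothetical, and the sketch does not reproduce the study's data or its R workflow.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the observed labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_models(y_true, predictions):
    """Return (model name, accuracy) pairs sorted from best to worst.

    `predictions` maps a model name to its predicted labels (hypothetical input).
    """
    scores = {name: accuracy(y_true, pred) for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Accuracy alone can mislead when the classes are imbalanced, so in practice it is usually read alongside class-specific measures such as sensitivity and specificity.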
Conclusion: None of the algorithms used for survival classification was error-free, despite acceptable accuracy; the Decision Tree, however, was simpler and less expensive. The main advantages of these methods are that they do not impose the statistical assumptions required by linear regression and interpolation methods, including normality, that they are robust to missing values, and that they can detect complex nonlinear relationships between predictor variables and the prediction target. These properties make them suitable for a range of applications in the livestock industry. Accurate data recording protocols and careful tuning of algorithm settings are essential for accurate prediction.
Keywords