Abstract
Technological advances have significantly improved cattle farming, particularly through sensor-based activity monitoring for health management, estrus detection, and overall herd supervision. However, sensor-based monitoring frameworks often suffer from several drawbacks, such as high cost, animal discomfort, and susceptibility to false measurements. This study introduces a vision-based, estrus-focused cattle activity monitoring approach deployed on a commercial Nestlé dairy farm, where overhead cameras capture unconstrained herd behavior under variable lighting, occlusion, and crowding. A custom dataset of 2956 images was collected and annotated with four fine-grained behaviors (standing, lying, grazing, and estrus), enabling more detailed analysis than the coarse activity categories commonly used in prior livestock monitoring studies. Computer vision-based deep learning models are then applied to this dataset to classify these four behaviors. A comparative analysis of YOLOv8 and YOLOv9 shows that YOLOv8-L achieved a mAP of 91.11%, whereas YOLOv9-E achieved a mAP of 90.23%.