Abstract
Detection and tracking of animals is an important first step for automated behavioral studies in videos. Animal tracking currently relies mostly on keypoint-based deep learning frameworks, which show remarkable results in lab settings with fixed cameras, backgrounds, and lighting. However, multi-animal tracking in the wild poses several challenges, such as high variability in background and lighting conditions, complex motion, and occlusion. We propose PriMAT, an approach for tracking nonhuman primates in the wild. PriMAT learns to detect and track primates and other objects of interest from labeled videos or single images using bounding boxes instead of keypoints, which significantly simplifies data annotation and improves robustness. Our one-stage model is conceptually simple but highly flexible, and an added classification branch enables training for individual identification. To evaluate the performance of our approach, we applied it in two case studies with Assamese macaques (Macaca assamensis) and redfronted lemurs (Eulemur rufifrons) in the wild. Additionally, we demonstrate transfer to other settings and species, namely Barbary macaques (Macaca sylvanus), Guinea baboons (Papio papio), chimpanzees (Pan troglodytes), and gorillas (Gorilla spp.). We show that robust tracking can be achieved with only a few hundred frames labeled with bounding boxes. Combined with the classification branch on the lemur videos, the identification model predicts lemur identities with 84% accuracy. Our approach presents a promising solution for accurately tracking and identifying animals in the wild, offering researchers a tool to study animal behavior in natural habitats. Our code, models, training images, and evaluation video sequences are publicly available at https://github.com/ecker-lab/PriMAT-tracking, facilitating their use for animal behavior analyses and future research in this field.