Abstract
BACKGROUND: Machine learning methods are widely used to detect patterns in behavioural data. Although these mathematical methods are useful tools, the interpretation of their results is often ambiguous unless biologically relevant parameters are included in the analyses. For classical (non-neural) machine learning (ML) methods, a crucial first step in time series analysis is to determine the window length over which features are computed as input variables for the ML training phase. The bout length of a behaviour could be a relevant parameter for choosing this window length.
METHODS: In this study, the movements of dogs were observed. Eight behaviours were defined, and motion data were collected using a smartwatch attached to each dog's collar. The behaviour sequences of 56 freely moving dogs of various breeds were analysed using dedicated software (SensDog by CEM Inc.). Behaviour recognition was based on binary classification with a Light Gradient Boosting Machine (LGBM) learning algorithm. For signal processing, a sliding-window technique was used to find the best window size for the analysis of each behaviour.
RESULTS: For all behaviours, the best recognition was obtained when the window size corresponded to the median bout length of that particular behaviour.
CONCLUSIONS: Using behaviour-specific parameters in the binary classification models is an effective strategy for improving the accuracy of behaviour recognition.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12917-026-05294-1.
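The idea of matching the sliding-window size to the median bout length can be illustrated with a minimal sketch. It is not the SensDog pipeline: the function names, the toy data, the mean and standard deviation features, and the majority rule for labelling a window are all assumptions made for illustration, and bout lengths are counted in samples because the abstract does not state a sampling rate.

```python
from itertools import groupby
from statistics import mean, median, pstdev


def bout_lengths(labels, behaviour):
    """Lengths (in samples) of each uninterrupted run of `behaviour`."""
    return [
        sum(1 for _ in run)
        for label, run in groupby(labels)
        if label == behaviour
    ]


def median_bout_length(labels, behaviour):
    """Median bout length in samples, rounded, and never below 1."""
    lengths = bout_lengths(labels, behaviour)
    if not lengths:
        raise ValueError(f"no bouts of {behaviour!r} in labels")
    return max(1, round(median(lengths)))


def window_features(signal, labels, behaviour, window, step=1):
    """Slide a window over `signal`; return ((mean, std), binary target) rows."""
    rows = []
    for start in range(0, len(signal) - window + 1, step):
        chunk = signal[start:start + window]
        chunk_labels = labels[start:start + window]
        # Assumed labelling rule: 1 if the behaviour fills most of the window.
        target = int(chunk_labels.count(behaviour) * 2 > window)
        rows.append(((mean(chunk), pstdev(chunk)), target))
    return rows


# Toy data, invented for illustration only (not from the study).
labels = ["walk"] * 4 + ["sit"] * 3 + ["walk"] * 6 + ["sit"] * 2 + ["walk"] * 2
signal = [float(i % 5) for i in range(len(labels))]

# Behaviour-specific window: the median bout length of "walk".
window = median_bout_length(labels, "walk")
rows = window_features(signal, labels, "walk", window, step=2)
```

In a real analysis, the feature rows for each behaviour would be passed to a separate binary classifier (the study used a LightGBM model), with one window size per behaviour rather than a single global value.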