Development of a Performance Monitoring Instrument for Rating Explosives Search Dog Performance



Abstract

The growing body of working dog literature includes many examples of scales robustly developed to measure aspects of dog behavior. However, when comparing behavior to working dog ability, most studies use training organizations' own long-established performance ratings, or simply pass/fail outcomes at selection or certification, as measures of success. Working ability is multifaceted, and different aspects of ability are likely to be affected differently by external factors. To understand how specific aspects of selection, training, and operations influence a dog's working ability, numerous facets of performance should therefore be considered, which requires an accurate and validated method for quantifying multiple aspects of performance. Here, we describe the first stages of developing a meaningful performance measurement tool for two types of working search dogs. The systematic methodology was: (1) interviews and workshops with a representative cross-section of stakeholders to produce a shortlist of behaviors integral to the current operational performance of vehicle search (VS) and high assurance search (HAS) dogs; (2) assessment of the reliability and construct validity of the shortlisted behavioral measures, at both the behavior level and the individual rater level, using experienced personnel's ratings of diverse videoed searches; and (3) selection of the most essential and meaningful behaviors based on their reliability, validity, and importance. The resulting performance measurement tool comprised 12 shortlisted behaviors, most of which proved reliable and valid when assessed by a group of raters. At the individual rater level, however, raters varied in their ability to use and interpret the behavioral measures, particularly for more abstract behaviors such as Independence.
This illustrates the importance of examining individual rater scores rather than extrapolating from group consensus (as is often done), especially when designing a tool that will ultimately be used by single raters. For ratings to be practically valuable, individual rater reliability needs to be improved, especially for behaviors deemed essential (e.g., control and confidence). We suggest that the next steps are to investigate why individual raters vary in their ratings and to increase the likelihood that raters reach a common conceptualization of each behavioral construct. Plausible approaches include improving the format in which behaviors are presented (e.g., by adding benchmarks) and providing rater training.
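
The gap between group-level and individual-level reliability can be illustrated with a short sketch. The abstract does not state which statistics the authors used, so this example is an assumption throughout: it assumes a fully crossed design (every rater scores every video on the same numeric scale) and uses the two-way random-effects intraclass correlation in the Shrout and Fleiss form, ICC(2,1) for a single rater and ICC(2,k) for the mean of k raters, plus a leave-one-out correlation of each rater against the mean of the other raters. The function names (icc2, pearson, rater_vs_consensus) and the ratings are hypothetical.

```python
from math import sqrt
from statistics import mean


def icc2(scores):
    """Two-way random-effects, absolute-agreement ICC (Shrout and Fleiss form).

    Assumption: fully crossed design, scores[i][j] = rater j's score for video i.
    Returns (ICC(2,1), ICC(2,k)): reliability of one rater vs. the mean of k raters.
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]                       # per video
    col_means = [mean(row[j] for row in scores) for j in range(k)]  # per rater
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)             # between-video mean square
    msc = ss_cols / (k - 1)             # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rater_vs_consensus(scores):
    """Correlate each rater with the mean of all *other* raters (leave-one-out)."""
    k = len(scores[0])
    result = []
    for j in range(k):
        own = [row[j] for row in scores]
        others = [mean(row[:j] + row[j + 1:]) for row in scores]
        result.append(pearson(own, others))
    return result


# Hypothetical 1-5 ratings of one behavior: 6 videoed searches x 4 raters.
ratings = [
    [4, 4, 5, 2],
    [2, 2, 2, 4],
    [5, 4, 5, 3],
    [3, 3, 3, 5],
    [1, 2, 1, 3],
    [4, 5, 4, 1],
]

single, average = icc2(ratings)
print(f"ICC(2,1), single rater:   {single:.2f}")
print(f"ICC(2,k), mean of raters: {average:.2f}")
for j, r in enumerate(rater_vs_consensus(ratings), start=1):
    print(f"rater {j}: r with mean of other raters = {r:.2f}")
```

With these made-up scores, ICC(2,k) for the panel exceeds ICC(2,1) for a single rater, and the fourth rater's scores correlate negatively with the mean of the other three; a group-level figure on its own would hide that rater.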
