Abstract
Researchers need ground-truth activity annotations to train and evaluate wearable-sensor-based activity recognition models. Often, researchers establish ground truth by annotating video recorded while a person wearing sensors engages in activities. The "gold-standard" video annotation practice requires two trained annotators to annotate the same footage independently, with a third domain expert resolving disagreements. Such annotation is laborious, so widely used datasets have often been annotated by only a single annotator per video. As the research community moves towards collecting data on more complex behaviors from free-living people 24/7 and annotating more granular, fleeting activities, the annotation task grows even more challenging, and the single-annotator approach may yield inaccuracies. We investigated a "silver-standard" approach: rather than using two independent annotation passes, a second annotator revises the work of the first annotator. The proposed approach reduced the total annotation time by 33% compared to the gold-standard approach, with near-equivalent annotation quality. The silver-standard labels were in higher agreement with the gold-standard labels than the single-annotator labels were, with Cohen's κ of 0.77 and 0.68, respectively, on a 16.4 h video. The silver-standard labels also had higher inter-rater reliability than the single-annotator labels, with mean Cohen's κ across six videos (92 h of total footage) of 0.79 and 0.68, respectively.
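For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's κ could be computed between two annotation passes. It assumes the annotations have already been resampled into equal-length sequences of categorical labels (for example, one label per fixed time window); that discretization, the function name `cohens_kappa`, and the example labels are illustrative assumptions and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of windows where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal frequencies, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Degenerate case: both annotators used one identical label throughout.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical per-window labels from two annotation passes.
first_pass = ["sit", "sit", "walk", "walk"]
second_pass = ["sit", "walk", "walk", "walk"]
kappa = cohens_kappa(first_pass, second_pass)
```

In this toy example the annotators agree on three of four windows (observed agreement 0.75) while chance agreement is 0.5, so κ is 0.5. Unlike raw percent agreement, κ discounts agreement that would be expected from the label frequencies alone.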