Abstract
Sports tracking produces large, unstructured trajectory datasets. Searching for and retrieving interesting plays is an essential part of their analysis. Since annotations are sparse, similarity search remains the standard technique. To be computationally feasible, it relies on learned lower-dimensional representations. Siamese networks learn such representations from pairwise distances. However, computing a complete training dataset is impractical because the number of pairs grows quadratically with the number of trajectories and each distance calculation is costly. Sub-sampling sacrifices representation quality for speed, leading to less meaningful search results. We propose a novel sampling technique, Pairwise Diverse and Uncertain Gradient (PairDUG), which exploits the model's gradient signals to select representative and informative pairs for training. In a broad experimental study, we apply the method to large-scale basketball and American football datasets. The results show that PairDUG at least halves the required compute time while maintaining, or even improving, retrieval quality, and that it outperforms the baseline methods. Furthermore, our evaluation shows that the gradient signals of the selected pairs exhibit greater magnitude, diversity, and stability than those of any compared method. This work represents a foundational contribution to pairwise distance learning. Future work will transfer the method not only to other sports, such as soccer, but also to complex trajectory datasets outside the sports domain.