Abstract
A mobile robot can localize itself in a mapped area by finding the recorded image of a visited place that is most similar to the current view, a technique known as Visual Place Recognition (VPR). We focus on VPR with panoramic images in indoor environments and on direct VPR methods, in contrast to feature-based methods. In this context, a key challenge of VPR is coping with appearance changes in the environment, e.g., due to variations in illumination, camera tilt, and the rearrangement of objects. To improve VPR quality in these situations, we propose a novel combination of convolutional neural networks (CNNs) for image preprocessing with two algorithmic solutions for VPR, the Visual Compass and MinWarping. The CNN is fused with the algorithmic VPR method such that training the neural network includes backpropagation through both parts; we refer to the result as a hybrid model. We show that the hybrid Visual Compass substantially improves tilt tolerance, resulting in a versatile model, while hybrid MinWarping is especially robust against illumination changes and object rearrangement. As an application adjacent to VPR, the hybrid MinWarping algorithm can also be used to estimate the relative pose of the query image with respect to a previous image. We analyze how the image preprocessing learned by the network changes fundamentally to satisfy the unique requirements of each application. We also show that, for upright images, the hybrid VPR models compare favorably with a competing solution based on sparse local features.