Abstract
Monocular visual-inertial odometry based on the MSCKF algorithm is computationally efficient even on resource-constrained platforms. The MSCKF-VIO is designed primarily for localization: environmental features such as points, lines, and planes are tracked across consecutive images and then triangulated using the historical IMU/camera poses in the state vector to perform measurement updates. Although feature points can be extracted and tracked with traditional techniques and triangulated by the MSCKF feature triangulation algorithm, the number of feature points in an image is often insufficient to capture the depth of the entire environment, because traditional feature extraction and tracking techniques fail in environments with textureless planes. To address this problem, we propose an algorithm that segments the image into a grid and extracts and tracks pixel points to estimate the depth of each grid cell. When no feature point can be extracted from a cell, an arbitrary featureless pixel, preferably one lying on a contour, is selected as a candidate point. The combined set of feature-rich and featureless pixel points is first tracked with traditional techniques such as optical flow. When these methods fail to track a given point, the proposed method uses the geometry of triangulated features in adjacent images as a reference for tracking. After successful tracking and triangulation, this approach yields a denser depth map of the environment. The proposed method has been implemented in the OpenVINS framework and evaluated on the open-source datasets supported by OpenVINS to validate the findings. Tracking arbitrary featureless pixel points alongside traditional features yields a real-time depth map of the surroundings, applicable to obstacle detection, collision avoidance, and path planning.
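The grid-based candidate selection summarized above can be illustrated with a minimal sketch. This is not the paper's implementation: the gradient-magnitude response used here as a stand-in corner score, the threshold `corner_thresh`, and the function name `select_grid_points` are all assumptions made for illustration. The idea shown is only the selection rule: keep a strong-response pixel per cell when one exists, otherwise fall back to the highest-gradient pixel, which tends to lie on a contour.

```python
import numpy as np

def select_grid_points(img, grid=8, corner_thresh=50.0):
    """Return one candidate pixel per grid cell as (x, y, is_feature).

    Illustrative sketch only: a real system would use a proper corner
    detector (e.g. Shi-Tomasi) and an edge/contour map. Here the image
    gradient magnitude serves as both the feature score and the
    contour-preference criterion for the featureless fallback.
    """
    H, W = img.shape
    gy, gx = np.gradient(img.astype(float))   # simple image gradients
    mag = np.hypot(gx, gy)                    # gradient magnitude
    ch, cw = H // grid, W // grid
    points = []
    for r in range(grid):
        for c in range(grid):
            cell = mag[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
            # Strongest-response pixel in this cell; in a flat cell this
            # degenerates to an arbitrary pixel, which is exactly the
            # featureless candidate the method still wants to track.
            iy, ix = np.unravel_index(np.argmax(cell), cell.shape)
            is_feature = bool(cell[iy, ix] >= corner_thresh)
            points.append((c * cw + ix, r * ch + iy, is_feature))
    return points

# Usage on a synthetic image: left half dark, right half bright, so only
# cells straddling the vertical edge produce true feature candidates.
img = np.zeros((64, 64))
img[:, 32:] = 255
pts = select_grid_points(img, grid=8)
```

Every cell contributes a candidate, so the tracker always has one point per cell to triangulate, whether or not the cell is textured.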