Deep Convolution Neural Networks for Camera Relocalization

LSTM-based global relocalization network with homoscedastic loss

Abstract

This thesis investigates approaches to image-based localization. As in PoseNet, we formulate the problem as pose regression, and we improve on it by using quaternion algebra for a proper attitude representation. In addition, we combine two recently developed approaches: (1) a multi-task loss function that learns the optimal weighting between the position and orientation regression tasks, and (2) a CNN followed by a spatial LSTM network for better structured feature correlation. Furthermore, we fine-tune only a small portion of the pretrained CNN feature extractor. Lastly, we extend the problem to videos and employ a sequence-to-sequence regression model based on LSTMs. We evaluate the models on the 7Scenes dataset and introduce a new Airframe dataset, in which localization is performed with respect to an object that changes position and orientation in the environment. Our models achieve results that are at least competitive, and in some cases superior, while requiring considerably less computational power to train.
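
As an illustration of the multi-task loss mentioned in the abstract, the sketch below shows one common form of a homoscedastic-uncertainty-weighted pose loss, in which two learnable scalars balance the position and orientation terms. It is a minimal sketch under stated assumptions, not the thesis implementation: the function names, the use of Euclidean distances for both terms, the handling of the quaternion sign ambiguity, and the parameters s_x and s_q (log-variance weights) are all assumptions made for illustration. In an actual training setup s_x and s_q would be optimized together with the network weights; here they are plain floats.

```python
import math


def normalize_quaternion(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero quaternion")
    return tuple(c / norm for c in q)


def position_error(p_pred, p_true):
    """Euclidean distance between predicted and true positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_pred, p_true)))


def orientation_error(q_pred, q_true):
    """Distance between unit quaternions.

    Assumption: q and -q encode the same rotation, so the smaller of
    the two Euclidean distances is used.
    """
    q_pred = normalize_quaternion(q_pred)
    q_true = normalize_quaternion(q_true)
    d_plus = math.sqrt(sum((a - b) ** 2 for a, b in zip(q_pred, q_true)))
    d_minus = math.sqrt(sum((a + b) ** 2 for a, b in zip(q_pred, q_true)))
    return min(d_plus, d_minus)


def homoscedastic_pose_loss(p_pred, p_true, q_pred, q_true, s_x, s_q):
    """Weighted sum of position and orientation errors.

    s_x and s_q are hypothetical learnable log-variance terms: a larger
    value down-weights its task, and the additive term penalizes
    growing it without bound.
    """
    l_x = position_error(p_pred, p_true)
    l_q = orientation_error(q_pred, q_true)
    return l_x * math.exp(-s_x) + s_x + l_q * math.exp(-s_q) + s_q


if __name__ == "__main__":
    loss = homoscedastic_pose_loss(
        p_pred=(3.0, 4.0, 0.0),
        p_true=(0.0, 0.0, 0.0),
        q_pred=(1.0, 0.0, 0.0, 0.0),
        q_true=(1.0, 0.0, 0.0, 0.0),
        s_x=0.0,
        s_q=0.0,
    )
    print(loss)
```

Normalizing the predicted quaternion before comparison keeps the orientation term independent of the raw output scale, which is one reason a quaternion representation pairs naturally with this kind of loss.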

Publication
Master Thesis