Journal Home > Volume 7 , Issue 1

Consider the geo-localization task of finding the pose of a camera in a large 3D scene from a single image. Most existing CNN-based methods use as input textured images. We aim to experimentally explore whether texture and correlation between nearby images are necessary in a CNN-based solution for the geo-localization task. To do so, we consider lean images, textureless projections of a simple 3D model of a city. They only contain information related to the geometry of the scene viewed (edges, faces, and relative depth). The main contributions of this paper are: (i) todemonstrate the ability of CNNs to recover camera pose using lean images; and (ii) to provide insight into the role of geometry in the CNN learning process.


menu
Abstract
Full text
Outline
About this article

On the role of geometry in geo-localization

Show Author's information Moti Kadosh1( )Yael Moses2Ariel Shamir2
Department of Electrical Engineering, Tel-Aviv University, Tel-Aviv, Israel
The Efi Arazi School of Computer Science, The Interdisciplinary Center Herzliya, Herzliya, Israel

Abstract

Consider the geo-localization task of finding the pose of a camera in a large 3D scene from a single image. Most existing CNN-based methods use as input textured images. We aim to experimentally explore whether texture and correlation between nearby images are necessary in a CNN-based solution for the geo-localization task. To do so, we consider lean images, textureless projections of a simple 3D model of a city. They only contain information related to the geometry of the scene viewed (edges, faces, and relative depth). The main contributions of this paper are: (i) todemonstrate the ability of CNNs to recover camera pose using lean images; and (ii) to provide insight into the role of geometry in the CNN learning process.

Keywords: geometry, geo-localization, CNN-based solutions, synthetic lean images

References(32)

[1]
Se, S.; Lowe, D.; Little, J. Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks. The International Journal of Robotics Research Vol. 21, No. 8, 735-758, 2002.
[2]
Lowe, D. G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision Vol. 60, No. 2, 91-110, 2004.
[3]
Li, Y. P.; Snavely, N.; Huttenlocher, D. P. Location recognition using prioritized feature matching. In: Computer Vision - ECCV 2010. Lecture Notes in Computer Science, Vol. 6312. Daniilidis, K.; Maragos, P.; Paragios, N. Eds. Springer Berlin Heidelberg, 791-804, 2010.
[4]
Ramalingam, S.; Bouaziz, S.; Sturm, P. Pose estimation using both points and lines for geo-localization. In: Proceedings of the IEEE International Conference on Robotics and Automation, 4716-4723, 2011.
DOI
[5]
Bansal, M.; Daniilidis, K. Geometric urban geo-localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3978-3985, 2014.
DOI
[6]
Kendall, A.; Grimes, M.; Cipolla, R. PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In: Proceedings of the IEEE International Conference on Computer Vision, 2938-2946, 2015.
DOI
[7]
Walch, F.; Hazirbas, C.; Leal-Taixé, L.; Sattler, T.; Hilsenbeck, S.; Cremers, D. Image-based localization using LSTMs for structured feature correlation. In: Proceedings of the IEEE International Conference on Computer Vision, 627-637, 2017.
DOI
[8]
Melekhov, I.; Ylioinas, J.; Kannala, J.; Rahtu, E. Image-based localization using hourglass networks. arXiv preprint arXiv:1703.07971, 2017.
[9]
Sattler, T.; Torii, A.; Sivic, J.; Pollefeys, M.; Taira, H.; Okutomi, M.; Pajdla, T. Are large-scale 3D models really necessary for accurate visual localization? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6175-6184, 2017.
DOI
[10]
Sivic, J.; Zisserman, A. Video Google: A text retrieval approach to object matching in videos. In: Proceedings 9th IEEE International Conference on Computer Vision, 1470-1477, 2003.
DOI
[11]
Robertsone, D.; Cipolla, R. An Image-based system for urban navigation. In: Proceedings of the British Machine Conference, 84.1-84.10, 2004.
DOI
[12]
Hays, J.; Efros, A. A. IM2GPS: Estimating geographic information from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, 1-8, 2008.
DOI
[13]
Bergamo, A.; Sinha, S. N.; Torresani, L. Leveraging structure from motion to learn discriminative codebooks for scalable landmark classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 763-770, 2013.
DOI
[14]
Zhang, W.; Kosecka, J. Image based localization in urban environments. In: Proceedings of the 3rd International Symposium on 3D Data Processing, Visualization, and Transmission, 33-40, 2006.
DOI
[15]
Nister, D.; Stewenius, H. Scalable recognition with a vocabulary tree. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2161-2168, 2006.
[16]
Schindler, G.; Brown, M.; Szeliski, R. City-scale location recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-7, 2007.
DOI
[17]
Irschara, A.; Zach, C.; Frahm, J.; Bischof, H. From structure-from-motion point clouds to fast location recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2599-2606, 2009.
DOI
[18]
Sattler, T.; Leibe, B.; Kobbelt, L. Fast image-based localization using direct 2D-to-3D matching. In: Proceedings of the International Conference on Computer Vision, 667-674, 2011.
DOI
[19]
Matei, B. C.; Vander Valk, N.; Zhu, Z.; Cheng, H.; Sawhney, H. S. Image to LIDAR matching for geotagging in urban environments. In: Proceedings of the IEEE Workshop on Applications of Computer Vision, 413-420, 2013.
DOI
[20]
Svärm, L.; Enqvist, O.; Oskarsson, M.; Kahl, F. Accurate localization and pose estimation for large 3D models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 532-539, 2014.
DOI
[21]
Baatz, G.; Saurer, O.; Köser, K.; Pollefeys, M. Large scale visual geo-localization of images in mountainous terrain. In: Computer Vision - ECCV 2012. Lecture Notes in Computer Science, Vol. 7573. Fitzgibbon, A.; Lazebnik, S.; Perona, P.; Sato, Y.; Schmid, C. Eds. Springer Berlin Heidelberg, 517-530, 2012.
DOI
[22]
Svarm, L.; Enqvist, O.; Kahl, F.; Oskarsson, M. City-scale localization for cameras with known vertical direction. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 39, No. 7, 1455-1461, 2017.
[23]
Piasco, N.; Sidibé, D.; Demonceaux, C.; Gouet-Brunet, V. A survey on visual-based localization: On the benefit of heterogeneous data. Pattern Recognition Vol. 74, 90-109, 2018.
[24]
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-9, 2015.
DOI
[25]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778, 2016.
DOI
[26]
Kendall, A.; Cipolla, R. Modelling uncertainty in deep learning for camera relocalization. In: Proceedings of the IEEE International Conference on Robotics and Automation, 4762-4769, 2016.
DOI
[27]
Kendall, A.; Cipolla, R. Geometric loss functions for camera pose regression with deep learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6555-6564, 2017.
DOI
[28]
Berlin Partner für Wirtschaft und Technologie GmbH. Berlin 3D city model. 2016. Available at https://www.businesslocationcenter.de/en/WA/B/seite0.jsp.
[29]
Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530, 2016.
[30]
Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; Berg, A. C.; Fei-Fei, L. ImageNet large scale visual recognition challenge. International Journal of Computer Vision Vol. 115, No. 3, 211-252, 2015.
[31]
Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 248-255, 2009.
DOI
[32]
OpenStreetMap Wiki contributors. OSM-3D.org.OpenStreetMap Wiki, 2018. Available athttps://wiki.openstreetmap.org/w/index.php?title=OSM-3D.org&oldid=2025859.
Publication history
Copyright
Rights and permissions

Publication history

Received: 18 July 2020
Accepted: 12 September 2020
Published: 07 January 2021
Issue date: March 2021

Copyright

© The Author(s) 2020

Rights and permissions

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduc-tion in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www. editorialmanager.com/cvmj.

Return