Volume 27, Issue 6

MIX-RS: A Multi-Indexing System Based on HDFS for Remote Sensing Data Storage

Jiashu Wu, Jingpan Xiong, Hao Dai, Yang Wang, and Chengzhong Xu
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
University of Chinese Academy of Sciences, Beijing 100049, China
Guangdong-HongKong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Faculty of Science and Technology, University of Macau, Macau 999078, China

Abstract

A large volume of Remote Sensing (RS) data has been generated with the deployment of satellite technologies. These data facilitate research in areas such as ecological monitoring, land management, and desertification. The characteristics of RS data (e.g., enormous volume, large single-file size, and demanding fault-tolerance requirements) make the Hadoop Distributed File System (HDFS) an ideal choice for RS data storage, as it is efficient, scalable, and equipped with a data replication mechanism for failure resilience. One of the most important techniques for using RS data is geospatial indexing. However, the large data volume makes geospatial indices time-consuming to construct and to use efficiently. Since most modern geospatial data centres are equipped with HDFS-based big data processing infrastructures, it is natural to deploy multiple geospatial indices to optimise efficiency. Moreover, because high-quality hardware is reliable and RS data are infrequently modified, multi-indexing does not cause large overhead. We therefore design a framework called Multi-IndeXing-RS (MIX-RS) that builds a multi-indexing mechanism on top of the HDFS, with data replication enabled for both fault tolerance and geospatial indexing efficiency. With fault tolerance provided by the HDFS, RS data are stored in a structured layout within it for faster geospatial indexing, and multi-indexing further improves efficiency. The proposed technique sits naturally on top of the HDFS, forming a holistic framework without incurring severe overhead or requiring sophisticated system implementation effort. MIX-RS is implemented and evaluated using real remote sensing data provided by the Chinese Academy of Sciences, and it demonstrates excellent geospatial indexing performance.

Keywords: Remote Sensing (RS) data, geospatial indexing, multi-indexing mechanism, Hadoop Distributed File System (HDFS), Multi-IndeXing-RS (MIX-RS)
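The abstract describes MIX-RS only at the level of its design idea: data that the HDFS already replicates for fault tolerance are also organised for geospatial indexing, with several indices deployed side by side. The Python sketch below illustrates that general idea in memory and is not the authors' implementation. It assumes two index types chosen purely for illustration, a GeoHash key and a Hilbert-curve key, and treats each sorted list as a stand-in for one replica; the names `Tile`, `MultiIndexStore`, `geohash_prefix`, and `hilbert_range` are hypothetical, and nothing here interacts with a real HDFS cluster.

```python
# Hypothetical illustration of multi-indexing over replicated data.
# Not the MIX-RS implementation: GeoHash and Hilbert keys are assumed index
# types, and each sorted list merely stands in for one HDFS replica.
from bisect import bisect_left, bisect_right
from dataclasses import dataclass

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash(lat: float, lon: float, precision: int = 8) -> str:
    """Standard GeoHash: alternately bisect longitude and latitude, 5 bits per char."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, nbits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bits <<= 1
        if val >= mid:
            bits |= 1
            rng[0] = mid  # keep the upper half
        else:
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def hilbert_key(lat: float, lon: float, order: int = 16) -> int:
    """Snap a point to a 2^order x 2^order grid and return its Hilbert-curve distance."""
    n = 1 << order
    x = int((lon + 180.0) / 360.0 * (n - 1))
    y = int((lat + 90.0) / 180.0 * (n - 1))
    d, s = 0, n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect the quadrant into canonical orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


@dataclass(frozen=True)
class Tile:
    tile_id: str
    lat: float  # centre latitude of an RS scene or tile
    lon: float  # centre longitude


class MultiIndexStore:
    """Every 'replica' holds all tiles, each sorted by a different index key."""

    def __init__(self, tiles):
        self.by_geohash = sorted(((geohash(t.lat, t.lon), t) for t in tiles), key=lambda kt: kt[0])
        self.by_hilbert = sorted(((hilbert_key(t.lat, t.lon), t) for t in tiles), key=lambda kt: kt[0])
        self._gkeys = [k for k, _ in self.by_geohash]
        self._hkeys = [k for k, _ in self.by_hilbert]

    def geohash_prefix(self, prefix: str) -> list:
        """Tiles inside one GeoHash cell form a contiguous run in the GeoHash replica."""
        lo = bisect_left(self._gkeys, prefix)
        hi = bisect_left(self._gkeys, prefix + "~")  # '~' sorts after every GeoHash char
        return [t for _, t in self.by_geohash[lo:hi]]

    def hilbert_range(self, lo_key: int, hi_key: int) -> list:
        """Tiles whose Hilbert key lies in [lo_key, hi_key], read from the Hilbert replica."""
        lo = bisect_left(self._hkeys, lo_key)
        hi = bisect_right(self._hkeys, hi_key)
        return [t for _, t in self.by_hilbert[lo:hi]]


if __name__ == "__main__":
    store = MultiIndexStore([Tile("jutland", 57.64911, 10.40744), Tile("beijing", 39.9, 116.4)])
    print([t.tile_id for t in store.geohash_prefix("u4pru")])
    print([t.tile_id for t in store.hilbert_range(0, (1 << 32) - 1)])
```

Because each copy is ordered by a different key, a query can be answered by whichever copy turns it into a short contiguous scan (a GeoHash prefix or a Hilbert-key interval), while every copy still holds all records and can stand in for a lost one. How MIX-RS actually chooses its indices and lays them out in HDFS blocks is not specified in the abstract.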

Publication history

Received: 24 May 2021
Revised: 26 August 2021
Accepted: 06 October 2021
Published: 21 June 2022
Issue date: December 2022

Copyright

© The author(s) 2022.

Acknowledgements

This work was supported in part by the Key-Area Research and Development Program of Guangdong Province (No. 2020B010164002) and the Fundamental Research Foundation of Shenzhen Technology and Innovation Council (No. KCXFZ20201221173613035).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
