Volume 22, Issue 2

Load Feedback-Based Resource Scheduling and Dynamic Migration-Based Data Locality for Virtual Hadoop Clusters in OpenStack-Based Clouds

Dan Tao, Zhaowen Lin, Bingxu Wang
School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China.
Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing 210003, China.
Network and Information Center, Institute of Network Technology, Science and Technology on Information Transmission and Dissemination in Communication Networks Laboratory, National Engineering Laboratory for Mobile Network Security, Beijing University of Posts and Telecommunications, Beijing 100876, China.

Abstract

As cloud computing technology matures, it is essential to combine the big data processing tool Hadoop with Infrastructure as a Service (IaaS) cloud platforms. In this study, we first propose a new Dynamic Hadoop Cluster on IaaS (DHCI) architecture, which includes four key modules: monitoring, scheduling, Virtual Machine (VM) management, and VM migration. The monitoring module collects the load of both physical hosts and VMs, and this load information is used to design the resource scheduling and data locality solutions. Second, we present a simple load feedback-based resource scheduling scheme, which can avoid allocating resources on overburdened physical hosts and can achieve strong scalability of the virtual cluster by adjusting the number of VMs. To improve flexibility, the DHCI architecture deploys the computation and storage VMs separately, which negatively impacts data locality. Third, we therefore reuse VM migration and propose a dynamic migration-based data locality scheme using parallel computing entropy: computation nodes are migrated to the host(s) or rack(s) where the corresponding storage nodes are deployed, so that the data locality requirement is satisfied. We evaluate our solutions in a realistic OpenStack-based scenario. Extensive experimental results demonstrate that our solutions help balance the workload and improve performance, even under heavily loaded cloud system conditions.

Keywords: Hadoop, resource scheduling, data locality, Infrastructure as a Service (IaaS), OpenStack
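
As an illustration only, the two ideas summarized in the abstract can be sketched in a few lines of Python: a load feedback check that skips overburdened hosts when placing a VM, and an entropy measure of how evenly load is spread. The function names, the 0.8 load threshold, and the use of a Shannon-style entropy over normalized loads are assumptions made for this sketch; the abstract does not give the exact definitions used in the DHCI architecture.

```python
import math


def load_entropy(loads):
    """Shannon-style entropy of the normalized load distribution.

    Assumption for this sketch: each node's share is its load divided by
    the total load. Higher entropy means load is spread more evenly.
    """
    total = sum(loads)
    if total <= 0:
        # Convention chosen here: an entirely idle cluster reports 0.0.
        return 0.0
    shares = [load / total for load in loads if load > 0]
    return -sum(p * math.log(p) for p in shares)


def pick_host(host_loads, threshold=0.8):
    """Return the least-loaded host whose load is below the threshold.

    host_loads maps a host name to a load value in [0, 1], as a
    monitoring module might report it. Hosts at or above the
    (hypothetical) threshold are treated as overburdened and skipped.
    Returns None when every host is overburdened.
    """
    candidates = {h: l for h, l in host_loads.items() if l < threshold}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    print(pick_host({"host-a": 0.9, "host-b": 0.5, "host-c": 0.3}))
    print(load_entropy([1, 1, 1, 1]), load_entropy([4, 0, 0, 0]))
```

In this sketch, an evenly loaded set of four nodes reaches the maximum entropy of ln 4, while a set in which one node carries all the load has entropy 0, so a scheduler could compare the entropy before and after a candidate migration to judge whether it improves balance.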

Publication history

Received: 01 October 2016
Revised: 26 December 2016
Accepted: 27 December 2016
Published: 06 April 2017
Issue date: April 2017

Copyright

© The author(s) 2017

Acknowledgements

This work was supported by the Open Project Program of Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks (No. WSNLBKF201503), the Fundamental Research Funds for the Central Universities (Nos. 2016JBM011 and 2014ZD03-03), the Priority Academic Program Development of Jiangsu Higher Education Institutions, and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology.
