Volume 24, Issue 5




VirtCO: Joint Coflow Scheduling and Virtual Machine Placement in Cloud Data Centers

Dian Shen, Junzhou Luo, Fang Dong, Junxue Zhang
School of Computer Science and Engineering, Southeast University, Nanjing 211189, China.
SING Group, Hong Kong University of Science and Technology, Hong Kong 999077, China.

Abstract

Cloud data centers, such as Amazon EC2, host myriad big data applications on Virtual Machines (VMs). Because these applications are communication-intensive, optimizing network transfers between VMs is critical both to application performance and to the network utilization of data centers. Previous studies have addressed this issue either by scheduling network flows with coflow semantics or by optimizing VM placement with traffic considerations. However, coflow scheduling and VM placement have so far been treated separately. In fact, the two mechanisms are mutually dependent, and optimizing these complementary degrees of freedom independently turns out to be suboptimal. In this paper, we present VirtCO, a practical framework that jointly schedules coflows and places VMs ahead of VM launch to optimize the overall performance of data center applications. We model the joint coflow scheduling and VM placement optimization problem and propose effective heuristics for solving it. We further implement VirtCO with OpenStack and deploy it in a testbed environment. Extensive evaluation on real-world traces shows that, compared with state-of-the-art solutions, VirtCO reduces the average coflow completion time by up to 36.5%. The framework is also compatible with, and readily deployable within, existing data center architectures.

Keywords: cloud computing, data center, coflow scheduling, Virtual Machine (VM) placement
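
The dependence between VM placement and coflow scheduling described in the abstract can be illustrated with a toy model. The sketch below is not VirtCO's formulation or its heuristics; it assumes a non-blocking fabric in which every host has equal ingress and egress capacity, treats traffic between co-located VMs as free, and runs coflows one at a time in shortest-bottleneck-first order. All names (`bottleneck_time`, `average_cct`, `best_placement`) and the example data are hypothetical.

```python
from itertools import product


def bottleneck_time(flows, placement, capacity=1.0):
    """Time to finish one coflow, limited by its most loaded host port.

    flows: list of (src_vm, dst_vm, size); placement: dict vm -> host.
    Assumption: co-located VMs exchange data without using the network.
    """
    egress, ingress = {}, {}
    for src_vm, dst_vm, size in flows:
        src, dst = placement[src_vm], placement[dst_vm]
        if src == dst:
            continue  # no traffic crosses the network
        egress[src] = egress.get(src, 0.0) + size
        ingress[dst] = ingress.get(dst, 0.0) + size
    loads = list(egress.values()) + list(ingress.values())
    return max(loads, default=0.0) / capacity


def average_cct(coflows, placement, capacity=1.0):
    """Average coflow completion time, serving coflows one at a time,
    shortest bottleneck first (a simplification, not the paper's scheduler)."""
    times = sorted(bottleneck_time(f, placement, capacity) for f in coflows)
    finished, total = 0.0, 0.0
    for t in times:
        finished += t      # this coflow completes after all earlier ones
        total += finished
    return total / len(times)


def best_placement(vms, hosts, slots, coflows):
    """Brute-force search over placements with at most `slots` VMs per host.

    Only feasible for tiny inputs; shown purely for illustration.
    """
    best = None
    for assignment in product(hosts, repeat=len(vms)):
        if any(assignment.count(h) > slots for h in hosts):
            continue
        placement = dict(zip(vms, assignment))
        score = average_cct(coflows, placement)
        if best is None or score < best[0]:
            best = (score, placement)
    return best


# Hypothetical example: four VMs, two hosts with two VM slots each.
coflows = [
    [("a", "b", 8.0)],                    # coflow 1: one large flow
    [("c", "d", 2.0), ("a", "c", 1.0)],   # coflow 2: two small flows
]
oblivious = {"a": "h1", "c": "h1", "b": "h2", "d": "h2"}
aware = {"a": "h1", "b": "h1", "c": "h2", "d": "h2"}

avg_oblivious = average_cct(coflows, oblivious)
avg_aware = average_cct(coflows, aware)
best_score, best_map = best_placement(["a", "b", "c", "d"], ["h1", "h2"], 2, coflows)
```

In this example the same scheduling rule yields very different average completion times under the two placements, because the placement decides which flows reach the network at all and therefore what the scheduler has to order. Under these assumptions, choosing the placement with the coflows in mind changes the outcome far more than reordering coflows on a fixed placement could.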


Publication history

Received: 02 April 2018
Accepted: 01 May 2018
Published: 29 April 2019
Issue date: October 2019

Copyright

© The author(s) 2019

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2017YFB1003000), the National Natural Science Foundation of China (Nos. 61572129, 61602112, 61502097, 61702096, 61320106007, and 61632008), the International S&T Cooperation Program of China (No. 2015DFA10490), the National Science Foundation of Jiangsu Province (Nos. BK20160695 and BK20170689), the Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201), the Key Laboratory of Computer Network and Information Integration of Ministry of Education of China (No. 93K-9), and partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization and Collaborative Innovation Center of Wireless Communications Technology.
