Open Access

Efficient Location-Aware Data Placement for Data-Intensive Applications in Geo-distributed Scientific Data Centers

Jinghui Zhang, Jian Chen, Junzhou Luo, Aibo Song
School of Computer Science and Engineering, Southeast University, Nanjing 211189, China.

Abstract

Recent developments in cloud computing and big data have spurred the emergence of data-intensive applications in which massive scientific datasets are stored in globally distributed scientific data centers and are frequently accessed by scientists worldwide. A single data processing task may request multiple associated data items that are distributed across different scientific data centers, and data placement decisions must respect the storage capacity limits of those data centers. Optimizing the data access cost when placing data items in globally distributed scientific data centers has therefore become an increasingly important goal. Existing data placement approaches for geo-distributed data items are insufficient because they either cannot cope with the cost incurred by associated data access or overlook storage capacity limitations, which are a very practical constraint of scientific data centers. In this paper, inspired by applications in the field of high energy physics, we propose an integer-programming-based data placement model that formulates the above challenges as a Non-deterministic Polynomial-time (NP)-hard problem. In addition, we use a Lagrangian-relaxation-based heuristic algorithm to obtain ideal data placement solutions. Our simulation results demonstrate that our algorithm is effective and significantly reduces the overall data access cost.
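
The abstract names the technique but does not give the formulation, so the following is only a generic illustration of a Lagrangian relaxation heuristic, not the authors' model or algorithm. It assumes a simplified problem in which each data item is placed in exactly one data center, each placement has a fixed access cost, and the only coupling constraint is storage capacity; the cost of associated data access is ignored. All names (`lagrangian_placement`, `repair`, `cost`, `size`, `cap`) are hypothetical.

```python
from typing import List, Optional, Tuple


def repair(cost, size, cap, lam) -> Optional[List[int]]:
    """Greedy repair: place large items first, each at the data center
    with the lowest penalized cost that still has enough free capacity."""
    free = list(cap)
    assign = [-1] * len(size)
    for i in sorted(range(len(size)), key=lambda i: -size[i]):
        options = [j for j in range(len(cap)) if free[j] >= size[i]]
        if not options:
            return None  # this greedy pass found no feasible placement
        j = min(options, key=lambda j: cost[i][j] + lam[j] * size[i])
        assign[i] = j
        free[j] -= size[i]
    return assign


def lagrangian_placement(
    cost: List[List[float]], size: List[float], cap: List[float],
    iters: int = 100, step0: float = 1.0,
) -> Tuple[Optional[List[int]], float, float]:
    """Return (best feasible assignment, its cost, best lower bound)."""
    n, m = len(cost), len(cap)
    lam = [0.0] * m  # one multiplier per relaxed capacity constraint
    best_lb = float("-inf")
    best_cost, best_assign = float("inf"), None
    for k in range(iters):
        # Relaxed subproblem: without capacity limits, each item simply
        # picks the data center with the lowest penalized cost.
        relaxed = [
            min(range(m), key=lambda j: cost[i][j] + lam[j] * size[i])
            for i in range(n)
        ]
        lb = sum(cost[i][relaxed[i]] + lam[relaxed[i]] * size[i] for i in range(n))
        lb -= sum(lam[j] * cap[j] for j in range(m))
        best_lb = max(best_lb, lb)

        # Heuristic step: turn the multipliers into a feasible placement.
        assign = repair(cost, size, cap, lam)
        if assign is not None:
            total = sum(cost[i][assign[i]] for i in range(n))
            if total < best_cost:
                best_cost, best_assign = total, assign

        # Subgradient update: raise the price of overloaded data centers,
        # lower it (down to zero) for underused ones.
        load = [0.0] * m
        for i, j in enumerate(relaxed):
            load[j] += size[i]
        step = step0 / (k + 1)
        lam = [max(0.0, lam[j] + step * (load[j] - cap[j])) for j in range(m)]
    return best_assign, best_cost, best_lb


if __name__ == "__main__":
    # Three unit-size items, two data centers with room for two items each.
    cost = [[1, 5], [1, 5], [1, 5]]
    print(lagrangian_placement(cost, size=[1, 1, 1], cap=[2, 2]))
```

The lower bound from the relaxed subproblem and the cost of the repaired placement bracket the optimum, which is the usual way such a heuristic reports solution quality. A model that also charges for associated data access would need a different subproblem than the per-item minimum used here.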

Tsinghua Science and Technology
Pages 471-481
Cite this article:
Zhang J, Chen J, Luo J, et al. Efficient Location-Aware Data Placement for Data-Intensive Applications in Geo-distributed Scientific Data Centers. Tsinghua Science and Technology, 2016, 21(5): 471-481. https://doi.org/10.1109/TST.2016.7590316

Received: 26 July 2016
Revised: 04 August 2016
Accepted: 22 August 2016
Published: 18 October 2016
© The author(s) 2016