Publications
Open Access
Improved Heuristic Job Scheduling Method to Enhance Throughput for Big Data Analytics
Tsinghua Science and Technology 2022, 27 (2): 344-357
Published: 29 September 2021

Data-parallel computing platforms, such as Hadoop and Spark, are deployed in computing clusters for big data analytics. Increasingly, multiple users share the same computing cluster, which makes scheduling multiple jobs a serious challenge. The Shortest-Job-First (SJF) method has long been considered the optimal way to minimize the average Job Completion Time (JCT). However, SJF yields low system throughput when a small number of short jobs consume a large amount of resources, which in turn prolongs the average JCT. We propose an improved heuristic job scheduling method, called the Densest-Job-Set-First (DJSF) method. DJSF schedules jobs by maximizing the number of completed jobs per unit time, aiming to decrease the average JCT and improve system throughput. We perform extensive simulations based on Google cluster data. Compared with the SJF method, the DJSF method decreases the average JCT by 23.19% and enhances the system throughput by 42.19%. Compared with Tetris, the DJSF job packing method improves job completion efficiency by 55.4%, so that computing platforms complete more jobs in a short time span.
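To illustrate the density criterion behind DJSF, here is a minimal Python sketch. Everything in it is a simplified assumption for illustration: the Job fields, the normalized cluster capacity, and the brute-force enumeration of small candidate sets are all hypothetical; the paper's actual DJSF method is a heuristic evaluated on real cluster traces, not this toy search.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Job:
    name: str
    duration: float   # estimated run time
    demand: float     # fraction of cluster resources the job needs

def density(job_set):
    # Jobs in a set run concurrently; the set completes when its
    # longest job finishes, so density = jobs completed per unit time.
    makespan = max(j.duration for j in job_set)
    return len(job_set) / makespan

def pick_densest_set(pending, capacity=1.0, max_set_size=3):
    """Return the densest feasible job set (brute-force illustration;
    the paper's DJSF uses a heuristic rather than enumeration)."""
    best, best_density = None, 0.0
    for k in range(1, max_set_size + 1):
        for cand in combinations(pending, k):
            if sum(j.demand for j in cand) <= capacity:
                d = density(cand)
                if d > best_density:
                    best, best_density = list(cand), d
    return best

jobs = [Job("A", 2.0, 0.7), Job("B", 3.0, 0.2), Job("C", 3.5, 0.2)]
# Running A and B together completes 2 jobs in 3.0 time units
# (density 0.67), beating SJF's choice of A alone (density 0.50).
print(pick_densest_set(jobs))
```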

Open Access
Balance Resource Allocation for Spark Jobs Based on Prediction of the Optimal Resource
Tsinghua Science and Technology 2020, 25 (4): 487-497
Published: 13 January 2020

Apache Spark provides a well-known MapReduce-style computing framework for fast, data-parallel processing of big data analytics. On this platform, large input data are divided into partitions, each processed concurrently by multiple computation tasks whose outputs are transferred among computers over the network. Such a distributed computing framework inevitably suffers from system overheads caused by communication and disk I/O operations, and these overheads take up a large proportion of the Job Completion Time (JCT). We observed that allocating excessive computational resources incurs considerable system overheads and prolongs the JCT. Over-allocating individual jobs not only prolongs their own JCTs but also likely leaves other jobs under-allocated, so the average JCT is suboptimal as well. To address this problem, we propose a prediction model that estimates how the JCT of a single Spark job changes with its resource allocation. Building on this prediction, we designed a heuristic algorithm that balances the resource allocation of multiple Spark jobs to minimize the average JCT in multiple-job cases. We implemented the prediction model and resource allocation method in ReB, a Resource Balancer based on Apache Spark. Experimental results showed that ReB significantly outperforms the traditional max-min fairness and shortest-job-optimal methods, decreasing the average JCT by around 10%-30% compared with existing solutions.
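To make the idea concrete, here is a minimal Python sketch of prediction-driven balancing. The JCT model below (compute time shrinking with cores plus a per-core overhead term) and the greedy water-filling loop are hypothetical stand-ins chosen for illustration; ReB's actual prediction model and balancing algorithm differ.

```python
def predicted_jct(work, cores, overhead=0.5):
    """Hypothetical JCT model: computation speeds up with more cores,
    but coordination/shuffle overhead grows per core, so the curve
    has a minimum. ReB's real model is learned, not this toy."""
    return work / cores + overhead * cores

def balance_allocation(jobs_work, total_cores):
    """Greedy balancing sketch: give each job one core, then hand out
    remaining cores one at a time to whichever job's predicted JCT
    drops the most. Stops early once extra cores only add overhead."""
    alloc = [1] * len(jobs_work)
    for _ in range(total_cores - len(jobs_work)):
        gains = [
            predicted_jct(w, a) - predicted_jct(w, a + 1)
            for w, a in zip(jobs_work, alloc)
        ]
        best = max(range(len(gains)), key=gains.__getitem__)
        if gains[best] <= 0:   # over-allocation would prolong the JCT
            break
        alloc[best] += 1
    return alloc

# Three jobs with different amounts of work sharing a 12-core cluster:
# larger jobs get more cores, but none is pushed past its sweet spot.
print(balance_allocation([10.0, 40.0, 90.0], 12))  # -> [2, 4, 6]
```

Note the contrast with max-min fairness, which would split the 12 cores evenly (4 each) regardless of how much each job actually benefits from additional parallelism.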

Total: 2 publications