References
[2]
J. Ekanayake, H. Li, B. J. Zhang, T. Gunarathne, S. H. Bae, J. Qiu, and G. Fox, Twister: A runtime for iterative MapReduce, in Proc. 19th ACM Int. Symp. on High Performance Distributed Computing, Chicago, IL, USA, 2010, pp. 810-818.
[3]
Y. Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst, HaLoop: Efficient iterative data processing on large clusters, Proc. VLDB Endowm., vol. 3, nos. 1&2, pp. 285-296, 2010.
[4]
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing, in Proc. 9th USENIX Conf. on Networked Systems Design and Implementation, Berkeley, CA, USA, 2012, pp. 15-28.
[5]
F. Yang, J. F. Li, and J. Cheng, Husky: Towards a more efficient and expressive distributed computing framework, Proc. VLDB Endowm., vol. 9, no. 5, pp. 420-431, 2016.
[6]
P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas, Apache Flink™: Stream and batch processing in a single engine, Bull. IEEE Comput. Soc. Tech. Comm. Data Eng., vol. 36, no. 4, pp. 28-38, 2015.
[7]
M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu, J. K. Bradley, X. R. Meng, T. Kaftan, M. J. Franklin, A. Ghodsi, et al., Spark SQL: Relational data processing in Spark, in Proc. 2015 ACM SIGMOD Int. Conf. on Management of Data, Victoria, Australia, 2015, pp. 1383-1394.
[8]
M. Anderson, S. Smith, N. Sundaram, M. Capotă, Z. G. Zhao, S. Dulloor, N. Satish, and T. L. Willke, Bridging the gap between HPC and big data frameworks, Proc. VLDB Endowm., vol. 10, no. 8, pp. 901-912, 2017.
[9]
G. M. Essertel, R. Y. Tahboub, J. M. Decker, K. J. Brown, K. Olukotun, and T. Rompf, Flare: Optimizing Apache Spark with native compilation for scale-up architectures and medium-size data, in Proc. 13th USENIX Conf. on Operating Systems Design and Implementation, Berkeley, CA, USA, 2018, pp. 799-815.
[10]
L. Lu, X. H. Shi, Y. L. Zhou, X. Zhang, H. Jin, C. Pei, L. G. He, and Y. Z. Geng, Lifetime-based memory management for distributed data processing systems, Proc. VLDB Endowm., vol. 9, no. 12, pp. 936-947, 2016.
[11]
C. Navasca, C. Cai, K. Nguyen, B. Demsky, S. Lu, M. Kim, and G. H. Xu, Gerenuk: Thin computation over big native data using speculative program transformation, in Proc. 27th ACM Symp. on Operating Systems Principles, Ontario, Canada, 2019, pp. 538-553.
[12]
J. Arnold, B. Glavic, and I. Raicu, A high-performance distributed relational database system for scalable OLAP processing, in 2019 IEEE Int. Parallel and Distributed Processing Symp. (IPDPS), Rio de Janeiro, Brazil, 2019, pp. 738-748.
[13]
T. Bingmann, M. Axtmann, E. Jöbstl, S. Lamm, H. C. Nguyen, A. Noe, S. Schlag, M. Stumpp, T. Sturm, and P. Sanders, Thrill: High-performance algorithmic distributed batch data processing with C++, in 2016 IEEE Int. Conf. on Big Data (Big Data), Washington, DC, USA, 2016, pp. 172-183.
[14]
E. Begoli, J. Camacho-Rodríguez, J. Hyde, M. J. Mior, and D. Lemire, Apache Calcite: A foundational framework for optimized query processing over heterogeneous data sources, in Proc. 2018 Int. Conf. on Management of Data, Houston, TX, USA, 2018, pp. 221-230.
[15]
G. Graefe and W. J. McKenna, The Volcano optimizer generator: Extensibility and efficient search, in Proc. IEEE 9th Int. Conf. on Data Engineering, Vienna, Austria, 1993, pp. 209-218.
[16]
G. Graefe, The Cascades framework for query optimization, Data Eng. Bull., vol. 18, no. 3, pp. 19-29, 1995.
[17]
T. Neumann, Efficiently compiling efficient query plans for modern hardware, Proc. VLDB Endowm., vol. 4, no. 9, pp. 539-550, 2011.
[20]
S. Ghemawat, H. Gobioff, and S. T. Leung, The Google file system, in Proc. 19th ACM Symp. on Operating Systems Principles, Bolton Landing, NY, USA, 2003, pp. 29-43.
[21]
J. Dean and S. Ghemawat, MapReduce: Simplified data processing on large clusters, in 6th Symp. on Operating System Design and Implementation (OSDI 2004), San Francisco, CA, USA, 2004, pp. 137-150.
[22]
K. Shvachko, H. R. Kuang, S. Radia, and R. Chansler, The Hadoop distributed file system, in 2010 IEEE 26th Symp. on Mass Storage Systems and Technologies (MSST), Incline Village, NV, USA, 2010, pp. 1-10.
[23]
C. Swarna and Z. Ansari, Apache Pig - A data flow framework based on Hadoop Map Reduce, IJETT J., vol. 50, no. 5, pp. 271-275, 2017.
[24]
A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, and R. Murthy, Hive - A petabyte scale data warehouse using Hadoop, in 2010 IEEE 26th Int. Conf. on Data Engineering (ICDE 2010), Long Beach, CA, USA, 2010, pp. 996-1005.
[25]
M. Kornacker, A. Behm, V. Bittorf, T. Bobrovytsky, C. Ching, A. Choi, J. Erickson, M. Grund, D. Hecht, M. Jacobs, et al., Impala: A modern, open-source SQL engine for Hadoop, presented at 7th Biennial Conf. on Innovative Data Systems Research (CIDR’15), Asilomar, CA, USA, 2015.
[26]
R. S. Xin, J. Rosen, M. Zaharia, M. J. Franklin, S. Shenker, and I. Stoica, Shark: SQL and rich analytics at scale, in Proc. 2013 ACM SIGMOD Int. Conf. on Management of Data, New York, NY, USA, 2013, pp. 13-24.
[27]
A. Behm, V. R. Borkar, M. J. Carey, R. Grover, C. Li, N. Onose, R. Vernica, A. Deutsch, Y. Papakonstantinou, and V. J. Tsotras, ASTERIX: Towards a scalable, semistructured data platform for evolving-world models, Distrib. Parallel Databases, vol. 29, no. 3, pp. 185-216, 2011.
[28]
A. Alexandrov, R. Bergmann, S. Ewen, J. C. Freytag, F. Hueske, A. Heise, O. Kao, M. Leich, U. Leser, V. Markl, et al., The Stratosphere platform for big data analytics, VLDB J., vol. 23, no. 6, pp. 939-964, 2014.
[29]
A. Crotty, A. Galakatos, K. Dursun, T. Kraska, U. Cetintemel, and S. Zdonik, Tupleware: “Big” Data, Big Analytics, Small Clusters, presented at 7th Biennial Conf. on Innovative Data Systems Research (CIDR 2015), Asilomar, CA, USA, 2015.
[30]
R. Chaiken, B. Jenkins, P. Å. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. R. Zhou, SCOPE: Easy and efficient parallel processing of massive data sets, Proc. VLDB Endowm., vol. 1, no. 2, pp. 1265-1276, 2008.
[31]
R. A. Lorie, XRM - An Extended (N-ary) Relational Memory. Yorktown Heights: IBM, 1974.
[32]
A. Kemper and T. Neumann, HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots, in 2011 IEEE 27th Int. Conf. on Data Engineering, Hannover, Germany, 2011, pp. 195-206.
[33]
F. McSherry, M. Isard, and D. G. Murray, Scalability! But at what COST?, presented at 15th Workshop on Hot Topics in Operating Systems (HotOS XV), Kartause Ittingen, Switzerland, 2015.