Volume 5, Issue 1

Big Data with Cloud Computing: Discussions and Challenges

Amanpreet Kaur Sandhu
University Institute of Computing, Chandigarh University, Mohali 140413, India

Abstract

With recent advances in computer technologies, the amount of available data is increasing day by day. However, excessive amounts of data create great challenges for users. Meanwhile, cloud computing services provide a powerful environment for storing large volumes of data. They eliminate various requirements, such as dedicated space and the maintenance of expensive computer hardware and software. Handling big data is a time-consuming task that requires large computational clusters to ensure successful data storage and processing. In this work, the definition, classification, and characteristics of big data are discussed, along with various cloud services, such as Microsoft Azure, Google Cloud, Amazon Web Services, International Business Machines (IBM) Cloud, Hortonworks, and MapR. A comparative analysis of various cloud-based big data frameworks is also performed. Various research challenges are defined in terms of distributed database storage, data security, heterogeneity, and data visualization.

Keywords: big data, data analysis, cloud computing, Hadoop
Received: 11 June 2021 Revised: 12 September 2021 Accepted: 13 September 2021 Published: 27 December 2021 Issue date: March 2022

Copyright

© The author(s) 2022

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Reprints and permission requests may be sought directly from the editorial office.
