
High-Performance Flow Classification of Big Data Using Hybrid CPU-GPU Clusters of Cloud Environments

Azam Fazel-Najafabadi¹, Mahdi Abbasi¹ (corresponding author), Hani H. Attar², Ayman Amer², Amir Taherkordi³, Azad Shokrollahi⁴, Mohammad R. Khosravi⁵, and Ahmed A. Solyman⁶

¹ Department of Computer Engineering, Faculty of Engineering, Bu-Ali Sina University, Hamedan 6516738695, Iran
² Department of Energy Engineering, Zarqa University, Zarqa 13132, Jordan
³ Department of Informatics, University of Oslo, Oslo 0316, Norway
⁴ Department of Computer Science, Malmö University, Malmö 20506, Sweden
⁵ Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology, Weifang 261100, China
⁶ Department of Electrical and Electronics Engineering, Nişantaşı University, Istanbul 34481742, Türkiye

Abstract

The network switches in the data plane of Software Defined Networking (SDN) rely on an elementary process in which an enormous number of packets, representing large volumes of data, are classified into specific flows by matching them against a set of dynamic rules. This basic process accelerates data processing: instead of repeatedly processing individual packets, the corresponding actions are applied to entire flows of packets. In this paper, we first address the limitations of a typical packet classification algorithm, Tuple Space Search (TSS). We then present a set of scenarios for parallelizing it on different parallel processing platforms, including Graphics Processing Units (GPUs), clusters of Central Processing Units (CPUs), and hybrid clusters. Experimental results show that the hybrid cluster is the best platform for parallelizing packet classification algorithms, delivering an average throughput of 4.2 million packets per second (Mpps). That is, the hybrid cluster built by integrating the Compute Unified Device Architecture (CUDA), the Message Passing Interface (MPI), and the OpenMP programming model classified 0.24 million packets per second more than the GPU cluster scheme. Such a packet classifier satisfies the processing speed required in programmable network systems used to communicate big medical data.
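
To make the matching step concrete, the sketch below shows a minimal Tuple Space Search over two-field rules: rules with the same pair of prefix lengths form one tuple, each tuple keeps an exact-match hash table over the masked header bits, and a lookup probes every tuple and keeps the highest-priority hit. The rule set, hash function, and table sizes here are illustrative assumptions, not the implementation evaluated in the paper.

/*
 * Minimal sketch of Tuple Space Search (TSS) over 2-field rules
 * (source/destination IPv4 prefixes). Illustrative only: the rule set,
 * hash function, and table sizes are hypothetical, not the paper's code.
 */
#include <stdint.h>
#include <stdio.h>

#define MAX_RULES   64
#define TABLE_SIZE  128   /* per-tuple hash table buckets (open addressing) */

typedef struct { uint32_t src, dst; int priority; int valid; } Entry;

typedef struct {             /* one "tuple": a (src_len, dst_len) combination  */
    int src_len, dst_len;    /* prefix lengths that define this tuple          */
    Entry table[TABLE_SIZE]; /* exact-match hash table over the masked headers */
} Tuple;

static uint32_t mask(uint32_t addr, int len) {
    return len == 0 ? 0 : addr & (0xFFFFFFFFu << (32 - len));
}

static uint32_t hash2(uint32_t a, uint32_t b) {
    return (a * 2654435761u ^ b * 40503u) % TABLE_SIZE;
}

/* Insert a rule into the hash table of its tuple (linear probing). */
static void tss_insert(Tuple *t, uint32_t src, uint32_t dst, int prio) {
    uint32_t h = hash2(mask(src, t->src_len), mask(dst, t->dst_len));
    while (t->table[h].valid) h = (h + 1) % TABLE_SIZE;
    t->table[h] = (Entry){ mask(src, t->src_len), mask(dst, t->dst_len), prio, 1 };
}

/* Classify one packet: probe every tuple, keep the highest-priority match. */
static int tss_lookup(const Tuple *tuples, int ntuples, uint32_t src, uint32_t dst) {
    int best = -1;
    for (int i = 0; i < ntuples; i++) {
        const Tuple *t = &tuples[i];
        uint32_t ms = mask(src, t->src_len), md = mask(dst, t->dst_len);
        uint32_t h = hash2(ms, md);
        while (t->table[h].valid) {
            if (t->table[h].src == ms && t->table[h].dst == md &&
                t->table[h].priority > best)
                best = t->table[h].priority;
            h = (h + 1) % TABLE_SIZE;
        }
    }
    return best;   /* -1 means "no rule matched" */
}

int main(void) {
    Tuple tuples[2] = { { .src_len = 24, .dst_len = 16 },
                        { .src_len = 16, .dst_len = 8  } };
    tss_insert(&tuples[0], 0xC0A80100u, 0x0A000000u, 10); /* 192.168.1.0/24 -> 10.0.0.0/16 */
    tss_insert(&tuples[1], 0xC0A80000u, 0x0A000000u,  5); /* 192.168.0.0/16 -> 10.0.0.0/8  */
    printf("matched priority: %d\n",
           tss_lookup(tuples, 2, 0xC0A80105u, 0x0A000001u)); /* expect 10 */
    return 0;
}

Because each tuple fixes the prefix lengths, the masked fields act as an exact-match key, which is what lets ordinary hash tables handle prefix rules; the number of probes per packet grows with the number of distinct tuples, which is the limitation that motivates parallelization.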

Keywords: OpenMP, Compute Unified Device Architecture (CUDA), Message Passing Interface (MPI), packet classification, medical data, tuple space algorithm, Graphics Processing Unit (GPU) cluster
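
As a companion to the hybrid-cluster result above, the following skeleton shows one plausible way to layer MPI across cluster nodes and OpenMP across the cores of each node for batch classification; on GPU-equipped nodes the inner loop would be replaced by a CUDA kernel launch. The batch size, the even split, and classify_packet() are placeholders assumed for illustration, not the authors' code.

/*
 * Hypothetical sketch of the hybrid layering named in the abstract:
 * MPI scatters a packet batch across cluster nodes, and OpenMP threads on
 * each node classify their share. classify_packet() is a stub standing in
 * for a rule-matching routine such as TSS.
 */
#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>

static int classify_packet(unsigned pkt) { return (int)(pkt % 7); } /* stub rule match */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int total = 1 << 20;            /* packets in the batch (assumed)  */
    int local_n = total / size;           /* even split across MPI processes */
    unsigned *all = NULL, *local = malloc(local_n * sizeof *local);

    if (rank == 0) {                      /* root holds the incoming batch   */
        all = malloc(total * sizeof *all);
        for (int i = 0; i < total; i++) all[i] = (unsigned)rand();
    }
    MPI_Scatter(all, local_n, MPI_UNSIGNED,
                local, local_n, MPI_UNSIGNED, 0, MPI_COMM_WORLD);

    long local_hits = 0;                  /* packets matching some rule      */
    #pragma omp parallel for reduction(+:local_hits) schedule(static)
    for (int i = 0; i < local_n; i++)
        if (classify_packet(local[i]) >= 0) local_hits++;

    long total_hits = 0;
    MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("classified %ld of %d packets\n", total_hits, total);

    free(local); free(all);
    MPI_Finalize();
    return 0;
}

In this layering, MPI handles inter-node data movement while OpenMP (or CUDA) exploits intra-node parallelism, which is the division of labor the hybrid CUDA-MPI-OpenMP scheme in the paper relies on.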


Publication history

Received: 29 May 2023
Revised: 18 July 2023
Accepted: 06 August 2023
Published: 09 February 2024
Issue date: August 2024

Copyright

© The Author(s) 2024.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
