
High-Performance Flow Classification of Big Data Using Hybrid CPU-GPU Clusters of Cloud Environments

Azam Fazel-Najafabadi¹, Mahdi Abbasi¹ (corresponding author), Hani H. Attar², Ayman Amer², Amir Taherkordi³, Azad Shokrollahi⁴, Mohammad R. Khosravi⁵, and Ahmed A. Solyman⁶

¹ Department of Computer Engineering, Faculty of Engineering, Bu-Ali Sina University, Hamedan 6516738695, Iran
² Department of Energy Engineering, Zarqa University, Zarqa 13132, Jordan
³ Department of Informatics, University of Oslo, Oslo 0316, Norway
⁴ Department of Computer Science, Malmö University, Malmö 20506, Sweden
⁵ Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology, Weifang 261100, China
⁶ Department of Electrical and Electronics Engineering, Nişantaşı University, Istanbul 34481742, Türkiye

Abstract

The network switches in the data plane of Software Defined Networking (SDN) rely on an elementary process in which an enormous number of packets, representing large volumes of data, are classified into specific flows by matching them against a set of dynamic rules. This basic process accelerates data processing: instead of repeatedly processing individual packets, the corresponding actions are applied to entire flows of packets. In this paper, we first address the limitations of a typical packet classification algorithm, Tuple Space Search (TSS). We then present a set of scenarios for parallelizing it on different parallel processing platforms, including Graphics Processing Units (GPUs), clusters of Central Processing Units (CPUs), and hybrid clusters. Experimental results show that the hybrid cluster is the best platform for parallelizing packet classification algorithms, delivering an average throughput of 4.2 million packets per second (Mpps). That is, the hybrid cluster built by integrating the Compute Unified Device Architecture (CUDA), the Message Passing Interface (MPI), and the OpenMP programming model classified 0.24 million packets per second more than the GPU cluster scheme. Such a packet classifier satisfies the processing speed required in programmable network systems used to communicate big medical data.
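
To make the matching step concrete, the sketch below shows a minimal Tuple Space Search over two-field rules: rules with the same pair of prefix lengths form one tuple, each tuple keeps an exact-match hash table over the masked header bits, and a lookup probes every tuple and keeps the highest-priority hit. The rule set, hash function, and table sizes here are illustrative assumptions, not the implementation evaluated in the paper.

/*
 * Minimal sketch of Tuple Space Search (TSS) over 2-field rules
 * (source/destination IPv4 prefixes). Illustrative only: the rule set,
 * hash function, and table sizes are hypothetical, not the paper's code.
 */
#include <stdint.h>
#include <stdio.h>

#define MAX_RULES   64
#define TABLE_SIZE  128   /* per-tuple hash table buckets (open addressing) */

typedef struct { uint32_t src, dst; int priority; int valid; } Entry;

typedef struct {             /* one "tuple": a (src_len, dst_len) combination  */
    int src_len, dst_len;    /* prefix lengths that define this tuple          */
    Entry table[TABLE_SIZE]; /* exact-match hash table over the masked headers */
} Tuple;

static uint32_t mask(uint32_t addr, int len) {
    return len == 0 ? 0 : addr & (0xFFFFFFFFu << (32 - len));
}

static uint32_t hash2(uint32_t a, uint32_t b) {
    return (a * 2654435761u ^ b * 40503u) % TABLE_SIZE;
}

/* Insert a rule into the hash table of its tuple (linear probing). */
static void tss_insert(Tuple *t, uint32_t src, uint32_t dst, int prio) {
    uint32_t h = hash2(mask(src, t->src_len), mask(dst, t->dst_len));
    while (t->table[h].valid) h = (h + 1) % TABLE_SIZE;
    t->table[h] = (Entry){ mask(src, t->src_len), mask(dst, t->dst_len), prio, 1 };
}

/* Classify one packet: probe every tuple, keep the highest-priority match. */
static int tss_lookup(const Tuple *tuples, int ntuples, uint32_t src, uint32_t dst) {
    int best = -1;
    for (int i = 0; i < ntuples; i++) {
        const Tuple *t = &tuples[i];
        uint32_t ms = mask(src, t->src_len), md = mask(dst, t->dst_len);
        uint32_t h = hash2(ms, md);
        while (t->table[h].valid) {
            if (t->table[h].src == ms && t->table[h].dst == md &&
                t->table[h].priority > best)
                best = t->table[h].priority;
            h = (h + 1) % TABLE_SIZE;
        }
    }
    return best;   /* -1 means "no rule matched" */
}

int main(void) {
    Tuple tuples[2] = { { .src_len = 24, .dst_len = 16 },
                        { .src_len = 16, .dst_len = 8  } };
    tss_insert(&tuples[0], 0xC0A80100u, 0x0A000000u, 10); /* 192.168.1.0/24 -> 10.0.0.0/16 */
    tss_insert(&tuples[1], 0xC0A80000u, 0x0A000000u,  5); /* 192.168.0.0/16 -> 10.0.0.0/8  */
    printf("matched priority: %d\n",
           tss_lookup(tuples, 2, 0xC0A80105u, 0x0A000001u)); /* expect 10 */
    return 0;
}

Because each tuple fixes the prefix lengths, the masked fields act as an exact-match key, which is what lets ordinary hash tables handle prefix rules; the number of probes per packet grows with the number of distinct tuples, which is the limitation that motivates parallelization.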

Keywords: OpenMP, Compute Unified Device Architecture (CUDA), Message Passing Interface (MPI), packet classification, medical data, tuple space algorithm, Graphics Processing Unit (GPU) cluster
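
As a companion to the hybrid-cluster result above, the following skeleton shows one plausible way to layer MPI across cluster nodes and OpenMP across the cores of each node for batch classification; on GPU-equipped nodes the inner loop would be replaced by a CUDA kernel launch. The batch size, the even split, and classify_packet() are placeholders assumed for illustration, not the authors' code.

/*
 * Hypothetical sketch of the hybrid layering named in the abstract:
 * MPI scatters a packet batch across cluster nodes, and OpenMP threads on
 * each node classify their share. classify_packet() is a stub standing in
 * for a rule-matching routine such as TSS.
 */
#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>

static int classify_packet(unsigned pkt) { return (int)(pkt % 7); } /* stub rule match */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int total = 1 << 20;            /* packets in the batch (assumed)  */
    int local_n = total / size;           /* even split across MPI processes */
    unsigned *all = NULL, *local = malloc(local_n * sizeof *local);

    if (rank == 0) {                      /* root holds the incoming batch   */
        all = malloc(total * sizeof *all);
        for (int i = 0; i < total; i++) all[i] = (unsigned)rand();
    }
    MPI_Scatter(all, local_n, MPI_UNSIGNED,
                local, local_n, MPI_UNSIGNED, 0, MPI_COMM_WORLD);

    long local_hits = 0;                  /* packets matching some rule      */
    #pragma omp parallel for reduction(+:local_hits) schedule(static)
    for (int i = 0; i < local_n; i++)
        if (classify_packet(local[i]) >= 0) local_hits++;

    long total_hits = 0;
    MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("classified %ld of %d packets\n", total_hits, total);

    free(local); free(all);
    MPI_Finalize();
    return 0;
}

In this layering, MPI handles inter-node data movement while OpenMP (or CUDA) exploits intra-node parallelism, which is the division of labor the hybrid CUDA-MPI-OpenMP scheme in the paper relies on.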


Publication history

Received: 29 May 2023
Revised: 18 July 2023
Accepted: 06 August 2023
Published: 09 February 2024
Issue date: August 2024

Copyright

© The Author(s) 2024.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
