Volume 1, Issue 1




Online Internet Traffic Monitoring System Using Spark Streaming

Baojun Zhou, Jie Li, Xiaoyan Wang, Yu Gu, Li Xu, Yongqiang Hu, Lihua Zhu
Department of Computer Science, University of Tsukuba, Tsukuba 305-8577, Japan.
College of Engineering, Ibaraki University, Hitachi 316-8511, Japan.
School of Computer and Information, Hefei University of Technology, Hefei 230601, China.
College of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China.
Institute of Scientific and Technical Information of Qinghai, Xining 810008, China.

Abstract

Owing to the explosive growth of Internet traffic, network operators must be able to monitor the state of the entire network and manage their network resources efficiently. Traditional network analysis methods, which usually run on a single machine, are no longer suitable for such large volumes of traffic data because of their limited processing capacity. Big data frameworks, such as Hadoop and Spark, can handle these analysis jobs even for large amounts of network traffic. However, Hadoop and Spark are inherently designed for offline data analysis. To cope with streaming data, various stream-processing frameworks have been proposed, such as Storm, Flink, and Spark Streaming. In this study, we propose an online Internet traffic monitoring system based on Spark Streaming. The system comprises three parts: the collector, the messaging system, and the stream processor. We use TCP performance monitoring as a use case to show how network monitoring can be performed with the proposed system. We conducted typical experiments on a cluster in standalone mode, and the results showed that our system performs well for large-scale Internet traffic measurement and monitoring.

Keywords: big data, Spark Streaming, network monitoring, TCP performance monitoring
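
The three-part pipeline named in the abstract (collector, messaging system, stream processor) can be illustrated with a minimal, single-process sketch. This is not the authors' implementation and it does not use Spark Streaming or any real message broker: a `deque` stands in for the messaging system, and a plain generator imitates fixed-interval micro-batches. The `Packet` fields, the function names, and the choice of a repeated sequence number as a retransmission indicator are all assumptions made for illustration only.

```python
from collections import Counter, deque
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical record layout; not taken from the paper."""
    ts: float    # capture time in seconds
    src: str     # source endpoint
    dst: str     # destination endpoint
    seq: int     # TCP sequence number
    length: int  # payload size in bytes


def collect(raw_records):
    """Collector stand-in: turn raw capture tuples into Packet records."""
    for ts, src, dst, seq, length in raw_records:
        yield Packet(ts, src, dst, seq, length)


def micro_batches(queue, interval):
    """Drain the queue into consecutive batches covering `interval` seconds."""
    batch, window_end = [], None
    while queue:
        pkt = queue.popleft()
        if window_end is None:
            window_end = pkt.ts + interval
        if pkt.ts >= window_end:
            yield batch
            batch, window_end = [], pkt.ts + interval
        batch.append(pkt)
    if batch:
        yield batch


def process_batch(batch, seen):
    """Stream-processor stand-in: per-flow byte and retransmission counts.

    Assumption: a data segment whose (flow, seq) pair was seen before is
    counted as a retransmission. `seen` carries state across batches.
    """
    bytes_per_flow, retrans_per_flow = Counter(), Counter()
    for pkt in batch:
        flow = (pkt.src, pkt.dst)
        bytes_per_flow[flow] += pkt.length
        key = (flow, pkt.seq)
        if pkt.length > 0 and key in seen:
            retrans_per_flow[flow] += 1
        seen.add(key)
    return bytes_per_flow, retrans_per_flow


def run(raw_records, interval=1.0):
    """Wire the three stages together and return one result per batch."""
    queue = deque(collect(raw_records))  # messaging-system stand-in
    seen = set()
    return [process_batch(b, seen) for b in micro_batches(queue, interval)]
```

In a deployment along the lines the abstract describes, the queue would be replaced by a real messaging system and the batching by the streaming framework's own batch interval; the per-batch function is the only part that would carry over more or less unchanged.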


Publication history

Received: 11 August 2017
Accepted: 30 November 2017
Published: 25 January 2018
Issue date: March 2018

Copyright

© The author(s) 2018

Acknowledgements

This work was partially supported by a Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science (JSPS), the Qinghai Joint Research Grant (No. 2016-HZ-804), and a Research Collaboration Grant from NII, Japan.
