AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
View PDF
Submit Manuscript AI Chat Paper
Show Outline
Show full outline
Hide outline
Show full outline
Hide outline
Open Access

Efficient Feature Extraction Using Apache Spark for Network Behavior Anomaly Detection

Xiaoming YeXingshu Chen( )Dunhu LiuWenxian WangLi YangGang LiangGuolin Shao
School of Cybersecurity, Chengdu University of Information Technology, Chengdu 610225
College of Cybersecurity, Sichuan University, Chengdu 610065, China.
School of Management, Chengdu University of Information Technology, Chengdu 610103, China.
College of Compute Science, Sichuan University, Chengdu 610065, China.
Show Author Information


Extracting and analyzing network traffic feature is fundamental in the design and implementation of network behavior anomaly detection methods. The traditional network traffic feature method focuses on the statistical features of traffic volume. However, this approach is not sufficient to reflect the communication pattern features. A different approach is required to detect anomalous behaviors that do not exhibit traffic volume changes, such as low-intensity anomalous behaviors caused by Denial of Service/Distributed Denial of Service (DoS/DDoS) attacks, Internet worms and scanning, and BotNets. We propose an efficient traffic feature extraction architecture based on our proposed approach, which combines the benefit of traffic volume features and network communication pattern features. This method can detect low-intensity anomalous network behaviors and conventional traffic volume anomalies. We implemented our approach on Spark Streaming and validated our feature set using labelled real-world dataset collected from the Sichuan University campus network. Our results demonstrate that the traffic feature extraction approach is efficient in detecting both traffic variations and communication structure changes. Based on our evaluation of the MIT-DRAPA dataset, the same detection approach utilizes traffic volume features with detection precision of 82.3% and communication pattern features with detection precision of 89.9%. Our proposed feature set improves precision by 94%.


K. Xu, F. Wang, and L. Gu, Behavior analysis of internet traffic via bipartite graphs and one-mode projections, IEEE/ACM Trans. Netw., vol. 22, no. 3, pp. 931-942, 2014.
A. Sperotto, R. Sadre, P. T. Boer, and A. Pras, Hidden Markov model modeling of SSH brute-force attacks, in Proc. 20th IFIP/IEEE Int. Workshop on Distributed Systems: Operations and Management: Integrated Management of Systems Services Processes and People in IT, Venice, Italy, 2009, pp. 164-176.
K. Huang, Z. W. Qi, and B. Liu, Network anomaly detection based on statistical approach and time series analysis, in Proc. 23th Int. Conf. Advanced Information Networking and Applications Workshops, Bradford, UK, 2009, pp. 205-211.
T. Andrysiak, Ł Saganowski, M. Choraś, and R. Kozik, Network traffic prediction and anomaly detection based on ARFIMA model, in Proc. Int. Joint Conf. SOCO’14-CISIS’14-ICEUTE’14, Bilbao, Spain, 2014, pp. 545-554.
M. M. Ding and H. Tian, PCA-based network traffic anomaly detection, Tsinghua Sci. Technol., vol. 21, no. 5, pp. 500-509, 2016.
X. M. Ye, X. S. Chen, H. Z. Wang, X. M. Zeng, G. L. Shao, X. Y. Yin, and C. Xu, An anomalous behavior detection model in Cloud Computing, Tsinghua Sci. Technol., vol. 21, no. 3, pp. 322-332, 2016.
W. Willinger, M. S. Taqqu, R. Sherman, and D. V. Wilson, Self-similarity through high-variability: Statistical analysis of Ethernet LAN traffic at the source level, IEEE/ACM Trans. Netw., vol. 5, no. 1, pp. 71-86, 1997.
T. Babaie, S. Chawla, and S. Ardon, Network traffic decomposition for anomaly detection, Computer Science, vol. 96, no. 2, pp. 201-212, 2014.
P. Winter, H. Lampesberger, M. Zeilinger, and E. Hermann, On detecting abrupt changes in network entropy time series, in Proc. 12th IFIP TC 6/TC 11 Int. Conf. Communications and Multimedia Security, Ghent, Belgium, 2011, pp. 194-205.
W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, On the self-similar nature of Ethernet traffic (extended version), IEEE/ACM Trans. Netw., vol. 2, no. 1, pp. 1-15, 1994.
M. Iliofotou, M. Faloutsos, and M. Mitzenmacher, Exploiting dynamicity in graph-based traffic analysis: Techniques and applications, in Proc. 5th Int. Conf. Emerging Networking Experiments and Technologies, Rome, Italy, 2009, pp. 241-252.
L. Akoglu, H. H. Tong, and D. Koutra, Graph based anomaly detection and description: A survey, Data Min. Knowl. Discov., vol. 29, no. 3, pp. 626-688, 2015.
D. Q. Le, T. Jeong, H. E. Roman, and J. W. K. Hong, Traffic dispersion graph based anomaly detection, in Proc. 2nd Symp. on Information and Communication Technology, Hanoi, Vietnam, 2011, pp. 36-41.
M. S. Rahman, T. K. Huang, H. V. Madhyastha, and M. Faloutsos, Efficient and scalable socware detection in online social networks, in Proc. 21st USENIX Conf. Security Symp., Bellevue, WA, USA, 2012, p. 32.
U. Khurana, S. Parthasarathy, and D. Turaga, Graph–based exploration of non-graph datasets, Proc. VLDB Endow., vol. 9, no. 13, pp. 1557-1560, 2016.
C. R. Harshaw, R. A. Bridges, M. D. Iannacone, J. W. Reed, and J. R. Goodall, GraphPrints: Towards a graph analytic method for network anomaly detection, in Proc. 11th Annu. Cyber and Information Security Research Conf., Oak Ridge, TN, USA, 2016, pp. 1-4.
J. François, S. N. Wang, R. D. State, and T. Engel, BotTrack: Tracking botnets using NetFlow and PageRank, in Proc. 10th Int. IFIP TC 6 Conf. Networking, Valencia, Spain, 2011, pp. 1-14.
Q. Ding, N. Katenka, P. Barford, E. Kolaczyk, and M. Crovella, Intrusion as (anti)social communication: Characterization and detection, in Proc. 18th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Beijing, China, 2012, pp. 886-894.
S. Weigert, M. A. Hiltunen, and C. Fetzer, Community-based analysis of netflow for early detection of security incidents, in Proc. 25th Int. Conf. Large Installation System Administration, Boston, MA, USA, 2011, p. 20.
K. Ishibashi, T. Kondoh, S. Harada, T. Mori, R. Kawahara, and S. Asano, Detecting anomalous traffic using communication graphs, in Telecommunications: The Infrastructure for the 21st Century, Vienna, Austria, 2010, pp. 1-6.
Z. M. Chen, K. Y. Chai, S. L. F. Bu, and C. T. Lau, Combining MIC feature selection and feature-based MSPCA for network traffic anomaly detection, in Proc. 3rd Int. Conf. on Digital Information Processing, Data Mining, and Wireless Communications, Moscow, Russia, 2016, pp. 176-181.
J. Tan, X. S. Chen, M. Du, and K. Zhu, A novel internet traffic identification approach using wavelet packet decomposition and neural network, J. Cent. South Univ., vol. 19, no. 8, pp. 2218-2230, 2012.
S. R. Kundu, S. Pal, K. Basu, and S. K. Das, Fast classification and estimation of Internet traffic flows, in Proc. 8th Int. Conf. Passive and Active Network Measurement, Louvainla-Neuve, Belgium, 2007, pp. 155-164.
P. Barford and D. Plonka, Characteristics of network traffic flow anomalies, in Proc. 1st ACM SIGCOMM Workshop on Internet Measurement, San Francisco, CA, USA, 2001, pp. 69-73.
H. Bunke, P. J. Dickinson, M. Kraetzl, and W. D. Wallis, A graph-theoretic approach to enterprise network dynamics, Progress in Computer Science and Applied Logic, vol. 24, pp. 63-78, 2007.
M. Iliofotou, H. C. Kim, M. Faloutsos, M. Mitzenmacher, P. Pappu, and G. Varghese, Graption: A graph–based P2P traffic classification framework for the internet backbone, Comput. Netw., vol. 55, no. 8, pp. 1909-1920, 2011.
C. Chaparro and C. Eberle, Detecting anomalies in mobile telecommunication networks using a graph based approach, in Proc. 28th Int. Florida Artificial Intelligence Research Society Conf., Hollywood, FL, USA, 2015, pp. 410-415.
A. Sanfeliu and K. S. Fu, A distance measure between attributed relational graphs for pattern recognition, IEEE Trans. Syst. Man. Cybern., vol. 13, no. 3, pp. 353-362, 1983.
L. Mookiah, W. Eberle, and L. Holder, Detecting suspicious behavior using a graph-based approach, in Proc IEEE Conf. Visual Analytics Science and Technology, Paris, France, 2014, pp. 357-358.
J. Lin, E. Keogh, S. Lonardi, and B. Chiu, A symbolic representation of time series, with implications for streaming algorithms, in Proc. 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, San Diego, CA, USA, 2003, pp. 2-11.
E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, Dimensionality reduction for fast similarity search in large time series databases, Knowl. Inf. Syst., vol. 3, no. 3, pp. 263-286, 2001.
T. Karagiannis, M. Molle, and M. Faloutsos, Longrange dependence ten years of Internet traffic modeling, IEEE Internet Comput., vol. 8, no. 5, pp. 57-64, 2004.
S I. Tadaki, Long-term power-law fluctuation in Internet traffic, J. Phys. Soc. Jpn., vol. 76, no. 4, p. 044001, 2007.
G. Samorodnitsky, Long range dependence, Found. Trends Stoch. Syst., vol. 1, no. 3, pp. 163-257, 2007.
M. V. Mahoney and P. K. Chan, An analysis of the 1999 DARPA/Lincoln laboratory evaluation data for network anomaly detection, Recent Advances in Intrusion Detection, vol. 1, no. 1, pp. 220-237, 2003.
Tsinghua Science and Technology
Pages 561-573
Cite this article:
Ye X, Chen X, Liu D, et al. Efficient Feature Extraction Using Apache Spark for Network Behavior Anomaly Detection. Tsinghua Science and Technology, 2018, 23(5): 561-573.








Web of Science






Received: 24 September 2017
Accepted: 29 September 2017
Published: 17 September 2018
© The author(s) 2018