Volume 25, Issue 2

DTA-HOC: Online HTTPS Traffic Service Identification Using DNS in Large-Scale Networks

Xuemei Zeng, Xingshu Chen, Guolin Shao, Tao He, Lina Wang
Cybersecurity Research Institute, Sichuan University, Chengdu 610065, China.
College of Cybersecurity, Sichuan University, Chengdu 610065, China.
College of Computer Science, Sichuan University, Chengdu 610065, China.

Abstract

An increasing number of websites use HTTPS encryption to enhance security and privacy for their users. However, HTTPS encryption makes it very difficult to identify the service carried by an HTTPS flow, which poses challenges to network security management. In this paper, we present DTA-HOC, a novel DNS-based two-level association method for online service identification of HTTPS traffic in large-scale networks. It correlates HTTPS flows with DNS flows using big data stream processing and association technologies to label the service in an HTTPS flow with a specific associated domain name. DTA-HOC is specifically designed to address three practical challenges in the service identification process: domain name ambiguity, domain name query invisibility, and contradictions in the data association time window size. Experiments on datasets collected from a 10-Gbps campus network were conducted through offline and online testing. Results show that DTA-HOC achieves an average online association rate on HTTPS traffic of 83% and a generic accuracy of 86.16%. Its processing time for one minute of data is less than 20 seconds. These results indicate that DTA-HOC is an efficient method for online identification of services in HTTPS flows for large-scale networks. Moreover, the proposed method can contribute to the identification of other applications that perform a Domain Name System (DNS) lookup before establishing a connection.

Keywords: big data, HTTPS, Domain Name System (DNS), service identification, flow association
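
The core idea of correlating an HTTPS flow with an earlier DNS response can be illustrated with a minimal Python sketch. This is not the DTA-HOC two-level algorithm, which the abstract does not specify in detail. It is a simplified single-level lookup under stated assumptions: the record types `DnsAnswer` and `HttpsFlow`, the functions `build_index` and `label_flow`, and the 60-second window are all hypothetical names and values chosen for illustration. When several domain names resolve to the same server address for one client (one form of domain name ambiguity), the sketch simply picks the most recent answer.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import NamedTuple, Optional


class DnsAnswer(NamedTuple):
    ts: float        # time the DNS response was observed, in seconds
    client_ip: str   # host that issued the query
    domain: str      # queried domain name
    server_ip: str   # address returned in the answer


class HttpsFlow(NamedTuple):
    ts: float        # flow start time, in seconds
    client_ip: str
    server_ip: str


def build_index(answers):
    """Group DNS answers by (client, server) pair, ordered by time."""
    index = defaultdict(lambda: ([], []))
    for a in sorted(answers, key=lambda a: a.ts):
        times, domains = index[(a.client_ip, a.server_ip)]
        times.append(a.ts)
        domains.append(a.domain)
    return index


def label_flow(flow, index, window=60.0) -> Optional[str]:
    """Return the most recent domain resolved to the flow's server
    by the same client within `window` seconds (assumed value),
    or None if no such DNS answer was seen."""
    times, domains = index.get((flow.client_ip, flow.server_ip), ([], []))
    pos = bisect_right(times, flow.ts) - 1  # last answer at or before the flow
    if pos >= 0 and flow.ts - times[pos] <= window:
        return domains[pos]
    return None


# Illustrative data only (documentation address ranges and example domains).
answers = [
    DnsAnswer(0.0, "10.0.0.5", "example.org", "203.0.113.7"),
    DnsAnswer(30.0, "10.0.0.5", "cdn.example.net", "203.0.113.7"),
]
index = build_index(answers)
label = label_flow(HttpsFlow(31.0, "10.0.0.5", "203.0.113.7"), index)
```

The window parameter shows the trade-off the abstract alludes to: a short window misses flows whose DNS answer was cached long before the connection, while a long window increases state and the risk of stale or ambiguous matches. A flow with no matching answer returns `None`, which corresponds to the case where the DNS query was not visible to the monitor.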


Publication history

Received: 07 October 2018
Revised: 28 January 2019
Accepted: 04 March 2019
Published: 02 September 2019
Issue date: April 2020

Copyright

© The author(s) 2020

Acknowledgements

This work was partially funded by the National Natural Science Foundation of China (No. 61802270), National Entrepreneurship & Innovation Demonstration Base of China (No. C700011), Key Research & Development Project of Sichuan Province of China (No. 2018GZ0100), and Fundamental Research Business Fee Basic Research Project of Central Universities (No. 2017SCU11065).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
