TSYS School of Computer Science, Columbus State University, Columbus, GA31907, USA
Department of Computer Science, Illinois Institute of Technology, Chicago, IL60616, USA
Show Author Information
Hide Author Information
Abstract
An effective method to detect stepping-stone intrusion (SSI) is to estimate the length of a connection chain. This type of detection method is referred to as a network-based detection approach. Existing network-based SSI detection methods are either ineffective in the context of the Internet because of the presence of outliers in the packet round-trip times (RTTs) or inefficient, as many packets must be captured and processed. Because of the high fluctuation caused by the intermediate routers on the Internet, it is unavoidable that the RTTs of the captured packets contain outlier values. In this paper, we first propose an efficient algorithm to eliminate most of the possible RTT outliers of the packets captured in the Internet environment. We then develop an efficient SSI detection algorithm by mining network traffic using an improved version of k-Means clustering. Our proposed detection algorithm for SSI is accurate, effective, and efficient in the context of the Internet. Well-designed network experiments are conducted in the Internet environment to verify the effectiveness, correctness, and efficiency of our proposed algorithms. Our experiments show that the effective rate of our proposed SSI detection algorithm is higher than 85.7% in the context of the Internet.
No abstract is available for this article. Click the button above to view the PDF directly.
References
[1]
A.Blum, D.Song, and S.Venkataraman, Detection of interactive stepping-stones: Algorithms and confidence bounds, in Proceedings of International Symposium on Recent Advance in Intrusion Detection (RAID), Sophia Antipolis, France, 2004, pp. 20-35.
B.Mathew, UNIX security: Threats and solutions, in Proc. of Invited Talk Given at the 1995 System Administration, Networking, and Security Conference, Washington, DC, USA, 1995, pp. 6-36.
Y.Chen and S.Wang, A novel network flow watermark embedding model for efficient detection of stepping-stone intrusion based on entropy, in Proceedings of the International Conference on e-Learning, e-Business, Enterprise Information Systems, and e-Government, Las Vegas, NV, USA, 2016.
[4]
Y.Zhang and V.Paxson, Detecting stepping-stones, in Proc. of the 9th USENIX Security Symposium, Denver, CO, USA, 2000, pp. 67-81.
[5]
D.Bhattacherjee, Stepping-stone detection for tracing attack sources in software-defined networks, in Degree Project in Electrical Engineering, Stockholm, Sweden, 2016.
[6]
D.Donoho, A.Flesia, U.Shankar, V.Paxson, J.Coit, and S.Staniford, Multiscale stepping-stone detection: Detecting pairs of jittered interactive streams by exploiting maximum tolerable delay, in Proc. of the 5th International Symposium on Recent Advances in Intrusion Detection, Lecture Notes in Computer Science, Zurich, Switzerland, 2002.
X.Wang and D.Reeves, Robust correlation of encrypted attack traffic through stepping-stones by flow watermarking, IEEE Transactions on Depend-able and Secure Computing, vol. 8, no. 3, pp. 434-449, 2011.
S.Staniford-Chen, and L. T.Heberlein, Holding intruders accountable on the internet, in Proc. of IEEE Symposium on Security and Privacy, Oakland, CA, USA, 1995, pp. 39-49.
[9]
W.Ding, M. J.Hausknecht, S.-H. S.Huang, and Z.Riggle, Detecting stepping-stone intruders with long connection chains, in Proc. of the Fifth International Conference on Information Assurance and Security, Zurich, Switzerland, 2009.
J.Yang, B.Lee, S. S.-H.Huang, Monitoring network traffic to detect stepping-stone intrusion, in Proc. of the 22nd IEEE International Conference on Advanced Information Networking and Applications (AINA 2008), Okinawa, Japan, 2008, pp. 56-61.
S. S.-HHuang, H.Zhang, and M.Phay, Detecting stepping-stone intruders by identifying crossover packets in ssh connections, in Proc. of the 30th IEEE International Conference on Advanced Information Networking and Applications, Fukuoka, Japan, 2016, pp. 1043-1050.
S. S.-H.Huang, R.Lychev, and J.Yang, Stepping-stone detection via request-response traffic analysis, in Proc. of the 4th IEEE International Conference on Automatic and Trusted Computing, Hong Kong, China, 2007, pp. 276-285.
K.Yoda, H.Etoh, Finding connection chain for tracing intruders, in Proc. of the 6th European Symposium on Research in Computer Security, Toulouse, France, 2000, pp. 31-42.
J.Yang, S.-H. S.Huang, and M. D.Wan, A clustering-partitioning algorithm to find TCP packet round-trip time for intrusion detection, in Proc. of the 20th International Conference on Advanced Information Networking and Applications-Volume 1 (AINA’06), Vienna, Austria, 2006, pp. 231-236.
L.Wang and J.Yang, A research survey in stepping-stone intrusion detection, EURASIP Journal on Wireless Communications and Networking, vol. 276, pp. 1-15, 2018.
K. H.Yung, Detecting long connecting chains of interactive terminal sessions, in Proc. of International Symposium on Recent Advance in Intrusion Detection (RAID), Zurich, Switzerland, 2002, pp. 1-16.
J.Yang and S.-H. S.Huang, A real-time algorithm to detect long connection chains of interactive terminal sessions, in Proceedings of the 3rd ACM International Conference on Information Security (Infosecu’04), Shanghai, China, 2004, pp. 198-203.
J.Yang and S.-H. S.Huang, Matching TCP packets and its application to the detection of long connection chains, in Proceedings of the 19th IEEE International Conference on Advanced Information Networking and Applications (AINA’05), Taipei, China, 2005, pp. 1005-1010.
[19]
J.Yang and S. S.-H.Huang, Mining TCP/IP packets to detect stepping-stone intrusion, Journal of Computers and Security, vol. 26, nos. 7&8, pp. 479-484, 2007.
L.Wang, J.Yang, M.Mccormick, P.-J.Wan, and X.Xu, Detect stepping-stone intrusion by mining network traffic using k-means clustering, in Proc. of 2020 IEEE 39th International Performance Computing and Communications Conference (IPCCC), Austin, TX, USA, 2020, pp. 1-8, .
M. H.Haghighat and J.Li, Intrusion detection system using voting-based neural network, Tsinghua Science and Technology, vol. 26, no. 4, pp. 484-495, 2021.
W.Zhong, N.Yu, and C.Ai, Applying big data based deep learning system to intrusion detection, Big Data Mining and Analytics, vol. 3, no. 3, 181-195, 2020.
A.Guezzaz, Y.Asimi, M.Azrour, and A.Asimi, Mathematical validation of proposed machine learning classifier for heterogeneous traffic and anomaly detection, Big Data Mining and Analytics, vol. 4, no. 1, 18-24, 2021.
Z.Cai, Z.He, X.Guan, and Y.Li, Collective data-sanitization for preventing sensitive information inference attacks in social networks, IEEE Transactions on Dependable and Secure Computing, vol. 15, no. 4, pp. 577-590, 2016.
K.Xu, F.Wang, H.Wang, and B.Yang, Detecting fake news over online social media via domain reputations and content understanding, Tsinghua Science and Technology, vol. 25, no. 1, pp. 20-27, 2019.
Z.Cai and X.Zheng, A private and efficient mechanism for data uploading in smart cyber-physical systems, IEEE Transactions on Network Science and Engineering, vol. 7, no. 2, pp. 766-775, 2018.
X.Zheng and Z.Cai, Privacy-preserved data sharing towards multiple parties in industrial IoTs, IEEE Journal on Selected Areas in Communications, vol. 38, no. 5, pp. 968-979, 2020.
Z.Cai and Z.He, Trading private range counting over big IoT data, in Proc. of the 39th IEEE International Conference on Distributed Computing Systems (ICDCS), IEEE, Dallas, TX, USA, 2019, pp. 144-153.
T.Kanungo, D. M.Mount, N. S.Netanyahu, C. D.Piatko, R.Silverman, and A. Y.Wu, An efficient k-means clustering algorithm: Analysis and implementation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881-892, 2002.
Wang L, Yang J, Workman M, et al. Effective Algorithms to Detect Stepping-Stone Intrusion by Removing Outliers of Packet RTTs. Tsinghua Science and Technology, 2022, 27(2): 432-442. https://doi.org/10.26599/TST.2021.9010041
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
10.26599/TST.2021.9010041.F001
A sample connection chain.
10.26599/TST.2021.9010041.F002
Distribution of packet RTTs for a connection chain.
4 An Effective Detection Algorithm for SSI
In this section, we present an effective algorithm to detect SSI by mining network traffic using an improved version of the k-Means clustering approach. Our network experiment shows that when (i.e., all the RTTs are partitioned into exactly two clusters), our proposed detection algorithm for SSI works effectively, with an effective rate higher than 85.7% in the Internet environment. In this paper, the k-Means clustering approach is only applied when . The k-Means clustering approach is neither needed nor used in our proposed algorithm for other values of , as it does not work effectively when in the context of the Internet. The input of our proposed detection algorithm for SSI in this section is three datasets that were captured from distinct connection chains with two, three, or four connections. Our proposed detection algorithm can effectively output whether a possible intrusion is underway in the underlying network.
For the general k-Means clustering method, the main idea is to find the appropriate centers, one for each of the clusters. The centers need to be calculated and then updated in a smart way to minimize the final standard derivation calculated based on the updated centers in the last iteration of the algorithm. Refer to Ref. [
32
] regarding the k-Means algorithm and its implementation. A k-Means clustering algorithm is an appropriate option for clustering because of the unique characteristics of TCP packets. Legal applications are well-known for never using three or more intermediate hosts to access a remote server. Therefore, if a target host is accessed via three or more stepping-stones from another host, then it is highly possible that the session is being manipulated by a malicious hacker, and the target system is a potential victim host.
Now, we explain how to use the 2-Means clustering approach to detect SSI effectively. Consider three RTT datasets obtained from connection chains of length two, three, or four in the context of the Internet. We run the 2-Means clustering algorithm on these three RTT datasets. Intuitively, the output generated by the dataset collected from the connection chain of length two would produce the smallest standard derivation. On the basis of this intuition, for these three datasets, we may conclude that the dataset that produced the output with the smallest standard derivation was obtained from the connection chain of length two.
Let be a set of data points. Assume that is the set of centers that were found by the k-Means clustering algorithm. We partition P into disjoint subsets where the data points in are associated with their nearest center in C for each by associating each point in with the nearest center in C. For each , let The standard derivation of the given dataset based on k-Means clustering is defined as the square root of its variance as follows:
Let dataset-1, dataset-2, and dataset-3 denote the three RTT datasets obtained from three connection chains of distinct lengths of two, three, or four, respectively. Our proposed algorithm for SSI detection using the 2-Means clustering method can accurately determine which connection chain(s) are sessions possibly manipulated by malicious hackers. This detection algorithm for SSI is described in Algorithm 2.
Next, we explain Algorithm 2, which is used to detect possible SSI. In Step 1, the 2-Means clustering algorithm is called on dataset-1. The procedure is as follows.
Initially, we randomly select two points in the set as the two centers for the clustering algorithm that executes the steps below:
(1) To generate the two clusters, each point in is scanned and assigned to the cluster center whose distance from the cluster center is the minimum between both cluster centers;
(2) Calculate the standard derivation using the current partition and cluster centers according to Eq. (
2
);
(3) For each cluster, recompute the new center, which is the average of the data points in the cluster;
(4) Use the new cluster centers to reproduce the two clusters following the same procedure described in the above Step (1);
(5) Recompute the standard derivation using the updated partition and cluster centers according to Eq. (
2
);
(6) If then Exit; otherwise, repeat Step (3).
When the 2-Means clustering algorithm exits, the standard derivation is minimized using the cluster centers and partition obtained in the last round of the clustering algorithm. Thus, at the end of the algorithm execution, the standard derivation can no longer be reduced by changing the centers of each cluster. According to Eq. (
2
), the value of is the standard derivation obtained using the two clusters obtained in the last iteration of the 2-Means clustering algorithm.
Similarly, the 2-Means clustering algorithm is called on dataset-2 and dataset-3, respectively, at Steps 2 and 3 of the algorithm. Then, the standard derivations and are in turn calculated.
If is the minimum among the three values and then dataset-2 and dataset-3 are sessions possibly manipulated by malicious hackers. Similarly, if is the minimum, then dataset-1 and dataset-3 are sessions possibly manipulated by malicious hackers. If is the minimum, then dataset-1 and dataset-2 are sessions possibly manipulated by malicious hackers. Therefore, this algorithm can effectively determine which connection chain(s) are sessions possibly manipulated by hackers.
4 An Effective Detection Algorithm for SSI
In this section, we present an effective algorithm to detect SSI by mining network traffic using an improved version of the k-Means clustering approach. Our network experiment shows that when (i.e., all the RTTs are partitioned into exactly two clusters), our proposed detection algorithm for SSI works effectively, with an effective rate higher than 85.7% in the Internet environment. In this paper, the k-Means clustering approach is only applied when . The k-Means clustering approach is neither needed nor used in our proposed algorithm for other values of , as it does not work effectively when in the context of the Internet. The input of our proposed detection algorithm for SSI in this section is three datasets that were captured from distinct connection chains with two, three, or four connections. Our proposed detection algorithm can effectively output whether a possible intrusion is underway in the underlying network.
For the general k-Means clustering method, the main idea is to find the appropriate centers, one for each of the clusters. The centers need to be calculated and then updated in a smart way to minimize the final standard derivation calculated based on the updated centers in the last iteration of the algorithm. Refer to Ref. [
32
] regarding the k-Means algorithm and its implementation. A k-Means clustering algorithm is an appropriate option for clustering because of the unique characteristics of TCP packets. Legal applications are well-known for never using three or more intermediate hosts to access a remote server. Therefore, if a target host is accessed via three or more stepping-stones from another host, then it is highly possible that the session is being manipulated by a malicious hacker, and the target system is a potential victim host.
Now, we explain how to use the 2-Means clustering approach to detect SSI effectively. Consider three RTT datasets obtained from connection chains of length two, three, or four in the context of the Internet. We run the 2-Means clustering algorithm on these three RTT datasets. Intuitively, the output generated by the dataset collected from the connection chain of length two would produce the smallest standard derivation. On the basis of this intuition, for these three datasets, we may conclude that the dataset that produced the output with the smallest standard derivation was obtained from the connection chain of length two.
Let be a set of data points. Assume that is the set of centers that were found by the k-Means clustering algorithm. We partition P into disjoint subsets where the data points in are associated with their nearest center in C for each by associating each point in with the nearest center in C. For each , let The standard derivation of the given dataset based on k-Means clustering is defined as the square root of its variance as follows:
Let dataset-1, dataset-2, and dataset-3 denote the three RTT datasets obtained from three connection chains of distinct lengths of two, three, or four, respectively. Our proposed algorithm for SSI detection using the 2-Means clustering method can accurately determine which connection chain(s) are sessions possibly manipulated by malicious hackers. This detection algorithm for SSI is described in Algorithm 2.
Next, we explain Algorithm 2, which is used to detect possible SSI. In Step 1, the 2-Means clustering algorithm is called on dataset-1. The procedure is as follows.
Initially, we randomly select two points in the set as the two centers for the clustering algorithm that executes the steps below:
(1) To generate the two clusters, each point in is scanned and assigned to the cluster center whose distance from the cluster center is the minimum between both cluster centers;
(2) Calculate the standard derivation using the current partition and cluster centers according to Eq. (
2
);
(3) For each cluster, recompute the new center, which is the average of the data points in the cluster;
(4) Use the new cluster centers to reproduce the two clusters following the same procedure described in the above Step (1);
(5) Recompute the standard derivation using the updated partition and cluster centers according to Eq. (
2
);
(6) If then Exit; otherwise, repeat Step (3).
When the 2-Means clustering algorithm exits, the standard derivation is minimized using the cluster centers and partition obtained in the last round of the clustering algorithm. Thus, at the end of the algorithm execution, the standard derivation can no longer be reduced by changing the centers of each cluster. According to Eq. (
2
), the value of is the standard derivation obtained using the two clusters obtained in the last iteration of the 2-Means clustering algorithm.
Similarly, the 2-Means clustering algorithm is called on dataset-2 and dataset-3, respectively, at Steps 2 and 3 of the algorithm. Then, the standard derivations and are in turn calculated.
If is the minimum among the three values and then dataset-2 and dataset-3 are sessions possibly manipulated by malicious hackers. Similarly, if is the minimum, then dataset-1 and dataset-3 are sessions possibly manipulated by malicious hackers. If is the minimum, then dataset-1 and dataset-2 are sessions possibly manipulated by malicious hackers. Therefore, this algorithm can effectively determine which connection chain(s) are sessions possibly manipulated by hackers.
10.26599/TST.2021.9010041.F003
Connection chain of length 4 (from the sensor host S1 to the victim’s host).
10.26599/TST.2021.9010041.T001All notations used in this paper.
Notation
Meaning
a random variable
mean of a random variable
standard derivation
the number of clusters
the center of the -th clusters, where
C
the set of centers
Dataset-
a dataset of RTT values, where or 3
10.26599/TST.2021.9010041.T002Standard derivations outputted by the 2-Means clustering algorithm (k = 2) on the seven captured datasets.
Dataset
Standard derivation
2 connections
3 connections
4 connections
Dataset-1
3184.3
11 785.8
17 431.3
Dataset-2
20 645.9
2146.2
1542.9
Dataset-3
2138.0
16 407.4
5326.3
Dataset-4
20 744.7
27 048.5
74 598.9
Dataset-5
22 709.1
26 717.7
163 099.7
Dataset-6
25 121.4
26 071.2
85 997.0
Dataset-7
23 549.5
25 468.2
84 584.2
Note: Each dataset contains three captured files of RTTs collected from downstream connection chains with 2, 3, or 4 connections, respectively. The RTT dataset we collected from the connection chain of length 2 achieves the smallest standard derivation on six of the seven datasets we captured (only the output generated on dataset-2 is incorrect and in bold face).