Regular Paper

Combining KNN with AutoEncoder for Outlier Detection

State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China
School of Software, Shandong University, Jinan 250100, China
Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan 250100, China
School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212003, China

Abstract

K-nearest neighbor (KNN) is one of the most fundamental methods for unsupervised outlier detection because of its various advantages, e.g., ease of use and relatively high accuracy. Currently, most data analytics tasks need to deal with high-dimensional data, and KNN-based methods often fail due to "the curse of dimensionality". AutoEncoder-based methods have recently been introduced to use reconstruction errors for outlier detection on high-dimensional data, but the direct use of AutoEncoder typically does not preserve the data proximity relationships well enough for outlier detection. In this study, we propose to combine KNN with AutoEncoder for outlier detection. First, we propose the Nearest Neighbor AutoEncoder (NNAE), which preserves the original data proximity in a much lower-dimensional space that is more suitable for performing KNN. Second, we propose K-nearest reconstruction neighbors (KNRNs), which incorporate the reconstruction errors of NNAE with the K-distances of KNN to detect outliers. Third, we develop a method to automatically choose better parameters for optimizing the structure of NNAE. Finally, using five real-world datasets, we experimentally show that our proposed approach NNAE+KNRN is much better than existing methods, i.e., KNN, Isolation Forest, a traditional AutoEncoder using reconstruction errors (AutoEncoder-RE), and Robust AutoEncoder.
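To make the two-signal scoring idea concrete, below is a minimal sketch in Python. It is not the authors' NNAE/KNRN implementation: a plain PyTorch autoencoder stands in for NNAE, and the function name autoencoder_knn_scores, the min-max normalization, and the additive combination of the two signals are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's NNAE/KNRN): score each point by
# combining its autoencoder reconstruction error with its k-distance
# (distance to the k-th nearest neighbor) in the learned latent space.
import numpy as np
import torch
import torch.nn as nn
from sklearn.neighbors import NearestNeighbors

def autoencoder_knn_scores(X, latent_dim=8, k=10, epochs=200, lr=1e-3):
    X_t = torch.tensor(X, dtype=torch.float32)
    d = X_t.shape[1]
    encoder = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, latent_dim))
    decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, d))
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):  # full-batch training for brevity
        opt.zero_grad()
        loss = ((decoder(encoder(X_t)) - X_t) ** 2).mean()
        loss.backward()
        opt.step()
    with torch.no_grad():
        Z = encoder(X_t)
        recon_err = ((decoder(Z) - X_t) ** 2).mean(dim=1).numpy()
        Z = Z.numpy()
    # k-distance in the latent space; n_neighbors=k+1 because each point's
    # nearest neighbor is itself.
    dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(Z).kneighbors(Z)
    k_dist = dists[:, -1]
    # Min-max normalize both signals before adding (one simple choice).
    norm = lambda a: (a - a.min()) / (a.max() - a.min() + 1e-12)
    return norm(recon_err) + norm(k_dist)  # higher score = more outlying

# Usage: inject a few obvious outliers and check that they rank highest.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30)).astype(np.float32)
X[:5] += 6.0
scores = autoencoder_knn_scores(X)
print(np.argsort(scores)[-5:])  # expected to contain indices 0..4
```

In the paper's full approach, the NNAE training objective additionally preserves nearest-neighbor relations of the original space, and a parameter-selection procedure tunes the network structure automatically; this sketch only illustrates combining a reconstruction-error signal with a K-distance signal.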

Electronic Supplementary Material

Download File(s)
JCST-2204-12403-Highlights.pdf (452.6 KB)


Cite this article:
Liu S-Z, Ma S, Chen H-Q, et al. Combining KNN with AutoEncoder for Outlier Detection. Journal of Computer Science and Technology, 2024, 39(5): 1153-1166. https://doi.org/10.1007/s11390-023-2403-y


Received: 12 April 2022
Accepted: 13 September 2023
Published: 05 December 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024