Journal Home > Volume 18 , Issue 3


TST: Threshold Based Similarity Transitivity Method in Collaborative Filtering with Cloud Computing

Feng Xie, Zhen Chen, Hongfeng Xu, Xiwei Feng, Qi Hou
Department of Automation, Research Institute of Information Technology and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China
Research Institute of Information Technology and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China
Department of Computer Science and Technologies and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China
Department of Electronic Engineering and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China

Abstract

Collaborative filtering addresses the information overload problem by presenting personalized content to individual users based on their interests, and it has been extensively applied in real-world recommender systems. As a class of simple but efficient collaborative filtering methods, similarity-based approaches make predictions by finding users with similar tastes or items that have been similarly chosen. However, as the number of users or items grows rapidly, the traditional approach suffers from the data sparsity problem. Inaccurate similarities derived from sparse user-item associations generate an inaccurate neighborhood for each user or item. The resulting poor recommendations motivate the Threshold based Similarity Transitivity (TST) method proposed in this paper. TST first filters out inaccurate similarities by setting an intersection threshold and then replaces them with transitivity similarities. In addition, the TST method is designed to scale with the MapReduce framework on a cloud computing platform. We evaluate our algorithm on the public MovieLens data set and a real-world data set from AppChina (an Android application market) with several well-known metrics including precision, recall, coverage, and popularity. The experimental results demonstrate that, with an appropriate threshold, TST copes well with the tradeoff between the quality and quantity of similarities. Moreover, the optimal threshold can be found experimentally, and it becomes smaller as the data set becomes sparser. The experimental results also show that TST significantly outperforms the traditional approach even when the data becomes sparser.
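
The two steps named in the abstract (filter similarities by an intersection threshold, then replace the filtered ones with a transitivity similarity) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: user-based cosine similarity on implicit feedback, the number of shared items as the "intersection", and a max-product rule over one intermediate user as the transitivity similarity. The function names `reliable_similarities` and `transitive_fill` are hypothetical, and the sketch runs on a single machine rather than on MapReduce.

```python
from itertools import combinations
from math import sqrt


def reliable_similarities(user_items, threshold):
    """Cosine similarity between users, kept only when the two users
    share at least `threshold` items (assumed meaning of the
    intersection threshold)."""
    sims = {}
    for u, v in combinations(sorted(user_items), 2):
        common = len(user_items[u] & user_items[v])
        if common >= threshold and common > 0:
            s = common / sqrt(len(user_items[u]) * len(user_items[v]))
            sims[(u, v)] = sims[(v, u)] = s
    return sims


def transitive_fill(user_items, sims):
    """For pairs without a reliable similarity, substitute the best
    two-hop product sim(u, k) * sim(k, v) over intermediate users k.
    The max-product rule is an assumption of this sketch, not
    necessarily the paper's definition."""
    users = sorted(user_items)
    filled = dict(sims)
    for u, v in combinations(users, 2):
        if (u, v) in sims:
            continue  # a reliable direct similarity already exists
        best = max(
            (sims[(u, k)] * sims[(k, v)]
             for k in users
             if (u, k) in sims and (k, v) in sims),
            default=0.0,
        )
        if best > 0.0:
            filled[(u, v)] = filled[(v, u)] = best
    return filled


# Toy data: users "a" and "c" share only one item, so with threshold 2
# their direct similarity is discarded and rebuilt through user "b".
user_items = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {3, 4, 5}}
sims = reliable_similarities(user_items, threshold=2)
filled = transitive_fill(user_items, sims)
print(filled[("a", "c")])
```

In this toy case a larger threshold discards more direct similarities (higher quality, lower quantity), and the transitive step restores some of the lost pairs. That is the tradeoff the abstract describes.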

Keywords: big data, machine learning, cloud computing, collaborative filtering, data mining, MapReduce, recommender systems, similarity transitivity, Android applications

Publication history

Received: 15 April 2013
Revised: 15 May 2013
Accepted: 15 May 2013
Published: 03 June 2013
Issue date: June 2013

Copyright

© The author(s) 2013

Acknowledgements

The authors would like to thank Prof. Jun Li of NSLAB from RIIT for his careful guidance on the paper’s structure and writing. We are also grateful to Prof. Junwei Cao from RIIT, and to Dr. Zihong Huang and Xiaoping Feng from the Electronic Engineering Department, for their help.

This work is supported by the Ministry of Science and Technology of China under the National Key Basic Research and Development (973) Program of China (Nos. 2012CB315801 and 2011CB302805), the National Natural Science Foundation of China A3 Program (No. 61161140320), and the National Natural Science Foundation of China (No. 61233016). This work is also supported by the Intel Research Council under the project titled "Security Vulnerability Analysis based on Cloud Platform with Intel IA Architecture".
