Entity resolution (ER) is a significant task in data integration that aims to detect all entity profiles corresponding to the same real-world entity. Because ER is inherently quadratic in the number of profiles, blocking was proposed to reduce its computational cost: it offers an approximate solution that clusters similar entity profiles into blocks, so that pairwise comparisons need only be performed inside each block. This paper presents a comprehensive survey of existing blocking techniques. We summarize and analyze the classic blocking methods, with emphasis on the different block construction and optimization techniques. We find that traditional blocking methods for ER, which depend on a fixed schema, may not work in highly heterogeneous information spaces. Using schema information flexibly is therefore of great significance for efficiently processing data with these new characteristics. Machine learning is an important tool for ER, but efficient end-to-end machine learning methods still need to be explored. We also summarize the most promising directions for future work, including real-time blocking ER, incremental blocking ER, and deep learning for ER.
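As a rough illustration of the blocking idea described in this abstract, the following minimal Python sketch groups profiles by a blocking key and compares only profiles that share a block. It is not taken from the surveyed paper: the helper names (`build_blocks`, `candidate_pairs`, `surname_key`), the sample profiles, and the choice of the lowercased last name token as the key are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_blocks(profiles, key_fn):
    """Group profile ids by the blocking key computed from each profile."""
    blocks = defaultdict(list)
    for pid, profile in profiles.items():
        blocks[key_fn(profile)].append(pid)
    return blocks


def candidate_pairs(blocks):
    """Collect the pairs that share a block; only these would be compared."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def surname_key(profile):
    """Hypothetical blocking key: the lowercased last token of the name."""
    return profile["name"].split()[-1].lower()


# Made-up sample data, for illustration only.
profiles = {
    1: {"name": "John Smith"},
    2: {"name": "Jon Smith"},
    3: {"name": "Mary Jones"},
    4: {"name": "M. Jones"},
    5: {"name": "Alan Turing"},
}

blocks = build_blocks(profiles, surname_key)
pairs = candidate_pairs(blocks)

# Number of comparisons an exhaustive (quadratic) approach would need.
all_pairs = len(profiles) * (len(profiles) - 1) // 2

print(f"{len(pairs)} candidate pairs instead of {all_pairs}: {sorted(pairs)}")
```

In this toy setting, only profiles that share a surname key are compared. The approximation is visible as well: a true match whose key differs (for example, a misspelled surname) would land in a different block and be missed, which is the kind of weakness that block construction and optimization techniques try to address.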
The many applications of recommender systems give us a tool for understanding users. A group recommender analyzes the behavior of multiple users and aims to provide each member of the group with items of interest according to the users’ preferences. Most existing group recommenders ignore the interaction among users. However, in the course of group activities, interactive preferences can strongly affect the success of a recommender. The problem becomes even more challenging when some of a user's unknown preferences are partly influenced by other users in the group. An interaction-based method named GRIP (Group Recommender Based on Interactive Preference) is presented, which uses group activity history and the recommender's post-rating feedback mechanism to generate interactive preference parameters. To evaluate its performance, the proposed method is compared with traditional collaborative filtering on the MovieLens dataset. The results indicate that the GRIP recommender is superior for multiple users in both validity and accuracy.