Closed-Form Models of Accuracy Loss due to Subsampling in SVD Collaborative Filtering

Samin Poudel1, Marwan Bikdash1
1 Department of Computational Data Science and Engineering, North Carolina A & T State University, Greensboro, NC 27401, USA

Abstract

We postulate and analyze a nonlinear subsampling accuracy loss (SSAL) model based on the root mean square error (RMSE) and two SSAL models based on the mean square error (MSE), all suggested by extensive preliminary simulations. The SSAL models predict accuracy loss in terms of subsampling parameters such as the fraction of users dropped (FUD) and the fraction of items dropped (FID). We investigate whether the models depend on the characteristics of the dataset in a constant way across datasets when using the singular value decomposition (SVD) collaborative filtering (CF) algorithm. The dataset characteristics considered include various densities of the rating matrix and the numbers of users and items. Extensive simulations and rigorous regression analysis led to empirical SSAL models, symmetrical in terms of FID and FUD, whose coefficients depend only on the data characteristics. The SSAL models turned out to be multi-linear in the odds of dropping a user (or an item) versus not dropping it. Moreover, one MSE deterioration model turned out to be linear in the FID and FUD odds, with a zero coefficient on their interaction term. Most importantly, the models are constant in the sense that they are written in closed form using the considered data characteristics (densities and numbers of users and items). The models are validated through extensive simulations based on 850 synthetically generated primary (pre-subsampling) matrices derived from the 25M MovieLens dataset. Nearly 460 000 subsampled rating matrices were then simulated and subjected to the SVD CF algorithm. Further validation was conducted using the 1M MovieLens and the Yahoo! Music Rating datasets. The models were constant and significant across all three datasets.
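
To make the terminology concrete, the following minimal Python sketch shows what "multi-linear in the FUD and FID odds" could look like, together with a simple way to drop a fraction of users and items. It is an illustration only: the abstract does not give the coefficients or the exact functional form, so the function names, the coefficient parameters `a_user`, `a_item`, and `a_cross`, the absence of an intercept, and the uniform random dropping scheme are all assumptions and not the authors' implementation.

```python
import random


def odds(fraction):
    """Odds of dropping vs. not dropping: f / (1 - f), for 0 <= f < 1."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return fraction / (1.0 - fraction)


def multilinear_loss(fud, fid, a_user, a_item, a_cross=0.0):
    """Hypothetical multi-linear loss form in the FUD and FID odds.

    a_user, a_item, a_cross are placeholder coefficients (assumed here);
    a_cross = 0 gives a form that is purely linear in the two odds.
    """
    o_u, o_i = odds(fud), odds(fid)
    return a_user * o_u + a_item * o_i + a_cross * o_u * o_i


def subsample(ratings, fud, fid, seed=0):
    """Drop a random fraction of users and items (uniform dropping is assumed).

    ratings: dict mapping (user, item) -> rating.
    Returns the ratings whose user and item both survive.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    dropped_users = set(rng.sample(users, round(fud * len(users))))
    dropped_items = set(rng.sample(items, round(fid * len(items))))
    return {
        (u, i): r
        for (u, i), r in ratings.items()
        if u not in dropped_users and i not in dropped_items
    }


if __name__ == "__main__":
    # Toy fully observed 10-user x 20-item rating matrix.
    toy = {(u, i): 3.0 for u in range(10) for i in range(20)}
    kept = subsample(toy, fud=0.2, fid=0.5)
    print(len(kept), "ratings kept")
    print(multilinear_loss(0.2, 0.5, a_user=1.0, a_item=1.0))
```

Because the odds f / (1 - f) grow without bound as f approaches 1, a model of this shape predicts a steep rise in loss when most users or items are dropped, whereas a model linear in the raw fractions would not.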

Keywords:

collaborative filtering, subsampling, accuracy loss models, performance loss, recommendation system, simulation, rating matrix, root mean square error
Received: 13 February 2022 Revised: 09 July 2022 Accepted: 11 July 2022 Published: 24 November 2022 Issue date: March 2023
Copyright

© The author(s) 2023.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).