Closed-Form Models of Accuracy Loss due to Subsampling in SVD Collaborative Filtering

Samin Poudel; Marwan Bikdash

doi:10.26599/BDMA.2022.9020024


Open Access


Samin Poudel¹, Marwan Bikdash¹

¹ Department of Computational Data Science and Engineering, North Carolina A&T State University, Greensboro, NC 27401, USA


Abstract

We postulate and analyze a nonlinear subsampling accuracy loss (SSAL) model based on the root mean square error (RMSE) and two SSAL models based on the mean square error (MSE), all suggested by extensive preliminary simulations. The SSAL models predict accuracy loss in terms of subsampling parameters such as the fraction of users dropped (FUD) and the fraction of items dropped (FID). We investigate whether the models depend on the characteristics of the dataset in a constant way across datasets when the singular value decomposition (SVD) collaborative filtering (CF) algorithm is used. The dataset characteristics considered include various densities of the rating matrix and the numbers of users and items. Extensive simulations and rigorous regression analysis led to empirical SSAL models, symmetrical in terms of FID and FUD, whose coefficients depend only on the data characteristics. The SSAL models turned out to be multi-linear in the odds ratios of dropping a user (or an item) versus not dropping it. Moreover, one MSE deterioration model turned out to be linear in the FID and FUD odds, with a zero coefficient on their interaction term. Most importantly, the models are constant in the sense that they are written in closed form using the considered data characteristics (densities and numbers of users and items). The models are validated through extensive simulations based on 850 synthetically generated primary (pre-subsampling) matrices derived from the 25M MovieLens dataset. Nearly 460 000 subsampled rating matrices were then simulated and subjected to the SVD CF algorithm. Further validation was conducted using the 1M MovieLens and the Yahoo! Music Rating datasets. The models were constant and significant across all three datasets.
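To make the phrase "multi-linear in terms of odds ratios" concrete, the following is a minimal sketch rather than the authors' model. It assumes the odds of dropping are computed as f / (1 - f) for a dropped fraction f, and that the loss has the form a·o_u + b·o_i + c·o_u·o_i with no intercept. The function names, the coefficient values, and the synthetic data are all hypothetical and chosen only for illustration; the abstract does not state the fitted coefficients.

```python
import numpy as np

def drop_odds(fraction):
    """Odds of dropping vs. not dropping: f / (1 - f), for 0 <= f < 1 (assumed form)."""
    f = np.asarray(fraction, dtype=float)
    if np.any((f < 0) | (f >= 1)):
        raise ValueError("fraction must lie in [0, 1)")
    return f / (1.0 - f)

def design_matrix(fud, fid):
    """Columns: user-drop odds, item-drop odds, and their interaction term."""
    ou, oi = drop_odds(fud), drop_odds(fid)
    return np.column_stack([ou, oi, ou * oi])

def fit_ssal(fud, fid, loss):
    """Least-squares fit of loss ~ a*ou + b*oi + c*ou*oi (no intercept: an assumption)."""
    X = design_matrix(fud, fid)
    coef, *_ = np.linalg.lstsq(X, np.asarray(loss, dtype=float), rcond=None)
    return coef

# Synthetic, illustrative data only; these coefficients are made up, not from the paper.
rng = np.random.default_rng(0)
fud = rng.uniform(0.0, 0.8, size=200)   # fraction of users dropped
fid = rng.uniform(0.0, 0.8, size=200)   # fraction of items dropped
true_coef = np.array([0.05, 0.03, 0.0])  # zero interaction, as in the "linear" MSE case
loss = design_matrix(fud, fid) @ true_coef

est = fit_ssal(fud, fid, loss)           # recovers true_coef on noiseless data
```

With a zero interaction coefficient the sketch reduces to a model that is linear in the two odds terms, which mirrors the special case the abstract describes for one of the MSE models.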

Keywords

simulation; recommendation system; collaborative filtering; subsampling; performance loss; rating matrix; accuracy loss models; root mean square error


Big Data Mining and Analytics

Volume 6, Issue 1, March 2023, Pages 72-84

DOI: 10.26599/BDMA.2022.9020024

Cite this article:

Poudel S, Bikdash M. Closed-Form Models of Accuracy Loss due to Subsampling in SVD Collaborative Filtering. Big Data Mining and Analytics, 2023, 6(1): 72-84. https://doi.org/10.26599/BDMA.2022.9020024


Received: 13 February 2022

Revised: 09 July 2022

Accepted: 11 July 2022

Published: 24 November 2022

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).