[1]
Hofmann T., Probabilistic latent semantic analysis, in Proc. 15th Conf. Uncertainty in Artificial Intelligence, Stockholm, Sweden, 1999, pp. 289-296.
[2]
Blei D. M., Ng A. Y., and Jordan M. I., Latent Dirichlet allocation, J. Mach. Learn. Res., vol. 3, pp. 993-1022, 2003.
[3]
Newman D., Asuncion A., Smyth P., and Welling M., Distributed algorithms for topic models, J. Mach. Learn. Res., vol. 10, pp. 1801-1828, 2009.
[4]
Wang Y., Bai H. J., Stanton M., Chen W. Y., and Chang E. Y., PLDA: Parallel latent Dirichlet allocation for large-scale applications, in Proc. 5th Int. Conf. Algorithmic Aspects in Information and Management, San Francisco, CA, USA, 2009, pp. 301-314.
[5]
Ahmed A., Aly M., Gonzalez J., Narayanamurthy S., and Smola A. J., Scalable inference in latent variable models, in Proc. 5th ACM Int. Conf. Web Search and Data Mining, Seattle, WA, USA, 2012, pp. 123-132.
[6]
Liu Z. Y., Zhang Y. Z., Chang E. Y., and Sun M. S., PLDA+: Parallel latent Dirichlet allocation with data placement and pipeline processing, ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, p. 26, 2011.
[7]
Ho Q. R., Cipar J., Cui H. G., Kim J. K., Lee S., Gibbons P. B., Gibson G. A., Ganger G. R., and Xing E. P., More effective distributed ML via a stale synchronous parallel parameter server, in Proc. 26th Int. Conf. Neural Information Processing Systems, Lake Tahoe, NV, USA, 2013, pp. 1223-1231.
[8]
Yuan J. H., Gao F., Ho Q. R., Dai W., Wei J. L., Zheng X., Xing E. P., Liu T. Y., and Ma W. Y., LightLDA: Big topic models on modest computer clusters, in Proc. 24th Int. Conf. World Wide Web, Florence, Italy, 2015, pp. 1351-1361.
[9]
Xing E. P., Ho Q. R., Dai W., Kim J. K., Wei J. L., Lee S., Zheng X., Xie P. T., Kumar A., and Yu Y. L., Petuum: A new platform for distributed machine learning on big data, in Proc. 21st ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Sydney, Australia, 2015, pp. 1335-1344.
[12]
Wang Y., Zhao X. M., Sun Z. L., Yan H., Wang L. F., Jin Z. H., Wang L. B., Gao Y., Law C., and Zeng J., Peacock: Learning long-tail topic features for industrial applications, ACM Trans. Intell. Syst. Technol., vol. 6, no. 4, p. 47, 2015.
[13]
Tora S. and Eguchi K., MPI/OpenMP hybrid parallel inference methods for latent Dirichlet allocation—Approximation and evaluation, IEICE Trans. Inf. Syst., vol. E96.D, no. 5, pp. 1006-1015, 2013.
[14]
Li M., Andersen D. G., Park J. W., Smola A. J., Ahmed A., Josifovski V., Long J., Shekita E. J., and Su B. Y., Scaling distributed machine learning with the parameter server, in Proc. 11th USENIX Symposium on Operating Systems Design and Implementation, Broomfield, CO, USA, 2014, pp. 583-598.
[15]
Dean J. and Ghemawat S., MapReduce: Simplified data processing on large clusters, in Proc. 6th Symposium on Operating Systems Design and Implementation, San Francisco, CA, USA, 2004, p. 10.
[16]
Isard M., Budiu M., Yu Y., Birrell A., and Fetterly D., Dryad: Distributed data-parallel programs from sequential building blocks, in Proc. 2nd ACM SIGOPS/EuroSys European Conf. Computer Systems, Lisbon, Portugal, 2007, pp. 59-72.
[17]
Zaharia M., Chowdhury M., Das T., Dave A., Ma J., McCauley M., Franklin M. J., Shenker S., and Stoica I., Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing, in Proc. 9th USENIX Conf. Networked Systems Design and Implementation, San Jose, CA, USA, 2012, p. 2.
[18]
Meng X. R., Bradley J., Yavuz B., Sparks E., Venkataraman S., Liu D., Freeman J., Tsai D. B., Amde M., Owen S., et al., MLlib: Machine learning in Apache Spark, J. Mach. Learn. Res., vol. 17, no. 1, pp. 1235-1241, 2016.
[19]
Qiu Z. L., Wu B., Wang B., Shi C., and Yu L., Gibbs collapsed sampling for latent Dirichlet allocation on Spark, in Proc. 3rd Int. Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, New York, NY, USA, 2014, pp. 17-28.
[22]
Griffiths T. L. and Steyvers M., Finding scientific topics, Proc. Natl. Acad. Sci. USA, vol. 101, no. S1, pp. 5228-5235, 2004.
[23]
Li A. Q., Ahmed A., Ravi S., and Smola A. J., Reducing the sampling complexity of topic models, in Proc. 20th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, New York, NY, USA, 2014, pp. 891-900.
[24]
Yu H. F., Hsieh C. J., Yun H., Vishwanathan S. V. N., and Dhillon I. S., A scalable asynchronous distributed algorithm for topic modeling, in Proc. 24th Int. Conf. World Wide Web, Florence, Italy, 2015, pp. 1340-1350.
[25]
Teh Y. W., Newman D., and Welling M., A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation, in Proc. 19th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2006, pp. 1353-1360.
[26]
Chen J. F., Li K. W., Zhu J., and Chen W., WarpLDA: A cache efficient O(1) algorithm for latent Dirichlet allocation, Proc. VLDB Endowment, vol. 9, no. 10, pp. 744-755, 2016.
[27]
Zaheer M., Wick M., Tristan J. B., Smola A., and Steele Jr G. L., Exponential stochastic cellular automata for massively parallel inference, in Proc. 19th Int. Conf. Artificial Intelligence and Statistics, Cadiz, Spain, 2016, pp. 966-975.
[28]
Gonzalez J. E., Xin R. S., Dave A., Crankshaw D., Franklin M. J., and Stoica I., GraphX: Graph processing in a distributed dataflow framework, in Proc. 11th USENIX Conf. Operating Systems Design and Implementation, Broomfield, CO, USA, 2014, pp. 599-613.
[29]
Yao L. M., Mimno D., and McCallum A., Efficient methods for topic model inference on streaming document collections, in Proc. 15th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Paris, France, 2009, pp. 937-946.
[30]
Gonzalez J. E., Low Y., Gu H. J., Bickson D., and Guestrin C., PowerGraph: Distributed graph-parallel computation on natural graphs, in Proc. 10th USENIX Conf. Operating Systems Design and Implementation, Hollywood, CA, USA, 2012, pp. 17-30.
[31]
Xie C., Yan L., Li W. J., and Zhang Z. H., Distributed power-law graph computing: Theoretical and empirical analysis, in Proc. 27th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2014, pp. 1673-1681.
[35]
Vavilapalli V. K., Murthy A. C., Douglas C., Agarwal S., Konar M., Evans R., Graves T., Lowe J., Shah H., Seth S., et al., Apache Hadoop YARN: Yet another resource negotiator, in Proc. 4th Annual Symposium on Cloud Computing, Santa Clara, CA, USA, 2013.
[36]
Wallach H. M., Mimno D., and McCallum A., Rethinking LDA: Why priors matter, in Proc. 23rd Annu. Conf. on Neural Information Processing Systems, Vancouver, Canada, 2009, pp. 1973-1981.