Stock price prediction is an active research topic, and traditional prediction methods are usually based on statistical and econometric models. However, these models struggle to handle nonstationary time series data. With the rapid development of the internet and the increasing popularity of social media, online news and comments often reflect investors’ emotions and attitudes toward stocks and therefore contain important information for predicting stock prices. This paper aims to develop a stock price prediction method that takes full advantage of social media data.
This study proposes a new prediction method based on deep learning technology, which integrates traditional stock financial index variables and social media text features as inputs to the prediction model. Doc2Vec is used to build long text feature vectors from social media, and a stacked auto-encoder then reduces the dimensions of these vectors to balance the dimensions of the text feature variables against those of the stock financial index variables. Meanwhile, the stock price time series is decomposed by wavelet transform to eliminate the random noise caused by stock market fluctuation. Finally, a long short-term memory (LSTM) model is used to predict the stock price.
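The wavelet denoising step described above can be illustrated with a minimal sketch. It assumes a Haar wavelet, two decomposition levels and a universal soft threshold; the abstract does not state the wavelet family, the number of levels or the thresholding rule, so these choices and the function names (`haar_forward`, `haar_inverse`, `soft_threshold`, `haar_denoise`) are hypothetical and are not the authors' implementation.

```python
import math
import statistics


def haar_forward(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar level, interleaving the reconstructed pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t (soft thresholding)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def haar_denoise(prices, levels=2):
    """Denoise a price series by thresholding its Haar detail coefficients.

    Assumption: len(prices) is divisible by 2**levels (no padding is done).
    """
    n = len(prices)
    if n % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")

    # Decompose: keep the detail coefficients of every level.
    approx = list(prices)
    details = []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append(d)

    # Assumed noise estimate: median absolute finest-level detail / 0.6745,
    # combined with the universal threshold sigma * sqrt(2 * ln(n)).
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(n))
    details = [soft_threshold(d, t) for d in details]

    # Reconstruct from the coarsest level back to the original resolution.
    for d in reversed(details):
        approx = haar_inverse(approx, d)
    return approx
```

For example, `haar_denoise([10.0, 10.2] * 4)` flattens the small alternating fluctuation while preserving the series mean. In a pipeline like the one described, the denoised series would feed the prediction model in place of the raw prices.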
The experimental results show that the method outperforms all three benchmark models on every evaluation indicator and can effectively predict stock prices.
This study proposes a new stock price prediction model that uses deep learning technology to combine traditional financial features with text features derived from social media.
Funding: This work was supported by National Key Research and Development Plan of China (Grant No: 2017YFB1400101), National Natural Science Foundation of China (Grant No: 71572013, 71872013, 72072011) and Beijing Municipal Social Science Foundation (Grant No: 18JDGLB040).
Xuan Ji, Jiachen Wang and Zhijun Yan. Published in International Journal of Crowd Science. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode