[1]
K. M. He, X. Y. Zhang, S. Q. Ren, and J. Sun, Deep residual learning for image recognition, in Proc. 2016 IEEE Conf. on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016, pp. 770-778.
[2]
G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, 2012.
[3]
S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
[4]
A. Kamilaris and F. X. Prenafeta-Boldú, Deep learning in agriculture: A survey, Comput. Electron. Agric., vol. 147, pp. 70-90, 2018.
[5]
G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. W. M. van der Laak, B. van Ginneken, and C. I. Sánchez, A survey on deep learning in medical image analysis, Med. Image Anal., vol. 42, pp. 60-88, 2017.
[6]
Q. C. Zhang, L. T. Yang, Z. K. Chen, and P. Li, A survey on deep learning for big data, Inf. Fusion, vol. 42, pp. 146-157, 2018.
[7]
L. Zhang, S. Wang, and B. Liu, Deep learning for sentiment analysis: A survey, Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 8, no. 4, p. e1253, 2018.
[8]
J. D. Wang, Y. Q. Chen, S. J. Hao, X. H. Peng, and L. S. Hu, Deep learning for sensor-based activity recognition: A survey, Pattern Recognit. Lett., vol. 119, pp. 3-11, 2019.
[9]
G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger, Deep networks with stochastic depth, in Proc. 14th European Conf. on Computer Vision, Amsterdam, The Netherlands, 2016, pp. 646-661.
[11]
S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, Gradient flow in recurrent nets: The difficulty of learning long-term dependencies, in A Field Guide to Dynamical Recurrent Networks, J. F. Kolen and S. C. Kremer, eds. Wiley-IEEE Press, 2001.
[12]
G. B. Goh, N. O. Hodas, and A. Vishnu, Deep learning for computational chemistry, J. Comput. Chem., vol. 38, no. 16, pp. 1291-1307, 2017.
[13]
B. Hanin, Which neural net architectures give rise to exploding and vanishing gradients? in Proc. Advances in Neural Information Processing Systems 31, Montréal, Canada, 2018, pp. 582-591.
[14]
J. Schmidhuber, Learning complex, extended sequences using the principle of history compression, Neural Comput., vol. 4, no. 2, pp. 234-242, 1992.
[15]
V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proc. 27th Int. Conf. on Machine Learning, Haifa, Israel, 2010, pp. 807-814.
[16]
X. Glorot, A. Bordes, and Y. Bengio, Deep sparse rectifier neural networks, in Proc. 14th Int. Conf. on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 2011, pp. 315-323.
[17]
S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv: 1502.03167, 2015.
[18]
Y. Yang and H. Wang, Multi-view clustering: A survey, Big Data Mining and Analytics, vol. 1, no. 2, pp. 83-107, 2018.
[19]
S. Kumar and M. Singh, A novel clustering technique for efficient clustering of big data in Hadoop ecosystem, Big Data Mining and Analytics, vol. 2, no. 4, pp. 240-247, 2019.
[20]
J. Schmidhuber, Deep learning in neural networks: An overview, Neural Netw., vol. 61, pp. 85-117, 2015.
[21]
C. Darken, J. Chang, and J. Moody, Learning rate schedules for faster stochastic gradient search, in Neural Networks for Signal Processing II: Proc. of the 1992 IEEE Workshop, Helsingør, Denmark, 1992, pp. 3-12.
[22]
J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res., vol. 12, pp. 2121-2159, 2011.
[23]
M. D. Zeiler, Adadelta: An adaptive learning rate method, arXiv preprint arXiv: 1212.5701, 2012.
[24]
A. Graves, Generating sequences with recurrent neural networks, arXiv preprint arXiv: 1308.0850, 2013.
[25]
D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv: 1412.6980, 2014.
[27]
T. Schaul, S. X. Zhang, and Y. LeCun, No more pesky learning rates, in Proc. 30th Int. Conf. on Machine Learning, Atlanta, GA, USA, 2013, pp. 343-351.
[28]
S. L. Smith, P. J. Kindermans, C. Ying, and Q. V. Le, Don’t decay the learning rate, increase the batch size, arXiv preprint arXiv: 1711.00489, 2017.
[29]
A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. M. Lin, A. Desmaison, L. Antiga, and A. Lerer, Automatic differentiation in PyTorch, in Proc. 31st Conf. on Neural Information Processing Systems, Long Beach, CA, USA, 2017.
[30]
H. Liu, J. Li, Y. Q. Zhang, and Y. Pan, An adaptive genetic fuzzy multi-path routing protocol for wireless ad-hoc networks, in Proc. 6th Int. Conf. on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and 1st ACIS Int. Workshop on Self-Assembling Wireless Network, Towson, MD, USA, 2005, pp. 468-475.