L. Li, Y. Chai, and Y. Liu, Evolution of e-commerce patterns: Model and economic analysis, (in Chinese), Journal of Tsinghua University (Science and Technology), vol. 52, no. 11, pp. 1524–1529, 2012.
X. Liu and Y. Li, VRP model and a heuristic algorithm for across-region distribution in the environment of E-commerce, (in Chinese), Journal of Tsinghua University (Science and Technology), vol. 46, pp. 1014–1018, 2006.
A. Seghezzi, M. Winkenbach, and R. Mangiaracina, On-demand food delivery: A systematic literature review, Int. J. Logist. Manag.
C. Li and L. Miao, Planning methods of regional logistics systems and logistics parks, (in Chinese), Journal of Tsinghua University (Science and Technology), vol. 44, no. 3, pp. 398–401, 2004.
X. Wang, L. Wang, C. Dong, H. Ren, and K. Xing, An online deep reinforcement learning-based order recommendation framework for rider-centered food delivery system, IEEE Trans. Intell. Transp. Syst., vol. 24, no. 5, pp. 5640–5654, 2023.
E. Jiang, L. Wang, and J. Wang, Decomposition-based multi-objective optimization for energy-aware distributed hybrid flow shop scheduling with multiprocessor tasks, Tsinghua Science and Technology, vol. 26, no. 5, pp. 646–663, 2021.
B. Yildiz and M. Savelsbergh, Provably high-quality solutions for the meal delivery routing problem, Transp. Sci., vol. 53, no. 5, pp. 1372–1388, 2019.
M. W. Ulmer, B. W. Thomas, A. M. Campbell, and N. Woyak, The restaurant meal delivery problem: Dynamic pickup and delivery with deadlines and random ready times, Transp. Sci., vol. 55, no. 1, pp. 75–100, 2021.
S. Liu, L. He, and Z. J. M. Shen, On-time last-mile delivery: Order assignment with travel-time predictors, Manag. Sci., vol. 67, no. 7, pp. 4095–4119, 2021.
J. F. Chen, L. Wang, H. Ren, J. Pan, S. Wang, J. Zheng, and X. Wang, An imitation learning-enhanced iterated matching algorithm for on-demand food delivery, IEEE Trans. Intell. Transp. Syst., vol. 23, no. 10, pp. 18603–18619, 2022.
Z. Steever, M. Karwan, and C. Murray, Dynamic courier routing for a food delivery service, Comput. Oper. Res., vol. 107, pp. 173–188, 2019.
S. Paul, S. Rathee, J. Matthew, and K. M. Adusumilli, An optimization framework for on-demand meal delivery system, in Proc. 2020 IEEE Int. Conf. Industrial Engineering and Engineering Management (IEEM), Singapore, 2020, pp. 822–826.
M. Joshi, A. Singh, S. Ranu, A. Bagchi, P. Karia, and P. Kala, Batching and matching for food delivery in dynamic road networks, in Proc. 2021 IEEE 37th Int. Conf. Data Engineering (ICDE), Chania, Greece, 2021, pp. 2099–2104.
H. Jahanshahi, A. Bozanta, M. Cevik, E. M. Kavuk, A. Tosun, S. B. Sonuc, B. Kosucu, and A. Başar, A deep reinforcement learning approach for the meal delivery problem, Knowl.-Based Syst., vol. 243, p. 108489, 2022.
L. Wang, Z. Pan, and J. Wang, A review of reinforcement learning based intelligent optimization for manufacturing scheduling, Complex System Modeling and Simulation, vol. 1, no. 4, pp. 257–270, 2021.
G. Shani, D. Heckerman, and R. I. Brafman, An MDP-based recommender system, J. Mach. Learn. Res., vol. 6, no. 43, pp. 1265–1295, 2005.
N. Taghipour and A. Kardan, A hybrid web recommender system based on Q-learning, in Proc. 2008 ACM Symp. on Applied Computing, Fortaleza, Brazil, 2008, pp. 1164–1168.
X. Bai, J. Guan, and H. Wang, A model-based reinforcement learning with adversarial training for online recommendation, in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 10735–10746.
X. Xin, A. Karatzoglou, I. Arapakis, and J. M. Jose, Self-supervised reinforcement learning for recommender systems, in Proc. 43rd Int. ACM SIGIR Conf. Research and Development in Information Retrieval, Virtual Event, China, 2020, pp. 931–940.
X. Chen, C. Huang, L. Yao, X. Wang, W. Liu, and W. Zhang, Knowledge-guided deep reinforcement learning for interactive recommendation, in Proc. 2020 Int. Joint Conf. Neural Networks (IJCNN), Glasgow, UK, 2020, pp. 1–8.
X. Zhao, L. Zhang, Z. Ding, L. Xia, J. Tang, and D. Yin, Recommendations with negative feedback via pairwise deep reinforcement learning, in Proc. 24th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, London, UK, 2018, pp. 1040–1048.
Y. Deng, F. Bao, Y. Kong, Z. Ren, and Q. Dai, Deep direct reinforcement learning for financial signal representation and trading, IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 3, pp. 653–664, 2017.
L. Zou, L. Xia, Z. Ding, J. Song, W. Liu, and D. Yin, Reinforcement learning to optimize long-term user engagement in recommender systems, in Proc. 25th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, Anchorage, AK, USA, 2019, pp. 2810–2818.
X. Wang, L. Wang, S. Wang, J. F. Chen, and C. Wu, An XGBoost-enhanced fast constructive algorithm for food delivery route planning problem, Comput. Ind. Eng., vol. 152, p. 107029, 2021.
Y. Tang, L. Li, and X. Liu, State-of-the-art development of complex systems and their simulation methods, Complex System Modeling and Simulation, vol. 1, no. 4, pp. 271–290, 2021.
H. Salehinejad, S. Sankar, J. Barfett, E. Colak, and S. Valaee, Recent advances in recurrent neural networks, arXiv preprint arXiv: 1801.01078, 2017.
S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, Deterministic policy gradient algorithms, in Proc. 31st Int. Conf. Machine Learning, Beijing, China, 2014, pp. 387–395.
C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006.
T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, Continuous control with deep reinforcement learning, arXiv preprint arXiv: 1509.02971, 2015.