Reinforcement Learning-Based Dynamic Order Recommendation for On-Demand Food Delivery

Xing Wang1, Ling Wang1 (corresponding author), Chenxin Dong2, Hao Ren3, and Ke Xing3

1 Department of Automation, Tsinghua University, Beijing 100080, China
2 School of Mechanical and Automotive Engineering, Qingdao Hengxing University of Science and Technology, Qingdao 266100, China
3 Meituan, Beijing 100015, China

Abstract

On-demand food delivery (OFD) is becoming increasingly popular in modern society. As a key order assignment mode in the OFD scenario, order recommendation directly influences the delivery efficiency of the platform and the delivery experience of riders. This paper addresses the dynamism of the order recommendation problem and proposes a reinforcement learning solution. An actor-critic network based on a long short-term memory (LSTM) unit is designed to deal with order-grabbing conflicts between different riders. In addition, three rider sequencing rules are proposed to match the time steps of the LSTM unit with different riders. To test the performance of the proposed method, extensive experiments are conducted on real data from the Meituan delivery platform. The results demonstrate that the proposed reinforcement learning-based order recommendation method significantly increases the number of grabbed orders and reduces the number of order-grabbing conflicts, yielding better delivery efficiency and experience for the platform and riders.

Keywords: reinforcement learning, actor-critic network, on-demand food delivery, order recommendation, long short-term memory
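The abstract's central design, an actor-critic network whose LSTM time steps are matched to individual riders, can be made concrete with a minimal sketch. The PyTorch code below is a hypothetical reading of that idea, not the paper's actual implementation: the class name, feature dimensions, and per-rider output heads are assumptions made for illustration. Riders are fed to the LSTM in the order given by a sequencing rule, so each rider's recommendation scores are conditioned on the riders scored before it.

```python
import torch
import torch.nn as nn

class LSTMActorCritic(nn.Module):
    """Illustrative actor-critic over a rider sequence.

    Each LSTM time step consumes one rider's feature vector (riders
    ordered by a sequencing rule, e.g., by idle time). The actor head
    scores candidate orders for that rider; the critic head estimates
    a per-step state value. All dimensions are sketch assumptions.
    """

    def __init__(self, rider_dim: int, n_orders: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(rider_dim, hidden, batch_first=True)
        self.actor = nn.Linear(hidden, n_orders)   # per-rider order scores
        self.critic = nn.Linear(hidden, 1)         # per-step value estimate

    def forward(self, riders: torch.Tensor):
        # riders: (batch, n_riders, rider_dim), one time step per rider
        h, _ = self.lstm(riders)                   # (batch, n_riders, hidden)
        order_logits = self.actor(h)               # (batch, n_riders, n_orders)
        values = self.critic(h).squeeze(-1)        # (batch, n_riders)
        return order_logits, values


# Hypothetical usage: batch of 4 states, 8 riders, 20 candidate orders,
# 16 features per rider.
model = LSTMActorCritic(rider_dim=16, n_orders=20)
logits, values = model(torch.randn(4, 8, 16))
probs = torch.softmax(logits, dim=-1)  # recommendation distribution per rider
```

Because the hidden state carries earlier riders' information forward through the sequence, later riders can be steered away from orders already favored for earlier riders, which is one way order-grabbing conflicts can be mitigated; the actual network structure, features, and training procedure follow the paper's full text.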


Publication history

Received: 19 April 2023
Revised: 05 May 2023
Accepted: 09 May 2023
Published: 22 September 2023
Issue date: April 2024

Copyright

© The author(s) 2024.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 62273193), Tsinghua University—Meituan Joint Institute for Digital Life, and the Research and Development Project of CRSC Research & Design Institute Group Co., Ltd.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
