Reinforcement Learning-Based Dynamic Order Recommendation for On-Demand Food Delivery

Xing Wang1, Ling Wang1 (corresponding author), Chenxin Dong2, Hao Ren3, and Ke Xing3

1 Department of Automation, Tsinghua University, Beijing 100080, China
2 School of Mechanical and Automotive Engineering, Qingdao Hengxing University of Science and Technology, Qingdao 266100, China
3 Meituan, Beijing 100015, China

Abstract

On-demand food delivery (OFD) is becoming increasingly popular in modern society. As a key order assignment mode in the OFD scenario, order recommendation directly influences the delivery efficiency of the platform and the delivery experience of riders. This paper addresses the dynamism of the order recommendation problem and proposes a reinforcement learning solution. An actor-critic network based on a long short-term memory (LSTM) unit is designed to deal with order-grabbing conflicts between different riders. In addition, three rider sequencing rules are proposed to match the time steps of the LSTM unit with different riders. To test the performance of the proposed method, extensive experiments are conducted on real data from the Meituan delivery platform. The results demonstrate that the proposed reinforcement learning-based order recommendation method significantly increases the number of grabbed orders and reduces the number of order-grabbing conflicts, yielding better delivery efficiency and experience for the platform and riders.

Keywords: reinforcement learning, actor-critic network, on-demand food delivery, order recommendation, long short-term memory
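The abstract's central design, an actor-critic network whose LSTM time steps are matched to individual riders, can be made concrete with a minimal sketch. The PyTorch code below is a hypothetical reading of that idea, not the paper's actual implementation: the class name, feature dimensions, and per-rider output heads are assumptions made for illustration. Riders are fed to the LSTM in the order given by a sequencing rule, so each rider's recommendation scores are conditioned on the riders scored before it.

```python
import torch
import torch.nn as nn

class LSTMActorCritic(nn.Module):
    """Illustrative actor-critic over a rider sequence.

    Each LSTM time step consumes one rider's feature vector (riders
    ordered by a sequencing rule, e.g., by idle time). The actor head
    scores candidate orders for that rider; the critic head estimates
    a per-step state value. All dimensions are sketch assumptions.
    """

    def __init__(self, rider_dim: int, n_orders: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(rider_dim, hidden, batch_first=True)
        self.actor = nn.Linear(hidden, n_orders)   # per-rider order scores
        self.critic = nn.Linear(hidden, 1)         # per-step value estimate

    def forward(self, riders: torch.Tensor):
        # riders: (batch, n_riders, rider_dim), one time step per rider
        h, _ = self.lstm(riders)                   # (batch, n_riders, hidden)
        order_logits = self.actor(h)               # (batch, n_riders, n_orders)
        values = self.critic(h).squeeze(-1)        # (batch, n_riders)
        return order_logits, values


# Hypothetical usage: batch of 4 states, 8 riders, 20 candidate orders,
# 16 features per rider.
model = LSTMActorCritic(rider_dim=16, n_orders=20)
logits, values = model(torch.randn(4, 8, 16))
probs = torch.softmax(logits, dim=-1)  # recommendation distribution per rider
```

Because the hidden state carries earlier riders' information forward through the sequence, later riders can be steered away from orders already favored for earlier riders, which is one way order-grabbing conflicts can be mitigated; the actual network structure, features, and training procedure follow the paper's full text.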


Publication history

Received: 19 April 2023
Revised: 05 May 2023
Accepted: 09 May 2023
Published: 22 September 2023
Issue date: April 2024

Copyright

© The author(s) 2024.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 62273193), Tsinghua University—Meituan Joint Institute for Digital Life, and the Research and Development Project of CRSC Research & Design Institute Group Co., Ltd.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
