Bridging Reinforcement Learning and Planning to Solve Combinatorial Optimization Problems with Nested Sub-Tasks

Xiaohan Shan¹, Pengjiu Wang¹·², Mingda Wan¹, Dong Yan¹, Jialian Li¹, Jun Zhu¹·³ (corresponding author)

1. Qiyuan Lab, Beijing 100095, China
2. China Ship Development and Design Center, Wuhan 430064, China
3. School of Life Sciences, Tsinghua University, Beijing 100084, China

Abstract

Combinatorial Optimization (CO) problems have been studied intensively for decades and have a wide range of applications. For some classic CO problems, e.g., the Traveling Salesman Problem (TSP), both traditional planning algorithms and emerging reinforcement learning methods have made solid progress in recent years. However, for CO problems with nested sub-tasks, neither end-to-end reinforcement learning algorithms nor traditional evolutionary methods can obtain satisfactory strategies within limited time and computational resources. In this paper, we propose an algorithmic framework for solving CO problems with nested sub-tasks, in which learning and planning algorithms can be combined in a modular way. We validate our framework on the Job-Shop Scheduling Problem (JSSP), and the experimental results show that our algorithm performs well in both solution quality and model generalization.

Keywords: reinforcement learning, combinatorial optimization, job-shop scheduling problem
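
To make the modular "learning plus planning" idea described above concrete, the Python sketch below illustrates one possible decomposition on a toy JSSP instance: a high-level policy picks which job to advance, while a low-level planning-style rule handles the nested sub-task of placing that job's next operation. This is not the authors' implementation; the names NestedJSSP, learned_job_selector, and schedule_next_op are hypothetical, and the random job selector is only a stand-in where a trained policy (e.g., PPO over a graph encoding of the schedule) would plug in.

# Minimal sketch (assumptions, not the paper's code) of a modular loop that
# combines a learned high-level selector with a planning-style low-level rule.
import random
from typing import List, Tuple


class NestedJSSP:
    """Toy JSSP: jobs[j] is an ordered list of (machine, duration) operations."""

    def __init__(self, jobs: List[List[Tuple[int, int]]]):
        self.jobs = jobs
        self.next_op = [0] * len(jobs)        # index of next unscheduled op per job
        self.job_ready = [0] * len(jobs)      # finish time of each job's last placed op
        n_machines = 1 + max(m for job in jobs for m, _ in job)
        self.machine_ready = [0] * n_machines # time each machine becomes free

    def done(self) -> bool:
        return all(self.next_op[j] >= len(self.jobs[j]) for j in range(len(self.jobs)))

    def candidate_jobs(self) -> List[int]:
        return [j for j in range(len(self.jobs)) if self.next_op[j] < len(self.jobs[j])]

    def schedule_next_op(self, j: int) -> None:
        # Low-level sub-task: place job j's next operation at the earliest feasible
        # time on its machine (a greedy, planner-style rule standing in for a
        # more sophisticated sub-task planner).
        machine, dur = self.jobs[j][self.next_op[j]]
        start = max(self.job_ready[j], self.machine_ready[machine])
        self.job_ready[j] = self.machine_ready[machine] = start + dur
        self.next_op[j] += 1

    def makespan(self) -> int:
        return max(self.job_ready)


def learned_job_selector(env: NestedJSSP) -> int:
    # Stand-in for the high-level learned policy: a uniformly random choice over
    # jobs with remaining operations. A trained RL policy would replace this.
    return random.choice(env.candidate_jobs())


def rollout(env: NestedJSSP) -> int:
    # High-level decisions come from the (stand-in) learned policy; the nested
    # sub-task of placing the chosen operation is delegated to the planning rule.
    # Either module can be swapped out without touching the surrounding loop.
    while not env.done():
        env.schedule_next_op(learned_job_selector(env))
    return env.makespan()


if __name__ == "__main__":
    toy_jobs = [[(0, 3), (1, 2)], [(1, 4), (0, 1)], [(0, 2), (1, 3)]]
    print("makespan:", rollout(NestedJSSP(toy_jobs)))

The point of the sketch is the interface, not the heuristics: because the selector and the sub-task scheduler only communicate through the environment state, a learned component and a planning component can be mixed and matched per sub-task, which is the modularity the framework aims for.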


Publication history

Received: 12 July 2023
Revised: 01 September 2023
Accepted: 03 November 2023
Published: 31 December 2023
Issue date: December 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (No. 2020AAA0106302), the National Natural Science Foundation of China (Nos. 62061136001, 92248303, 62106123, and 61972224), the Tsinghua Institute for Guo Qiang, and the High Performance Computing Center, Tsinghua University.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
