Bridging Reinforcement Learning and Planning to Solve Combinatorial Optimization Problems with Nested Sub-Tasks

Xiaohan Shan¹, Pengjiu Wang¹·², Mingda Wan¹, Dong Yan¹, Jialian Li¹, Jun Zhu¹·³ (corresponding author)

1. Qiyuan Lab, Beijing 100095, China
2. China Ship Development and Design Center, Wuhan 430064, China
3. School of Life Sciences, Tsinghua University, Beijing 100084, China

Abstract

Combinatorial Optimization (CO) problems have been studied intensively for decades and have a wide range of applications. For some classic CO problems, e.g., the Traveling Salesman Problem (TSP), both traditional planning algorithms and emerging reinforcement learning methods have made solid progress in recent years. However, for CO problems with nested sub-tasks, neither end-to-end reinforcement learning algorithms nor traditional evolutionary methods can obtain satisfactory strategies within limited time and computational resources. In this paper, we propose an algorithmic framework for solving CO problems with nested sub-tasks, in which learning and planning algorithms can be combined in a modular way. We validate our framework on the Job-Shop Scheduling Problem (JSSP), and the experimental results show that our algorithm performs well in both solution quality and model generalization.

Keywords: reinforcement learning, combinatorial optimization, job-shop scheduling problem
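
To make the modular "learning plus planning" idea described above concrete, the Python sketch below illustrates one possible decomposition on a toy JSSP instance: a high-level policy picks which job to advance, while a low-level planning-style rule handles the nested sub-task of placing that job's next operation. This is not the authors' implementation; the names NestedJSSP, learned_job_selector, and schedule_next_op are hypothetical, and the random job selector is only a stand-in where a trained policy (e.g., PPO over a graph encoding of the schedule) would plug in.

# Minimal sketch (assumptions, not the paper's code) of a modular loop that
# combines a learned high-level selector with a planning-style low-level rule.
import random
from typing import List, Tuple


class NestedJSSP:
    """Toy JSSP: jobs[j] is an ordered list of (machine, duration) operations."""

    def __init__(self, jobs: List[List[Tuple[int, int]]]):
        self.jobs = jobs
        self.next_op = [0] * len(jobs)        # index of next unscheduled op per job
        self.job_ready = [0] * len(jobs)      # finish time of each job's last placed op
        n_machines = 1 + max(m for job in jobs for m, _ in job)
        self.machine_ready = [0] * n_machines # time each machine becomes free

    def done(self) -> bool:
        return all(self.next_op[j] >= len(self.jobs[j]) for j in range(len(self.jobs)))

    def candidate_jobs(self) -> List[int]:
        return [j for j in range(len(self.jobs)) if self.next_op[j] < len(self.jobs[j])]

    def schedule_next_op(self, j: int) -> None:
        # Low-level sub-task: place job j's next operation at the earliest feasible
        # time on its machine (a greedy, planner-style rule standing in for a
        # more sophisticated sub-task planner).
        machine, dur = self.jobs[j][self.next_op[j]]
        start = max(self.job_ready[j], self.machine_ready[machine])
        self.job_ready[j] = self.machine_ready[machine] = start + dur
        self.next_op[j] += 1

    def makespan(self) -> int:
        return max(self.job_ready)


def learned_job_selector(env: NestedJSSP) -> int:
    # Stand-in for the high-level learned policy: a uniformly random choice over
    # jobs with remaining operations. A trained RL policy would replace this.
    return random.choice(env.candidate_jobs())


def rollout(env: NestedJSSP) -> int:
    # High-level decisions come from the (stand-in) learned policy; the nested
    # sub-task of placing the chosen operation is delegated to the planning rule.
    # Either module can be swapped out without touching the surrounding loop.
    while not env.done():
        env.schedule_next_op(learned_job_selector(env))
    return env.makespan()


if __name__ == "__main__":
    toy_jobs = [[(0, 3), (1, 2)], [(1, 4), (0, 1)], [(0, 2), (1, 3)]]
    print("makespan:", rollout(NestedJSSP(toy_jobs)))

The point of the sketch is the interface, not the heuristics: because the selector and the sub-task scheduler only communicate through the environment state, a learned component and a planning component can be mixed and matched per sub-task, which is the modularity the framework aims for.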


Publication history

Received: 12 July 2023
Revised: 01 September 2023
Accepted: 03 November 2023
Published: 31 December 2023
Issue date: December 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (No. 2020AAA0106302), the National Natural Science Foundation of China (Nos. 62061136001, 92248303, 62106123, and 61972224), the Tsinghua Institute for Guo Qiang, and the High Performance Computing Center, Tsinghua University.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
