Learning control policies for stochastic systems with reach-avoid guarantees

Zikelic, Dorde; Lechner, Mathias; Henzinger, Thomas A; Chatterjee, Krishnendu

Zikelic D, Lechner M, Henzinger TA, Chatterjee K. 2023. Learning control policies for stochastic systems with reach-avoid guarantees. Proceedings of the 37th AAAI Conference on Artificial Intelligence. AAAI: Conference on Artificial Intelligence vol. 37, 11926–11935.

Download (ext.)

https://arxiv.org/abs/2210.05308 [Preprint]

DOI

10.1609/aaai.v37i10.26407

Conference Paper | Published | English

Scopus indexed

Author

Zikelic, Djordje (ISTA); Lechner, Mathias (ISTA); Henzinger, Thomas A (ISTA); Chatterjee, Krishnendu (ISTA)

Corresponding author has ISTA affiliation

Department

Thomas Henzinger Group
Chatterjee Group

Grant

Vigilant Algorithmic Monitoring of Software
Formal Methods for Stochastic Models: Algorithms and Applications
International IST Doctoral Program

Abstract

We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold p in [0,1] over the infinite time horizon. Our method leverages advances in the machine learning literature and represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems: it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on three stochastic non-linear reinforcement learning tasks.

Keywords

General Medicine

Publishing Year

2023

Date Published

2023-06-26

Proceedings Title

Proceedings of the 37th AAAI Conference on Artificial Intelligence

Publisher

Association for the Advancement of Artificial Intelligence

Acknowledgement

This work was supported in part by the ERC-2020-AdG 101020093, ERC CoG 863818 (FoRM-SMArt) and the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 665385.

Volume

37

Issue

10

Page

11926-11935

Conference

AAAI: Conference on Artificial Intelligence

Conference Location

Washington, DC, United States

Conference Date

2023-02-07 – 2023-02-14

ISSN

2159-5399

eISSN

2374-3468

IST-REx-ID

14830

Cite this

Zikelic D, Lechner M, Henzinger TA, Chatterjee K. Learning control policies for stochastic systems with reach-avoid guarantees. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. Vol 37. Association for the Advancement of Artificial Intelligence; 2023:11926-11935. doi:10.1609/aaai.v37i10.26407

Zikelic, D., Lechner, M., Henzinger, T. A., & Chatterjee, K. (2023). Learning control policies for stochastic systems with reach-avoid guarantees. In Proceedings of the 37th AAAI Conference on Artificial Intelligence (Vol. 37, pp. 11926–11935). Washington, DC, United States: Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v37i10.26407

Zikelic, Dorde, Mathias Lechner, Thomas A Henzinger, and Krishnendu Chatterjee. “Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees.” In Proceedings of the 37th AAAI Conference on Artificial Intelligence, 37:11926–35. Association for the Advancement of Artificial Intelligence, 2023. https://doi.org/10.1609/aaai.v37i10.26407.

D. Zikelic, M. Lechner, T. A. Henzinger, and K. Chatterjee, “Learning control policies for stochastic systems with reach-avoid guarantees,” in Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, DC, United States, 2023, vol. 37, no. 10, pp. 11926–11935.

Zikelic, Dorde, et al. “Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees.” Proceedings of the 37th AAAI Conference on Artificial Intelligence, vol. 37, no. 10, Association for the Advancement of Artificial Intelligence, 2023, pp. 11926–35, doi:10.1609/aaai.v37i10.26407.
