Learning control policies for stochastic systems with reach-avoid guarantees

Zikelic, Dorde; Lechner, Mathias; Henzinger, Thomas A; Chatterjee, Krishnendu

Earlier Version

Zikelic D, Lechner M, Henzinger TA, Chatterjee K. Learning control policies for stochastic systems with reach-avoid guarantees. arXiv, 2210.05308.

Download (ext.)

https://arxiv.org/abs/2210.05308 [Preprint]

DOI

10.48550/ARXIV.2210.05308

Preprint | Draft | English

Author

Zikelic, Djordje (ISTA); Lechner, Mathias (ISTA); Henzinger, Thomas A (ISTA); Chatterjee, Krishnendu (ISTA)

Corresponding author has ISTA affiliation

Department

Chatterjee Group
Henzinger_Thomas Group

Grant

Formal Methods for Stochastic Models: Algorithms and Applications
Vigilant Algorithmic Monitoring of Software
International IST Doctoral Program

Abstract

We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold $p\in[0,1]$ over the infinite time horizon. Our method leverages advances in the machine learning literature and represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems: it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on three stochastic non-linear reinforcement learning tasks.
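To make the idea of a certificate with level-set constraints more concrete, the following is a minimal Python sketch that checks a candidate function against supermartingale-style conditions on a toy one-dimensional system. Everything in it is an assumption for illustration and is not taken from the abstract: the dynamics, the sets, the quadratic candidate `V`, the constants, and the exact form of the four conditions (nonnegativity, at most 1 on initial states, at least 1/(1-p) on unsafe states, and an expected decrease of at least `EPS` outside the target). The paper represents certificates as neural networks; a quadratic is used here only to keep the expectation exact.

```python
# Hypothetical toy setup (not from the paper): closed-loop dynamics
# x' = A * x + w with w ~ Uniform(-W, W), so E[w] = 0 and Var[w] = W**2 / 3.
A, W = 0.5, 0.1
P = 0.8                    # assumed reach-avoid probability threshold p
EPS = 0.05                 # assumed required expected decrease per step
BOUND = 1.0 / (1.0 - P)    # level the candidate must reach on unsafe states


def V(x, c):
    """Candidate certificate V(x) = c * x**2 (a stand-in for a neural network)."""
    return c * x * x


def expected_next_V(x, c):
    """Exact E[V(A*x + w)] for the quadratic candidate and zero-mean noise."""
    return c * ((A * x) ** 2 + W ** 2 / 3.0)


def check_candidate(c, n=2001):
    """Check the assumed conditions on a grid over the state space [-2, 2]."""
    result = {"nonnegative": True, "initial": True, "safety": True, "decrease": True}
    for i in range(n):
        x = -2.0 + 4.0 * i / (n - 1)
        v = V(x, c)
        if v < 0.0:
            result["nonnegative"] = False
        # Assumed initial set: |x| <= 0.5, where V must be at most 1.
        if abs(x) <= 0.5 and v > 1.0:
            result["initial"] = False
        # Assumed unsafe set: |x| >= 1.5, where V must be at least 1/(1-p).
        if abs(x) >= 1.5 and v < BOUND:
            result["safety"] = False
        # Decrease is only required outside the assumed target set |x| <= 0.2
        # and on states where V is still below the bound.
        if abs(x) > 0.2 and v <= BOUND and expected_next_V(x, c) > v - EPS:
            result["decrease"] = False
    return result


if __name__ == "__main__":
    print(check_candidate(3.0))  # a candidate that passes every grid check
    print(check_candidate(1.0))  # a candidate that is too flat on unsafe states
```

A grid check of this kind is only a sanity test: it says nothing about states between grid points, so it does not by itself give a formal guarantee.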

Publishing Year

2022

Date Published

2022-11-29

Journal Title

arXiv

Article Number

2210.05308

IST-REx-ID

14600

Cite this

Zikelic D, Lechner M, Henzinger TA, Chatterjee K. Learning control policies for stochastic systems with reach-avoid guarantees. arXiv. doi:10.48550/ARXIV.2210.05308

Zikelic, D., Lechner, M., Henzinger, T. A., & Chatterjee, K. (n.d.). Learning control policies for stochastic systems with reach-avoid guarantees. arXiv. https://doi.org/10.48550/ARXIV.2210.05308

Zikelic, Dorde, Mathias Lechner, Thomas A Henzinger, and Krishnendu Chatterjee. “Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees.” ArXiv, n.d. https://doi.org/10.48550/ARXIV.2210.05308.

D. Zikelic, M. Lechner, T. A. Henzinger, and K. Chatterjee, “Learning control policies for stochastic systems with reach-avoid guarantees,” arXiv.

Zikelic D, Lechner M, Henzinger TA, Chatterjee K. Learning control policies for stochastic systems with reach-avoid guarantees. arXiv, 2210.05308.

Zikelic, Dorde, et al. “Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees.” ArXiv, 2210.05308, doi:10.48550/ARXIV.2210.05308.

All files available under the following license(s):

Creative Commons Attribution-ShareAlike 4.0 International Public License (CC BY-SA 4.0):