{"oa":1,"license":"https://creativecommons.org/licenses/by-sa/4.0/","abstract":[{"text":"We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold $p\\in[0,1]$ over the infinite time horizon. Our method leverages advances in machine learning literature and it represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on $3$ stochastic non-linear reinforcement learning tasks.","lang":"eng"}],"year":"2022","external_id":{"arxiv":["2210.05308"]},"date_created":"2023-11-24T13:10:09Z","month":"11","publication_status":"submitted","main_file_link":[{"url":"https://arxiv.org/abs/2210.05308","open_access":"1"}],"author":[{"first_name":"Dorde","orcid":"0000-0002-4681-1699","last_name":"Zikelic","full_name":"Zikelic, Dorde","id":"294AA7A6-F248-11E8-B48F-1D18A9856A87"},{"id":"3DC22916-F248-11E8-B48F-1D18A9856A87","full_name":"Lechner, Mathias","last_name":"Lechner","first_name":"Mathias"},{"first_name":"Thomas A","orcid":"0000-0002-2985-7724","last_name":"Henzinger","full_name":"Henzinger, Thomas A","id":"40876CD8-F248-11E8-B48F-1D18A9856A87"},{"id":"2E5DCA20-F248-11E8-B48F-1D18A9856A87","full_name":"Chatterjee, Krishnendu","last_name":"Chatterjee","orcid":"0000-0002-4561-241X","first_name":"Krishnendu"}],"day":"29","language":[{"iso":"eng"}],"publication":"arXiv","project":[{"name":"Formal Methods for Stochastic Models: Algorithms and Applications","_id":"0599E47C-7A3F-11EA-A408-12923DDC885E","grant_number":"863818","call_identifier":"H2020"},{"_id":"62781420-2b32-11ec-9570-8d9b63373d4d","name":"Vigilant Algorithmic Monitoring of Software","grant_number":"101020093","call_identifier":"H2020"},{"_id":"2564DBCA-B435-11E9-9278-68D0E5697425","name":"International IST Doctoral Program","call_identifier":"H2020","grant_number":"665385"}],"oa_version":"Preprint","department":[{"_id":"KrCh"},{"_id":"ToHe"}],"doi":"10.48550/ARXIV.2210.05308","type":"preprint","date_updated":"2024-01-22T14:08:29Z","tmp":{"short":"CC BY-SA (4.0)","image":"/images/cc_by_sa.png","name":"Creative Commons Attribution-ShareAlike 4.0 International Public License (CC BY-SA 4.0)","legal_code_url":"https://creativecommons.org/licenses/by-sa/4.0/legalcode"},"user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","related_material":{"record":[{"id":"14539","relation":"dissertation_contains","status":"public"},{"status":"public","relation":"later_version","id":"14830"}]},"title":"Learning control policies for stochastic systems with reach-avoid guarantees","citation":{"ieee":"D. Zikelic, M. Lechner, T. A. Henzinger, and K. Chatterjee, “Learning control policies for stochastic systems with reach-avoid guarantees,” arXiv. .","apa":"Zikelic, D., Lechner, M., Henzinger, T. A., & Chatterjee, K. (n.d.). Learning control policies for stochastic systems with reach-avoid guarantees. arXiv. https://doi.org/10.48550/ARXIV.2210.05308","ama":"Zikelic D, Lechner M, Henzinger TA, Chatterjee K. Learning control policies for stochastic systems with reach-avoid guarantees. arXiv. doi:10.48550/ARXIV.2210.05308","chicago":"Zikelic, Dorde, Mathias Lechner, Thomas A Henzinger, and Krishnendu Chatterjee. “Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees.” ArXiv, n.d. https://doi.org/10.48550/ARXIV.2210.05308.","short":"D. Zikelic, M. Lechner, T.A. Henzinger, K. Chatterjee, ArXiv (n.d.).","mla":"Zikelic, Dorde, et al. “Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees.” ArXiv, doi:10.48550/ARXIV.2210.05308.","ista":"Zikelic D, Lechner M, Henzinger TA, Chatterjee K. Learning control policies for stochastic systems with reach-avoid guarantees. arXiv, 10.48550/ARXIV.2210.05308."},"article_processing_charge":"No","ec_funded":1,"_id":"14600","date_published":"2022-11-29T00:00:00Z","status":"public"}