{"publication_status":"submitted","oa":1,"date_published":"2022-11-29T00:00:00Z","article_processing_charge":"No","month":"11","user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","related_material":{"record":[{"status":"public","relation":"dissertation_contains","id":"14539"},{"status":"public","id":"14830","relation":"later_version"}]},"department":[{"_id":"KrCh"},{"_id":"ToHe"}],"oa_version":"Preprint","license":"https://creativecommons.org/licenses/by-sa/4.0/","year":"2022","author":[{"last_name":"Zikelic","first_name":"Dorde","orcid":"0000-0002-4681-1699","full_name":"Zikelic, Dorde","id":"294AA7A6-F248-11E8-B48F-1D18A9856A87"},{"last_name":"Lechner","id":"3DC22916-F248-11E8-B48F-1D18A9856A87","full_name":"Lechner, Mathias","first_name":"Mathias"},{"orcid":"0000-0002-2985-7724","id":"40876CD8-F248-11E8-B48F-1D18A9856A87","full_name":"Henzinger, Thomas A","first_name":"Thomas A","last_name":"Henzinger"},{"last_name":"Chatterjee","first_name":"Krishnendu","full_name":"Chatterjee, Krishnendu","id":"2E5DCA20-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0002-4561-241X"}],"abstract":[{"lang":"eng","text":"We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold $p\\in[0,1]$ over the infinite time horizon. Our method leverages advances in machine learning literature and it represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on $3$ stochastic non-linear reinforcement learning tasks."}],"citation":{"ama":"Zikelic D, Lechner M, Henzinger TA, Chatterjee K. Learning control policies for stochastic systems with reach-avoid guarantees. <i>arXiv</i>. doi:<a href=\"https://doi.org/10.48550/ARXIV.2210.05308\">10.48550/ARXIV.2210.05308</a>","ieee":"D. Zikelic, M. Lechner, T. A. Henzinger, and K. Chatterjee, “Learning control policies for stochastic systems with reach-avoid guarantees,” <i>arXiv</i>. .","apa":"Zikelic, D., Lechner, M., Henzinger, T. A., &#38; Chatterjee, K. (n.d.). Learning control policies for stochastic systems with reach-avoid guarantees. <i>arXiv</i>. <a href=\"https://doi.org/10.48550/ARXIV.2210.05308\">https://doi.org/10.48550/ARXIV.2210.05308</a>","mla":"Zikelic, Dorde, et al. “Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees.” <i>ArXiv</i>, doi:<a href=\"https://doi.org/10.48550/ARXIV.2210.05308\">10.48550/ARXIV.2210.05308</a>.","short":"D. Zikelic, M. Lechner, T.A. Henzinger, K. Chatterjee, ArXiv (n.d.).","ista":"Zikelic D, Lechner M, Henzinger TA, Chatterjee K. Learning control policies for stochastic systems with reach-avoid guarantees. arXiv, <a href=\"https://doi.org/10.48550/ARXIV.2210.05308\">10.48550/ARXIV.2210.05308</a>.","chicago":"Zikelic, Dorde, Mathias Lechner, Thomas A Henzinger, and Krishnendu Chatterjee. “Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees.” <i>ArXiv</i>, n.d. <a href=\"https://doi.org/10.48550/ARXIV.2210.05308\">https://doi.org/10.48550/ARXIV.2210.05308</a>."},"doi":"10.48550/ARXIV.2210.05308","project":[{"grant_number":"863818","_id":"0599E47C-7A3F-11EA-A408-12923DDC885E","name":"Formal Methods for Stochastic Models: Algorithms and Applications","call_identifier":"H2020"},{"_id":"62781420-2b32-11ec-9570-8d9b63373d4d","grant_number":"101020093","name":"Vigilant Algorithmic Monitoring of Software","call_identifier":"H2020"},{"name":"International IST Doctoral Program","call_identifier":"H2020","_id":"2564DBCA-B435-11E9-9278-68D0E5697425","grant_number":"665385"}],"type":"preprint","title":"Learning control policies for stochastic systems with reach-avoid guarantees","external_id":{"arxiv":["2210.05308"]},"language":[{"iso":"eng"}],"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/2210.05308"}],"date_created":"2023-11-24T13:10:09Z","date_updated":"2024-01-22T14:08:29Z","tmp":{"image":"/images/cc_by_sa.png","short":"CC BY-SA (4.0)","name":"Creative Commons Attribution-ShareAlike 4.0 International Public License (CC BY-SA 4.0)","legal_code_url":"https://creativecommons.org/licenses/by-sa/4.0/legalcode"},"status":"public","day":"29","ec_funded":1,"_id":"14600","publication":"arXiv"}