{"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","volume":37,"year":"2023","department":[{"_id":"ToHe"},{"_id":"KrCh"}],"abstract":[{"lang":"eng","text":"We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold p in [0,1] over the infinite time horizon. Our method leverages advances in machine learning literature and it represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on 3 stochastic non-linear reinforcement learning tasks."}],"quality_controlled":"1","issue":"10","project":[{"_id":"62781420-2b32-11ec-9570-8d9b63373d4d","grant_number":"101020093","call_identifier":"H2020","name":"Vigilant Algorithmic Monitoring of Software"},{"grant_number":"863818","_id":"0599E47C-7A3F-11EA-A408-12923DDC885E","call_identifier":"H2020","name":"Formal Methods for Stochastic Models: Algorithms and Applications"},{"name":"International IST Doctoral Program","call_identifier":"H2020","grant_number":"665385","_id":"2564DBCA-B435-11E9-9278-68D0E5697425"}],"type":"conference","intvolume":"        37","title":"Learning control policies for stochastic systems with reach-avoid guarantees","doi":"10.1609/aaai.v37i10.26407","citation":{"ieee":"D. Zikelic, M. Lechner, T. A. Henzinger, and K. Chatterjee, “Learning control policies for stochastic systems with reach-avoid guarantees,” in <i>Proceedings of the 37th AAAI Conference on Artificial Intelligence</i>, Washington, DC, United States, 2023, vol. 37, no. 10, pp. 11926–11935.","ama":"Zikelic D, Lechner M, Henzinger TA, Chatterjee K. Learning control policies for stochastic systems with reach-avoid guarantees. In: <i>Proceedings of the 37th AAAI Conference on Artificial Intelligence</i>. Vol 37. Association for the Advancement of Artificial Intelligence; 2023:11926-11935. doi:<a href=\"https://doi.org/10.1609/aaai.v37i10.26407\">10.1609/aaai.v37i10.26407</a>","apa":"Zikelic, D., Lechner, M., Henzinger, T. A., &#38; Chatterjee, K. (2023). Learning control policies for stochastic systems with reach-avoid guarantees. In <i>Proceedings of the 37th AAAI Conference on Artificial Intelligence</i> (Vol. 37, pp. 11926–11935). Washington, DC, United States: Association for the Advancement of Artificial Intelligence. <a href=\"https://doi.org/10.1609/aaai.v37i10.26407\">https://doi.org/10.1609/aaai.v37i10.26407</a>","mla":"Zikelic, Dorde, et al. “Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees.” <i>Proceedings of the 37th AAAI Conference on Artificial Intelligence</i>, vol. 37, no. 10, Association for the Advancement of Artificial Intelligence, 2023, pp. 11926–35, doi:<a href=\"https://doi.org/10.1609/aaai.v37i10.26407\">10.1609/aaai.v37i10.26407</a>.","chicago":"Zikelic, Dorde, Mathias Lechner, Thomas A Henzinger, and Krishnendu Chatterjee. “Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees.” In <i>Proceedings of the 37th AAAI Conference on Artificial Intelligence</i>, 37:11926–35. Association for the Advancement of Artificial Intelligence, 2023. <a href=\"https://doi.org/10.1609/aaai.v37i10.26407\">https://doi.org/10.1609/aaai.v37i10.26407</a>.","short":"D. Zikelic, M. Lechner, T.A. Henzinger, K. Chatterjee, in:, Proceedings of the 37th AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence, 2023, pp. 11926–11935.","ista":"Zikelic D, Lechner M, Henzinger TA, Chatterjee K. 2023. Learning control policies for stochastic systems with reach-avoid guarantees. Proceedings of the 37th AAAI Conference on Artificial Intelligence. AAAI: Conference on Artificial Intelligence vol. 37, 11926–11935."},"conference":{"start_date":"2023-02-07","name":"AAAI: Conference on Artificial Intelligence","location":"Washington, DC, United States","end_date":"2023-02-14"},"acknowledgement":"This work was supported in part by the ERC-2020-AdG 101020093, ERC CoG 863818 (FoRM-SMArt) and the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 665385.","date_updated":"2024-01-22T14:08:29Z","ec_funded":1,"date_published":"2023-06-26T00:00:00Z","article_processing_charge":"No","publication_status":"published","publication_identifier":{"issn":["2159-5399"],"eissn":["2374-3468"]},"month":"06","related_material":{"record":[{"relation":"earlier_version","id":"14600","status":"public"}]},"oa_version":"Preprint","keyword":["General Medicine"],"publisher":"Association for the Advancement of Artificial Intelligence","author":[{"first_name":"Dorde","full_name":"Zikelic, Dorde","id":"294AA7A6-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0002-4681-1699","last_name":"Zikelic"},{"full_name":"Lechner, Mathias","id":"3DC22916-F248-11E8-B48F-1D18A9856A87","first_name":"Mathias","last_name":"Lechner"},{"last_name":"Henzinger","orcid":"0000-0002-2985-7724","id":"40876CD8-F248-11E8-B48F-1D18A9856A87","full_name":"Henzinger, Thomas A","first_name":"Thomas A"},{"last_name":"Chatterjee","first_name":"Krishnendu","orcid":"0000-0002-4561-241X","id":"2E5DCA20-F248-11E8-B48F-1D18A9856A87","full_name":"Chatterjee, Krishnendu"}],"external_id":{"arxiv":["2210.05308"]},"language":[{"iso":"eng"}],"page":"11926-11935","date_created":"2024-01-18T07:44:31Z","_id":"14830","publication":"Proceedings of the 37th AAAI Conference on Artificial Intelligence","status":"public","day":"26"}