Reinforcement learning of risk-constrained policies in Markov decision processes

Brázdil, Tomáš; Chatterjee, Krishnendu; Novotný, Petr; Vahala, Jiří

Brázdil T, Chatterjee K, Novotný P, Vahala J. 2020. Reinforcement learning of risk-constrained policies in Markov decision processes. Proceedings of the 34th AAAI Conference on Artificial Intelligence. 34(06), 9794–9801.

Download (ext.)

https://doi.org/10.48550/arXiv.2002.12086 [Preprint]

DOI

10.1609/aaai.v34i06.6531

Journal Article | Published | English

Author

Brázdil, Tomáš; Chatterjee, Krishnendu (ISTA); Novotný, Petr; Vahala, Jiří

Department

Chatterjee Group

Grant

Game Theory

Abstract

Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff and failure states that represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability of encountering a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with a risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10⁶ states.
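
The "risk-constrained action selection via linear programming" step can be illustrated with a minimal sketch. It assumes that each action a at a search node already comes with an estimated expected discounted payoff v[a] and an estimated probability r[a] of reaching a failure state; in the approach summarized above such estimates would be supplied by the search and the learned predictor, but here they are plain inputs. The sketch solves the small LP "maximize sum p[a]*v[a] subject to sum p[a]*r[a] <= delta, sum p[a] = 1, p >= 0" for a distribution p over actions. This is not the authors' LP formulation, which may involve more variables and constraints; the function name `risk_constrained_mix` and the fallback for the infeasible case are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def risk_constrained_mix(values: Sequence[float], risks: Sequence[float],
                         delta: float) -> List[float]:
    """Solve  max_p sum_a p[a]*values[a]
       s.t.   sum_a p[a]*risks[a] <= delta,  sum_a p[a] = 1,  p >= 0.

    With only two constraints besides p >= 0, some optimal vertex of this LP
    puts weight on at most two actions, so enumerating those vertices gives
    an exact solution for this special case.
    """
    n = len(values)
    if n == 0 or len(risks) != n:
        raise ValueError("values and risks must be non-empty and equally long")

    best_value = float("-inf")
    best_dist: List[float] = []

    def consider(dist: List[float]) -> None:
        # Keep the candidate distribution with the highest expected value.
        nonlocal best_value, best_dist
        value = sum(p * v for p, v in zip(dist, values))
        if value > best_value:
            best_value, best_dist = value, dist

    for a in range(n):
        if risks[a] > delta:
            continue
        # Vertex type 1: play the safe action a deterministically.
        pure = [0.0] * n
        pure[a] = 1.0
        consider(pure)
        # Vertex type 2: mix safe action a with a risky action b so that
        # the expected risk equals delta exactly (constraint is tight).
        for b in range(n):
            if risks[b] <= delta:
                continue
            p_b = (delta - risks[a]) / (risks[b] - risks[a])
            mix = [0.0] * n
            mix[a], mix[b] = 1.0 - p_b, p_b
            consider(mix)

    if not best_dist:
        # Infeasible: every action exceeds delta. Falling back to the least
        # risky action is a hypothetical choice made for this sketch only.
        pure = [0.0] * n
        pure[min(range(n), key=lambda i: risks[i])] = 1.0
        return pure
    return best_dist


if __name__ == "__main__":
    dist = risk_constrained_mix([1.0, 5.0, 3.0], [0.0, 0.5, 0.2], delta=0.1)
    print(dist)  # [0.5, 0.0, 0.5]
```

In the example, the only deterministic choice within the risk budget is action 0 (value 1.0), while the 50/50 mix of actions 0 and 2 reaches expected value 2.0 with expected risk exactly 0.1, which is why a randomized selection is worth computing. An action could then be sampled with `random.choices(range(len(dist)), weights=dist)`. With more constraints, the vertex enumeration would no longer be convenient and a general LP solver such as `scipy.optimize.linprog` would be the natural replacement.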

Keywords

General Medicine

Publishing Year

2020

Date Published

2020-04-03

Journal Title

Proceedings of the 34th AAAI Conference on Artificial Intelligence

Publisher

Association for the Advancement of Artificial Intelligence

Acknowledgement

Krishnendu Chatterjee is supported by the Austrian Science Fund (FWF) NFN Grant No. S11407-N23 (RiSE/SHiNE), and COST Action GAMENET. Tomáš Brázdil is supported by the Grant Agency of Masaryk University grant no. MUNI/G/0739/2017 and by the Czech Science Foundation grant No. 18-11193S. Petr Novotný and Jiří Vahala are supported by the Czech Science Foundation grant No. GJ19-15134Y.

Volume

34

Issue

06

Page

9794-9801

Conference

AAAI: Conference on Artificial Intelligence

Conference Location

New York, NY, United States

Conference Date

2020-02-07 – 2020-02-12

ISSN

2374-3468

IST-REx-ID

15055

Cite this

Brázdil T, Chatterjee K, Novotný P, Vahala J. Reinforcement learning of risk-constrained policies in Markov decision processes. Proceedings of the 34th AAAI Conference on Artificial Intelligence. 2020;34(06):9794-9801. doi:10.1609/aaai.v34i06.6531

Brázdil, T., Chatterjee, K., Novotný, P., & Vahala, J. (2020). Reinforcement learning of risk-constrained policies in Markov decision processes. Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, NY, United States: Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v34i06.6531

Brázdil, Tomáš, Krishnendu Chatterjee, Petr Novotný, and Jiří Vahala. “Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes.” Proceedings of the 34th AAAI Conference on Artificial Intelligence. Association for the Advancement of Artificial Intelligence, 2020. https://doi.org/10.1609/aaai.v34i06.6531.

T. Brázdil, K. Chatterjee, P. Novotný, and J. Vahala, “Reinforcement learning of risk-constrained policies in Markov decision processes,” Proceedings of the 34th AAAI Conference on Artificial Intelligence, vol. 34, no. 06. Association for the Advancement of Artificial Intelligence, pp. 9794–9801, 2020.

Brázdil, Tomáš, et al. “Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes.” Proceedings of the 34th AAAI Conference on Artificial Intelligence, vol. 34, no. 06, Association for the Advancement of Artificial Intelligence, 2020, pp. 9794–801, doi:10.1609/aaai.v34i06.6531.

All files available under the following license(s):

Copyright Statement:

This Item is protected by copyright and/or related rights. [...]

Link(s) to Main File(s)

URL

https://doi.org/10.48550/arXiv.2002.12086

Access Level

Open Access

Sources

arXiv 2002.12086