Value iteration for long run average reward in Markov decision processes

Ashok, Pranav; Chatterjee, Krishnendu; Daca, Przemyslaw; Kretinsky, Jan; Meggendorfer, Tobias

Ashok P, Chatterjee K, Daca P, Kretinsky J, Meggendorfer T. 2017. Value iteration for long run average reward in Markov decision processes. CAV: Computer Aided Verification, LNCS, vol. 10426, 201–221.

Download (ext.)

https://arxiv.org/abs/1705.02326 [Submitted Version]

DOI

10.1007/978-3-319-63387-9_10

Conference Paper | Published | English

Scopus indexed

Author

Ashok, Pranav; Chatterjee, Krishnendu (ISTA); Daca, Przemyslaw (ISTA); Kretinsky, Jan (ISTA); Meggendorfer, Tobias

Editor

Majumdar, Rupak; Kunčak, Viktor

Department

Chatterjee Group

Grant

Efficient Algorithms for Computer Aided Verification
Game Theory
Quantitative Graph Games: Theory and Applications

Series Title

LNCS

Abstract

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI) is one of the simplest and most efficient algorithmic approaches to MDPs with other properties, such as reachability objectives. Unfortunately, a naive extension of VI does not work for MDPs with long-run average rewards, as there is no known stopping criterion. In this work our contributions are threefold. (1) We refute a conjecture related to stopping criteria for MDPs with long-run average rewards. (2) We present two practical algorithms for MDPs with long-run average rewards based on VI. First, we show that a combination of applying VI locally for each maximal end-component (MEC) and VI for reachability objectives can provide approximation guarantees. Second, extending the above approach with a simulation-guided on-demand variant of VI, we present an anytime algorithm that is able to deal with very large models. (3) Finally, we present experimental results showing that our methods significantly outperform the standard approaches on several benchmarks.
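
As an illustration of the local step mentioned in the abstract, the following is a minimal sketch of value iteration for the long-run average reward on a single communicating, aperiodic MDP (the situation inside one MEC). It is not the authors' implementation. It relies on the standard fact that the smallest and largest entries of the difference between successive iterates bound the optimal average reward from below and above. The function name `mean_payoff_vi`, the list-based encoding of `P` and `r`, and the two-state example are assumptions made for this sketch.

```python
def mean_payoff_vi(P, r, eps=1e-6, max_iter=10_000):
    """Bound the optimal long-run average reward of a communicating, aperiodic MDP.

    P[s][a] is the next-state distribution (a list of probabilities) and
    r[s][a] the reward for taking action a in state s (hypothetical encoding).
    Returns (lo, hi) with lo <= optimal average reward <= hi and hi - lo < eps.
    """
    n = len(P)
    v = [0.0] * n
    for _ in range(max_iter):
        # One Bellman update: best one-step reward plus expected continuation.
        new_v = [
            max(
                r[s][a] + sum(p * v[t] for t, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            )
            for s in range(n)
        ]
        # Per-state increments; their min and max bracket the optimal value.
        delta = [new_v[s] - v[s] for s in range(n)]
        lo, hi = min(delta), max(delta)
        if hi - lo < eps:
            return lo, hi
        # Shift by a constant so the iterates stay bounded; this leaves
        # the increments of the next step unchanged.
        base = min(new_v)
        v = [x - base for x in new_v]
    raise RuntimeError("no convergence within max_iter")


# Hypothetical two-state example.
# State 0: action 0 gives reward 1 and moves to either state with prob. 0.5;
#          action 1 gives reward 0 and stays in state 0.
# State 1: single action with reward 3, moving to either state with prob. 0.5.
P = [[[0.5, 0.5], [1.0, 0.0]], [[0.5, 0.5]]]
r = [[1.0, 0.0], [3.0]]
lo, hi = mean_payoff_vi(P, r)
print(lo, hi)
```

In this example the best strategy always plays action 0 in state 0, which visits both states equally often and yields an average reward of (1 + 3) / 2 = 2. For a general MDP, such a local computation would have to be combined with a MEC decomposition and a reachability analysis, as the abstract describes; that part is not shown here.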

Publishing Year

2017

Date Published

2017-07-13

Publisher

Springer

Volume

10426

Page

201 - 221

Conference

CAV: Computer Aided Verification

Conference Location

Heidelberg, Germany

Conference Date

2017-07-24 – 2017-07-28

ISBN

978-3-319-63386-2

IST-REx-ID

645

Cite this

Ashok P, Chatterjee K, Daca P, Kretinsky J, Meggendorfer T. Value iteration for long run average reward in Markov decision processes. In: Majumdar R, Kunčak V, eds. Vol 10426. Springer; 2017:201-221. doi:10.1007/978-3-319-63387-9_10

Ashok, P., Chatterjee, K., Daca, P., Kretinsky, J., & Meggendorfer, T. (2017). Value iteration for long run average reward in Markov decision processes. In R. Majumdar & V. Kunčak (Eds.) (Vol. 10426, pp. 201–221). Presented at the CAV: Computer Aided Verification, Heidelberg, Germany: Springer. https://doi.org/10.1007/978-3-319-63387-9_10

Ashok, Pranav, Krishnendu Chatterjee, Przemyslaw Daca, Jan Kretinsky, and Tobias Meggendorfer. “Value Iteration for Long Run Average Reward in Markov Decision Processes.” edited by Rupak Majumdar and Viktor Kunčak, 10426:201–21. Springer, 2017. https://doi.org/10.1007/978-3-319-63387-9_10.

P. Ashok, K. Chatterjee, P. Daca, J. Kretinsky, and T. Meggendorfer, “Value iteration for long run average reward in Markov decision processes,” presented at the CAV: Computer Aided Verification, Heidelberg, Germany, 2017, vol. 10426, pp. 201–221.

Ashok, Pranav, et al. Value Iteration for Long Run Average Reward in Markov Decision Processes. Edited by Rupak Majumdar and Viktor Kunčak, vol. 10426, Springer, 2017, pp. 201–21, doi:10.1007/978-3-319-63387-9_10.
