---
_id: '645'
abstract:
- lang: eng
text: Markov decision processes (MDPs) are standard models for probabilistic systems
with non-deterministic behaviours. Long-run average rewards provide a mathematically
elegant formalism for expressing long-term performance. Value iteration (VI) is
one of the simplest and most efficient algorithmic approaches to MDPs with other
properties, such as reachability objectives. Unfortunately, a naive extension
of VI does not work for MDPs with long-run average rewards, as there is no known
stopping criterion. In this work our contributions are threefold. (1) We refute
a conjecture related to stopping criteria for MDPs with long-run average rewards.
(2) We present two practical algorithms for MDPs with long-run average rewards
based on VI. First, we show that a combination of applying VI locally for each
maximal end-component (MEC) and VI for reachability objectives can provide approximation
guarantees. Second, extending the above approach with a simulation-guided on-demand
variant of VI, we present an anytime algorithm that is able to deal with very
large models. (3) Finally, we present experimental results showing that our methods
significantly outperform the standard approaches on several benchmarks.
alternative_title:
- LNCS
author:
- first_name: Pranav
full_name: Ashok, Pranav
last_name: Ashok
- first_name: Krishnendu
full_name: Chatterjee, Krishnendu
id: 2E5DCA20-F248-11E8-B48F-1D18A9856A87
last_name: Chatterjee
orcid: 0000-0002-4561-241X
- first_name: Przemyslaw
full_name: Daca, Przemyslaw
id: 49351290-F248-11E8-B48F-1D18A9856A87
last_name: Daca
- first_name: Jan
full_name: Kretinsky, Jan
id: 44CEF464-F248-11E8-B48F-1D18A9856A87
last_name: Kretinsky
orcid: 0000-0002-8122-2881
- first_name: Tobias
full_name: Meggendorfer, Tobias
last_name: Meggendorfer
citation:
ama: 'Ashok P, Chatterjee K, Daca P, Kretinsky J, Meggendorfer T. Value iteration
for long run average reward in Markov decision processes. In: Majumdar R, Kunčak
V, eds. Vol 10426. Springer; 2017:201-221. doi:10.1007/978-3-319-63387-9_10'
apa: 'Ashok, P., Chatterjee, K., Daca, P., Kretinsky, J., & Meggendorfer, T.
(2017). Value iteration for long run average reward in Markov decision processes.
In R. Majumdar & V. Kunčak (Eds.) (Vol. 10426, pp. 201–221). Presented at
the CAV: Computer Aided Verification, Heidelberg, Germany: Springer. https://doi.org/10.1007/978-3-319-63387-9_10'
chicago: Ashok, Pranav, Krishnendu Chatterjee, Przemyslaw Daca, Jan Kretinsky, and
Tobias Meggendorfer. “Value Iteration for Long Run Average Reward in Markov Decision
Processes.” edited by Rupak Majumdar and Viktor Kunčak, 10426:201–21. Springer,
2017. https://doi.org/10.1007/978-3-319-63387-9_10.
ieee: 'P. Ashok, K. Chatterjee, P. Daca, J. Kretinsky, and T. Meggendorfer, “Value
iteration for long run average reward in Markov decision processes,” presented
at the CAV: Computer Aided Verification, Heidelberg, Germany, 2017, vol. 10426,
pp. 201–221.'
ista: 'Ashok P, Chatterjee K, Daca P, Kretinsky J, Meggendorfer T. 2017. Value iteration
for long run average reward in Markov decision processes. CAV: Computer Aided
Verification, LNCS, vol. 10426, 201–221.'
mla: Ashok, Pranav, et al. Value Iteration for Long Run Average Reward in Markov
Decision Processes. Edited by Rupak Majumdar and Viktor Kunčak, vol. 10426,
Springer, 2017, pp. 201–21, doi:10.1007/978-3-319-63387-9_10.
short: P. Ashok, K. Chatterjee, P. Daca, J. Kretinsky, T. Meggendorfer, in:, R.
Majumdar, V. Kunčak (Eds.), Springer, 2017, pp. 201–221.
conference:
end_date: 2017-07-28
location: Heidelberg, Germany
name: 'CAV: Computer Aided Verification'
start_date: 2017-07-24
date_created: 2018-12-11T11:47:41Z
date_published: 2017-07-13T00:00:00Z
date_updated: 2021-01-12T08:07:32Z
day: '13'
department:
- _id: KrCh
doi: 10.1007/978-3-319-63387-9_10
ec_funded: 1
editor:
- first_name: Rupak
full_name: Majumdar, Rupak
last_name: Majumdar
- first_name: Viktor
full_name: Kunčak, Viktor
last_name: Kunčak
intvolume: ' 10426'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://arxiv.org/abs/1705.02326
month: '07'
oa: 1
oa_version: Submitted Version
page: 201 - 221
project:
- _id: 25892FC0-B435-11E9-9278-68D0E5697425
grant_number: ICT15-003
name: Efficient Algorithms for Computer Aided Verification
- _id: 25863FF4-B435-11E9-9278-68D0E5697425
call_identifier: FWF
grant_number: S11407
name: Game Theory
- _id: 2581B60A-B435-11E9-9278-68D0E5697425
call_identifier: FP7
grant_number: '279307'
name: 'Quantitative Graph Games: Theory and Applications'
publication_identifier:
isbn:
- 978-331963386-2
publication_status: published
publisher: Springer
publist_id: '7135'
quality_controlled: '1'
scopus_import: 1
status: public
title: Value iteration for long run average reward in Markov decision processes
type: conference
user_id: 3E5EF7F0-F248-11E8-B48F-1D18A9856A87
volume: 10426
year: '2017'
...