---
_id: '645'
abstract:
- lang: eng
  text: Markov decision processes (MDPs) are standard models for probabilistic systems
    with non-deterministic behaviours. Long-run average rewards provide a mathematically
    elegant formalism for expressing long term performance. Value iteration (VI) is
    one of the simplest and most efficient algorithmic approaches to MDPs with other
    properties, such as reachability objectives. Unfortunately, a naive extension
    of VI does not work for MDPs with long-run average rewards, as there is no known
    stopping criterion. In this work our contributions are threefold. (1) We refute
    a conjecture related to stopping criteria for MDPs with long-run average rewards.
    (2) We present two practical algorithms for MDPs with long-run average rewards
    based on VI. First, we show that a combination of applying VI locally for each
    maximal end-component (MEC) and VI for reachability objectives can provide approximation
    guarantees. Second, extending the above approach with a simulation-guided on-demand
    variant of VI, we present an anytime algorithm that is able to deal with very
    large models. (3) Finally, we present experimental results showing that our methods
    significantly outperform the standard approaches on several benchmarks.
alternative_title:
- LNCS
author:
- first_name: Pranav
  full_name: Ashok, Pranav
  last_name: Ashok
- first_name: Krishnendu
  full_name: Chatterjee, Krishnendu
  id: 2E5DCA20-F248-11E8-B48F-1D18A9856A87
  last_name: Chatterjee
  orcid: 0000-0002-4561-241X
- first_name: Przemyslaw
  full_name: Daca, Przemyslaw
  id: 49351290-F248-11E8-B48F-1D18A9856A87
  last_name: Daca
- first_name: Jan
  full_name: Kretinsky, Jan
  id: 44CEF464-F248-11E8-B48F-1D18A9856A87
  last_name: Kretinsky
  orcid: 0000-0002-8122-2881
- first_name: Tobias
  full_name: Meggendorfer, Tobias
  last_name: Meggendorfer
citation:
  ama: 'Ashok P, Chatterjee K, Daca P, Kretinsky J, Meggendorfer T. Value iteration
    for long run average reward in markov decision processes. In: Majumdar R, Kunčak
    V, eds. Vol 10426. Springer; 2017:201-221. doi:<a href="https://doi.org/10.1007/978-3-319-63387-9_10">10.1007/978-3-319-63387-9_10</a>'
  apa: 'Ashok, P., Chatterjee, K., Daca, P., Kretinsky, J., &#38; Meggendorfer, T.
    (2017). Value iteration for long run average reward in markov decision processes.
    In R. Majumdar &#38; V. Kunčak (Eds.) (Vol. 10426, pp. 201–221). Presented at
    the CAV: Computer Aided Verification, Heidelberg, Germany: Springer. <a href="https://doi.org/10.1007/978-3-319-63387-9_10">https://doi.org/10.1007/978-3-319-63387-9_10</a>'
  chicago: Ashok, Pranav, Krishnendu Chatterjee, Przemyslaw Daca, Jan Kretinsky, and
    Tobias Meggendorfer. “Value Iteration for Long Run Average Reward in Markov Decision
    Processes.” edited by Rupak Majumdar and Viktor Kunčak, 10426:201–21. Springer,
    2017. <a href="https://doi.org/10.1007/978-3-319-63387-9_10">https://doi.org/10.1007/978-3-319-63387-9_10</a>.
  ieee: 'P. Ashok, K. Chatterjee, P. Daca, J. Kretinsky, and T. Meggendorfer, “Value
    iteration for long run average reward in markov decision processes,” presented
    at the CAV: Computer Aided Verification, Heidelberg, Germany, 2017, vol. 10426,
    pp. 201–221.'
  ista: 'Ashok P, Chatterjee K, Daca P, Kretinsky J, Meggendorfer T. 2017. Value iteration
    for long run average reward in markov decision processes. CAV: Computer Aided
    Verification, LNCS, vol. 10426, 201–221.'
  mla: Ashok, Pranav, et al. <i>Value Iteration for Long Run Average Reward in Markov
    Decision Processes</i>. Edited by Rupak Majumdar and Viktor Kunčak, vol. 10426,
    Springer, 2017, pp. 201–21, doi:<a href="https://doi.org/10.1007/978-3-319-63387-9_10">10.1007/978-3-319-63387-9_10</a>.
  short: P. Ashok, K. Chatterjee, P. Daca, J. Kretinsky, T. Meggendorfer, in:, R.
    Majumdar, V. Kunčak (Eds.), Springer, 2017, pp. 201–221.
conference:
  end_date: 2017-07-28
  location: Heidelberg, Germany
  name: 'CAV: Computer Aided Verification'
  start_date: 2017-07-24
date_created: 2018-12-11T11:47:41Z
date_published: 2017-07-13T00:00:00Z
date_updated: 2021-01-12T08:07:32Z
day: '13'
department:
- _id: KrCh
doi: 10.1007/978-3-319-63387-9_10
ec_funded: 1
editor:
- first_name: Rupak
  full_name: Majumdar, Rupak
  last_name: Majumdar
- first_name: Viktor
  full_name: Kunčak, Viktor
  last_name: Kunčak
intvolume: '     10426'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/1705.02326
month: '07'
oa: 1
oa_version: Submitted Version
page: 201 - 221
project:
- _id: 25892FC0-B435-11E9-9278-68D0E5697425
  grant_number: ICT15-003
  name: Efficient Algorithms for Computer Aided Verification
- _id: 25863FF4-B435-11E9-9278-68D0E5697425
  call_identifier: FWF
  grant_number: S11407
  name: Game Theory
- _id: 2581B60A-B435-11E9-9278-68D0E5697425
  call_identifier: FP7
  grant_number: '279307'
  name: 'Quantitative Graph Games: Theory and Applications'
publication_identifier:
  isbn:
  - 978-331963386-2
publication_status: published
publisher: Springer
publist_id: '7135'
quality_controlled: '1'
scopus_import: 1
status: public
title: Value iteration for long run average reward in markov decision processes
type: conference
user_id: 3E5EF7F0-F248-11E8-B48F-1D18A9856A87
volume: 10426
year: '2017'
...