---
_id: '645'
abstract:
- lang: eng
  text: Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI) is one of the simplest and most efficient algorithmic approaches to MDPs with other properties, such as reachability objectives. Unfortunately, a naive extension of VI does not work for MDPs with long-run average rewards, as there is no known stopping criterion. In this work our contributions are threefold. (1) We refute a conjecture related to stopping criteria for MDPs with long-run average rewards. (2) We present two practical algorithms for MDPs with long-run average rewards based on VI. First, we show that a combination of applying VI locally for each maximal end-component (MEC) and VI for reachability objectives can provide approximation guarantees. Second, extending the above approach with a simulation-guided on-demand variant of VI, we present an anytime algorithm that is able to deal with very large models. (3) Finally, we present experimental results showing that our methods significantly outperform the standard approaches on several benchmarks.
alternative_title:
- LNCS
author:
- first_name: Pranav
  full_name: Ashok, Pranav
  last_name: Ashok
- first_name: Krishnendu
  full_name: Chatterjee, Krishnendu
  id: 2E5DCA20-F248-11E8-B48F-1D18A9856A87
  last_name: Chatterjee
  orcid: 0000-0002-4561-241X
- first_name: Przemyslaw
  full_name: Daca, Przemyslaw
  id: 49351290-F248-11E8-B48F-1D18A9856A87
  last_name: Daca
- first_name: Jan
  full_name: Kretinsky, Jan
  id: 44CEF464-F248-11E8-B48F-1D18A9856A87
  last_name: Kretinsky
  orcid: 0000-0002-8122-2881
- first_name: Tobias
  full_name: Meggendorfer, Tobias
  last_name: Meggendorfer
citation:
  ama: 'Ashok P, Chatterjee K, Daca P, Kretinsky J, Meggendorfer T. Value iteration for long run average reward in Markov decision processes. In: Majumdar R, Kunčak V, eds. Vol 10426. Springer; 2017:201-221. doi:10.1007/978-3-319-63387-9_10'
  apa: 'Ashok, P., Chatterjee, K., Daca, P., Kretinsky, J., & Meggendorfer, T. (2017). Value iteration for long run average reward in Markov decision processes. In R. Majumdar & V. Kunčak (Eds.) (Vol. 10426, pp. 201–221). Presented at the CAV: Computer Aided Verification, Heidelberg, Germany: Springer. https://doi.org/10.1007/978-3-319-63387-9_10'
  chicago: Ashok, Pranav, Krishnendu Chatterjee, Przemyslaw Daca, Jan Kretinsky, and Tobias Meggendorfer. “Value Iteration for Long Run Average Reward in Markov Decision Processes.” edited by Rupak Majumdar and Viktor Kunčak, 10426:201–21. Springer, 2017. https://doi.org/10.1007/978-3-319-63387-9_10.
  ieee: 'P. Ashok, K. Chatterjee, P. Daca, J. Kretinsky, and T. Meggendorfer, “Value iteration for long run average reward in Markov decision processes,” presented at the CAV: Computer Aided Verification, Heidelberg, Germany, 2017, vol. 10426, pp. 201–221.'
  ista: 'Ashok P, Chatterjee K, Daca P, Kretinsky J, Meggendorfer T. 2017. Value iteration for long run average reward in Markov decision processes. CAV: Computer Aided Verification, LNCS, vol. 10426, 201–221.'
  mla: Ashok, Pranav, et al. Value Iteration for Long Run Average Reward in Markov Decision Processes. Edited by Rupak Majumdar and Viktor Kunčak, vol. 10426, Springer, 2017, pp. 201–21, doi:10.1007/978-3-319-63387-9_10.
  short: P. Ashok, K. Chatterjee, P. Daca, J. Kretinsky, T. Meggendorfer, in:, R. Majumdar, V. Kunčak (Eds.), Springer, 2017, pp. 201–221.
conference:
  end_date: 2017-07-28
  location: Heidelberg, Germany
  name: 'CAV: Computer Aided Verification'
  start_date: 2017-07-24
date_created: 2018-12-11T11:47:41Z
date_published: 2017-07-13T00:00:00Z
date_updated: 2021-01-12T08:07:32Z
day: '13'
department:
- _id: KrCh
doi: 10.1007/978-3-319-63387-9_10
ec_funded: 1
editor:
- first_name: Rupak
  full_name: Majumdar, Rupak
  last_name: Majumdar
- first_name: Viktor
  full_name: Kunčak, Viktor
  last_name: Kunčak
intvolume: ' 10426'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/1705.02326
month: '07'
oa: 1
oa_version: Submitted Version
page: 201 - 221
project:
- _id: 25892FC0-B435-11E9-9278-68D0E5697425
  grant_number: ICT15-003
  name: Efficient Algorithms for Computer Aided Verification
- _id: 25863FF4-B435-11E9-9278-68D0E5697425
  call_identifier: FWF
  grant_number: S11407
  name: Game Theory
- _id: 2581B60A-B435-11E9-9278-68D0E5697425
  call_identifier: FP7
  grant_number: '279307'
  name: 'Quantitative Graph Games: Theory and Applications'
publication_identifier:
  isbn:
  - 978-331963386-2
publication_status: published
publisher: Springer
publist_id: '7135'
quality_controlled: '1'
scopus_import: 1
status: public
title: Value iteration for long run average reward in Markov decision processes
type: conference
user_id: 3E5EF7F0-F248-11E8-B48F-1D18A9856A87
volume: 10426
year: '2017'
...