Markov decision processes with multiple long-run average objectives

Brázdil T, Brožek V, Chatterjee K, Forejt V, Kučera A. 2014. Markov decision processes with multiple long-run average objectives. Logical Methods in Computer Science. 10(1).

Download
OA IST-2016-428-v1+1_1104.3489.pdf 375.39 KB [Published Version]
Download (ext.)

Journal Article | Published | English

Scopus indexed
Author
Brázdil, Tomáš; Brožek, Václav; Chatterjee, KrishnenduISTA ; Forejt, Vojtěch; Kučera, Antonín
Department
Abstract
We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with κ limit-average functions, in the expectation objective the goal is to maximize the expected limit-average value, and in the satisfaction objective the goal is to maximize the probability of runs such that the limit-average value stays above a given vector. We show that under the expectation objective, in contrast to the case of one limit-average function, both randomization and memory are necessary for strategies even for ε-approximation, and that finite-memory randomized strategies are sufficient for achieving Pareto optimal values. Under the satisfaction objective, in contrast to the case of one limit-average function, infinite memory is necessary for strategies achieving a specific value (i.e. randomized finite-memory strategies are not sufficient), whereas memoryless randomized strategies are sufficient for ε-approximation, for all ε > 0. We further prove that the decision problems for both expectation and satisfaction objectives can be solved in polynomial time and the trade-off curve (Pareto curve) can be ε-approximated in time polynomial in the size of the MDP and 1/ε, and exponential in the number of limit-average functions, for all ε > 0. Our analysis also reveals flaws in previous work for MDPs with multiple mean-payoff functions under the expectation objective, corrects the flaws, and allows us to obtain improved results.
Publishing Year
Date Published
2014-02-14
Journal Title
Logical Methods in Computer Science
Publisher
International Federation of Computational Logic
Volume
10
Issue
1
ISSN
IST-REx-ID

Cite this

Brázdil T, Brožek V, Chatterjee K, Forejt V, Kučera A. Markov decision processes with multiple long-run average objectives. Logical Methods in Computer Science. 2014;10(1). doi:10.2168/LMCS-10(1:13)2014
Brázdil, T., Brožek, V., Chatterjee, K., Forejt, V., & Kučera, A. (2014). Markov decision processes with multiple long-run average objectives. Logical Methods in Computer Science. International Federation of Computational Logic. https://doi.org/10.2168/LMCS-10(1:13)2014
Brázdil, Tomáš, Václav Brožek, Krishnendu Chatterjee, Vojtěch Forejt, and Antonín Kučera. “Markov Decision Processes with Multiple Long-Run Average Objectives.” Logical Methods in Computer Science. International Federation of Computational Logic, 2014. https://doi.org/10.2168/LMCS-10(1:13)2014.
T. Brázdil, V. Brožek, K. Chatterjee, V. Forejt, and A. Kučera, “Markov decision processes with multiple long-run average objectives,” Logical Methods in Computer Science, vol. 10, no. 1. International Federation of Computational Logic, 2014.
Brázdil T, Brožek V, Chatterjee K, Forejt V, Kučera A. 2014. Markov decision processes with multiple long-run average objectives. Logical Methods in Computer Science. 10(1).
Brázdil, Tomáš, et al. “Markov Decision Processes with Multiple Long-Run Average Objectives.” Logical Methods in Computer Science, vol. 10, no. 1, International Federation of Computational Logic, 2014, doi:10.2168/LMCS-10(1:13)2014.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0):
Main File(s)
Access Level
OA Open Access
Date Uploaded
2018-12-12
MD5 Checksum
803edcc2d8c1acfba44a9ec43a5eb9f0


Link(s) to Main File(s)
Access Level
OA Open Access

Export

Marked Publications

Open Data ISTA Research Explorer

Search this title in

Google Scholar