Two views on multiple mean payoff objectives in Markov Decision Processes

Brázdil, Tomáš; Brožek, Václav; Chatterjee, Krishnendu; Forejt, Vojtěch; Kučera, Antonín

Brázdil T, Brožek V, Chatterjee K, Forejt V, Kučera A. 2011. Two views on multiple mean payoff objectives in Markov Decision Processes. LICS: Logic in Computer Science, 5970225.

Download (ext.)

http://arxiv.org/abs/1104.3489 [Submitted Version]

DOI

10.1109/LICS.2011.10

Conference Paper | Published | English

Scopus indexed

Author

Brázdil, Tomáš; Brožek, Václav; Chatterjee, Krishnendu (ISTA); Forejt, Vojtěch; Kučera, Antonín

Department

Chatterjee Group

Grant

Modern Graph Algorithmic Techniques in Formal Verification
Rigorous Systems Engineering
Quantitative Graph Games: Theory and Applications
Microsoft Research Faculty Fellowship

Abstract

We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with k reward functions, in the expectation objective the goal is to maximize the expected limit-average value, and in the satisfaction objective the goal is to maximize the probability of runs such that the limit-average value stays above a given vector. We show that under the expectation objective, in contrast to the single-objective case, both randomization and memory are necessary for strategies, and that finite-memory randomized strategies are sufficient. Under the satisfaction objective, in contrast to the single-objective case, infinite memory is necessary for strategies, and randomized memoryless strategies are sufficient for epsilon-approximation, for all epsilon > 0. We further prove that the decision problems for both expectation and satisfaction objectives can be solved in polynomial time, and that the trade-off curve (Pareto curve) can be epsilon-approximated in time polynomial in the size of the MDP and 1/epsilon, and exponential in the number of reward functions, for all epsilon > 0. Our results also reveal flaws in previous work on MDPs with multiple mean-payoff functions under the expectation objective; we correct these flaws and obtain improved results.

Publishing Year

2011

Date Published

2011-06-21

Publisher

IEEE

Article Number

5970225

Conference

LICS: Logic in Computer Science

Conference Location

Toronto, Canada

Conference Date

2011-06-21 – 2011-06-24

IST-REx-ID

3346

Cite this

Brázdil T, Brožek V, Chatterjee K, Forejt V, Kučera A. Two views on multiple mean payoff objectives in Markov Decision Processes. In: IEEE; 2011. doi:10.1109/LICS.2011.10

Brázdil, T., Brožek, V., Chatterjee, K., Forejt, V., & Kučera, A. (2011). Two views on multiple mean payoff objectives in Markov Decision Processes. Presented at the LICS: Logic in Computer Science, Toronto, Canada: IEEE. https://doi.org/10.1109/LICS.2011.10

Brázdil, Tomáš, Václav Brožek, Krishnendu Chatterjee, Vojtěch Forejt, and Antonín Kučera. “Two Views on Multiple Mean Payoff Objectives in Markov Decision Processes.” IEEE, 2011. https://doi.org/10.1109/LICS.2011.10.

T. Brázdil, V. Brožek, K. Chatterjee, V. Forejt, and A. Kučera, “Two views on multiple mean payoff objectives in Markov Decision Processes,” presented at the LICS: Logic in Computer Science, Toronto, Canada, 2011.

Brázdil, Tomáš, et al. Two Views on Multiple Mean Payoff Objectives in Markov Decision Processes. 5970225, IEEE, 2011, doi:10.1109/LICS.2011.10.
