Multi-objective discounted reward verification in graphs and MDPs

Chatterjee, Krishnendu; Forejt, Vojtěch; Wojtczak, Dominik

Chatterjee K, Forejt V, Wojtczak D. 2013. Multi-objective discounted reward verification in graphs and MDPs. 8312, 228–242.

DOI

10.1007/978-3-642-45221-5_17

Conference Paper | Published | English

Scopus indexed

Author

Chatterjee, Krishnendu (ISTA); Forejt, Vojtěch; Wojtczak, Dominik

Department

Chatterjee Group

Grant

Quantitative Graph Games: Theory and Applications

Series Title

LNCS

Abstract

We study the problem of achieving a given value in Markov decision processes (MDPs) with several independent discounted reward objectives. We consider a generalised version of discounted reward objectives, in which the amount of discounting depends on the states visited and on the objective. This definition extends the usual definition of discounted reward and makes it possible to capture systems in which the values of different commodities diminish at different and variable rates. We establish results for two prominent subclasses of the problem, namely state-discount models, where the discount factors depend only on the state of the MDP (and not on the objective), and reward-discount models, where they depend only on the objective (and not on the state of the MDP). For the state-discount models we use a straightforward reduction to expected total reward and show that the problem of whether a value is achievable can be solved in polynomial time. For the reward-discount models we show that memory and randomisation of the strategies are required, but that the problem is nevertheless decidable and it suffices to consider strategies that, after a certain number of steps, behave in a memoryless way. For the general case, we show that when restricted to graphs (i.e. MDPs with no randomisation), pure strategies, and discount factors of the form 1/n where n is an integer, the problem is in PSPACE and finite memory suffices for achieving a given value. We also show that when the discount factors are not of the form 1/n, the memory required by a strategy can be infinite.
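As an illustration of the generalised objective described in the abstract, the following minimal sketch accumulates the discounted reward of each objective along a finite path prefix. It is not taken from the paper: the function name `discounted_rewards`, the state-based rewards, and the convention that the discount of a state applies only to rewards collected after leaving it are all assumptions made for this example, and the paper's formal definition may differ.

```python
from typing import Callable, Hashable, List, Sequence

State = Hashable


def discounted_rewards(
    path: Sequence[State],
    reward: Callable[[State, int], float],
    discount: Callable[[State, int], float],
    num_objectives: int,
) -> List[float]:
    """Discounted reward of a finite path prefix, one total per objective.

    Assumed convention (hypothetical, for illustration only): the reward of
    the j-th state is weighted by the product of the discount factors of all
    earlier states on the path, taken separately for each objective.
    """
    totals = [0.0] * num_objectives
    weights = [1.0] * num_objectives  # accumulated discount per objective
    for state in path:
        for i in range(num_objectives):
            totals[i] += weights[i] * reward(state, i)
            weights[i] *= discount(state, i)
    return totals


# A path in a two-state graph; every state yields reward 1 for both objectives.
path = ["a", "b", "a", "b"]
unit_reward = lambda state, objective: 1.0

# Reward-discount model: the factor depends only on the objective.
reward_discount = lambda state, objective: [0.5, 0.25][objective]

# State-discount model: the factor depends only on the current state.
state_discount = lambda state, objective: {"a": 0.5, "b": 0.25}[state]

by_objective = discounted_rewards(path, unit_reward, reward_discount, 2)
by_state = discounted_rewards(path, unit_reward, state_discount, 2)
```

In the reward-discount case the two objectives diverge even though they collect identical rewards, while in the state-discount case they coincide, which is the distinction the two subclasses draw. The discount factors used here are of the form 1/n, matching the restriction stated for the general graph case.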

Publishing Year

2013

Date Published

2013-12-01

Publisher

Springer

Volume

8312

Page

228–242

Conference

LPAR: Logic for Programming, Artificial Intelligence, and Reasoning

Conference Location

Stellenbosch, South Africa

Conference Date

2013-12-14 – 2013-12-19

IST-REx-ID

2238

Cite this

Chatterjee K, Forejt V, Wojtczak D. Multi-objective discounted reward verification in graphs and MDPs. 2013;8312:228-242. doi:10.1007/978-3-642-45221-5_17

Chatterjee, K., Forejt, V., & Wojtczak, D. (2013). Multi-objective discounted reward verification in graphs and MDPs. Presented at the LPAR: Logic for Programming, Artificial Intelligence, and Reasoning, Stellenbosch, South Africa: Springer. https://doi.org/10.1007/978-3-642-45221-5_17

Chatterjee, Krishnendu, Vojtěch Forejt, and Dominik Wojtczak. “Multi-Objective Discounted Reward Verification in Graphs and MDPs.” Lecture Notes in Computer Science. Springer, 2013. https://doi.org/10.1007/978-3-642-45221-5_17.

K. Chatterjee, V. Forejt, and D. Wojtczak, “Multi-objective discounted reward verification in graphs and MDPs,” vol. 8312. Springer, pp. 228–242, 2013.

Chatterjee K, Forejt V, Wojtczak D. 2013. Multi-objective discounted reward verification in graphs and MDPs. 8312, 228–242.

Chatterjee, Krishnendu, et al. Multi-Objective Discounted Reward Verification in Graphs and MDPs. Vol. 8312, Springer, 2013, pp. 228–42, doi:10.1007/978-3-642-45221-5_17.
