Run-time optimization for learned controllers through quantitative games

Avni, Guy; Bloem, Roderick; Chatterjee, Krishnendu; Henzinger, Thomas A; Konighofer, Bettina; Pranger, Stefan

Run-time optimization for learned controllers through quantitative games

Avni G, Bloem R, Chatterjee K, Henzinger TA, Konighofer B, Pranger S. 2019. Run-time optimization for learned controllers through quantitative games. 31st International Conference on Computer-Aided Verification. CAV: Computer Aided Verification, LNCS, vol. 11561, 630–649.

Download

2019_CAV_Avni.pdf 659.77 KB [Published Version]

DOI

10.1007/978-3-030-25540-4_36

Conference Paper | Published | English

Scopus indexed

Author

Avni, Guy^ISTA ; Bloem, Roderick; Chatterjee, Krishnendu^ISTA ; Henzinger, Thomas A^ISTA ; Konighofer, Bettina; Pranger, Stefan

Corresponding author has ISTA affiliation

Department

Henzinger_Thomas Group
Chatterjee Group

Grant

Formal Methods meets Algorithmic Game Theory
Formal methods for the design and analysis of complex systems
Rigorous Systems Engineering

Series Title

LNCS

Abstract

A controller is a device that interacts with a plant. At each time point,it reads the plant’s state and issues commands with the goal that the plant oper-ates optimally. Constructing optimal controllers is a fundamental and challengingproblem. Machine learning techniques have recently been successfully applied totrain controllers, yet they have limitations. Learned controllers are monolithic andhard to reason about. In particular, it is difficult to add features without retraining,to guarantee any level of performance, and to achieve acceptable performancewhen encountering untrained scenarios. These limitations can be addressed bydeploying quantitative run-timeshieldsthat serve as a proxy for the controller.At each time point, the shield reads the command issued by the controller andmay choose to alter it before passing it on to the plant. We show how optimalshields that interfere as little as possible while guaranteeing a desired level ofcontroller performance, can be generated systematically and automatically usingreactive synthesis. First, we abstract the plant by building a stochastic model.Second, we consider the learned controller to be a black box. Third, we mea-surecontroller performanceandshield interferenceby two quantitative run-timemeasures that are formally defined using weighted automata. Then, the problemof constructing a shield that guarantees maximal performance with minimal inter-ference is the problem of finding an optimal strategy in a stochastic2-player game“controller versus shield” played on the abstract state space of the plant with aquantitative objective obtained from combining the performance and interferencemeasures. We illustrate the effectiveness of our approach by automatically con-structing lightweight shields for learned traffic-light controllers in various roadnetworks. The shields we generate avoid liveness bugs, improve controller per-formance in untrained and changing traffic situations, and add features to learnedcontrollers, such as giving priority to emergency vehicles.

Publishing Year

2019

Date Published

2019-07-12

Proceedings Title

31st International Conference on Computer-Aided Verification

Publisher

Springer

Volume

11561

Page

630-649

Conference

CAV: Computer Aided Verification

Conference Location

New York, NY, United States

Conference Date

2019-07-13 – 2019-07-18

ISBN

9783030255398

ISSN

0302-9743

IST-REx-ID

6462

Cite this

Avni G, Bloem R, Chatterjee K, Henzinger TA, Konighofer B, Pranger S. Run-time optimization for learned controllers through quantitative games. In: 31st International Conference on Computer-Aided Verification. Vol 11561. Springer; 2019:630-649. doi:10.1007/978-3-030-25540-4_36

Avni, G., Bloem, R., Chatterjee, K., Henzinger, T. A., Konighofer, B., & Pranger, S. (2019). Run-time optimization for learned controllers through quantitative games. In 31st International Conference on Computer-Aided Verification (Vol. 11561, pp. 630–649). New York, NY, United States: Springer. https://doi.org/10.1007/978-3-030-25540-4_36

Avni, Guy, Roderick Bloem, Krishnendu Chatterjee, Thomas A Henzinger, Bettina Konighofer, and Stefan Pranger. “Run-Time Optimization for Learned Controllers through Quantitative Games.” In 31st International Conference on Computer-Aided Verification, 11561:630–49. Springer, 2019. https://doi.org/10.1007/978-3-030-25540-4_36.

G. Avni, R. Bloem, K. Chatterjee, T. A. Henzinger, B. Konighofer, and S. Pranger, “Run-time optimization for learned controllers through quantitative games,” in 31st International Conference on Computer-Aided Verification, New York, NY, United States, 2019, vol. 11561, pp. 630–649.

Avni, Guy, et al. “Run-Time Optimization for Learned Controllers through Quantitative Games.” 31st International Conference on Computer-Aided Verification, vol. 11561, Springer, 2019, pp. 630–49, doi:10.1007/978-3-030-25540-4_36.

All files available under the following license(s):

Creative Commons Attribution 4.0 International Public License (CC-BY 4.0):