
CSIS Seminar

Reinforcement Learning for Safety-Critical Systems

Speaker:		Enrique Mallada, Johns Hopkins University
When:		February 16, 2024, 11:00 am - 12:00 pm
Where:		ENGR 4201

Abstract

Integrating Reinforcement Learning (RL) in safety-critical applications, such as autonomous vehicles, healthcare, and industrial automation, necessitates an increased focus on safety and reliability. In this talk, we consider two complementary mechanisms to augment RL's suitability for safety-critical systems. First, we consider a constrained reinforcement learning (C-RL) setting, wherein agents aim to maximize rewards while adhering to required constraints on secondary specifications. Several algorithms rooted in sample-based primal-dual methods have recently been proposed to solve this problem in policy space. However, such methods exhibit a discrepancy between the behavioral and optimal policies due to their reliance on stochastic gradient descent-ascent algorithms. We propose an algorithm for C-RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories almost surely converge to the optimal policy. Second, we study the problem of incorporating safety-critical constraints into RL that allow an agent to avoid unsafe regions of the state space. Though such a safety goal can be captured by an action-value-like function, known as a safety critic, the associated operator lacks the contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness of safety critic operators by leveraging the fact that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding binary Bellman equation (B2E) for safety and study its properties. While the resulting operator is still non-contractive, we fully characterize its fixed points, which, except for a spurious solution, represent maximal persistently safe regions of the state space that can always avoid failure. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.
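
As a rough illustration of the binary safety critic idea in the abstract, the sketch below iterates a binary Bellman-style update on a tiny deterministic system. It is not the speaker's algorithm. The five-state chain, the `step` dynamics, and the specific update rule (a state-action pair is marked 1, "failure unavoidable", if the next state is unsafe or if every action from the next state is marked 1) are all assumptions made for this example. It shows two fixed points of the same operator: a meaningful one reached from an all-zeros start, and a spurious "everything is unsafe" one reached from an all-ones start.

```python
# Hypothetical toy system: states 0..4 on a line, state 4 is unsafe.
# State 3 is "doomed": every action taken there leads to state 4.
STATES = range(5)
ACTIONS = (-1, +1)
UNSAFE = {4}


def step(s, a):
    """Deterministic dynamics (assumed for illustration only)."""
    if s in (3, 4):
        return 4  # state 4 is absorbing; state 3 always falls into it
    return min(max(s + a, 0), 4)  # otherwise move left/right, clamped


def bellman_update(b):
    """One application of the assumed binary Bellman-style operator."""
    new = {}
    for s in STATES:
        for a in ACTIONS:
            s_next = step(s, a)
            if s_next in UNSAFE:
                new[(s, a)] = 1  # failure happens immediately
            else:
                # failure is unavoidable only if every next action is marked 1
                new[(s, a)] = min(b[(s_next, a2)] for a2 in ACTIONS)
    return new


def iterate_to_fixed_point(b):
    """Apply the update repeatedly until the table stops changing."""
    while True:
        new = bellman_update(b)
        if new == b:
            return b
        b = new


def safe_states(b):
    """States from which at least one action is marked 0 (safe)."""
    return [s for s in STATES if min(b[(s, a)] for a in ACTIONS) == 0]


all_zeros = {(s, a): 0 for s in STATES for a in ACTIONS}
all_ones = {(s, a): 1 for s in STATES for a in ACTIONS}

b_from_zeros = iterate_to_fixed_point(all_zeros)
b_from_ones = iterate_to_fixed_point(all_ones)
```

In this toy, `safe_states(b_from_zeros)` recovers states 0, 1, and 2, while `b_from_ones` stays at all ones, which also satisfies the update but labels every state as unsafe. Starting from zeros works here only because the system is small and fully known; it is a convenience of the example, not a claim about how the algorithm presented in the talk avoids spurious fixed points.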

Speaker Bio

Enrique Mallada has been an associate professor of electrical and computer engineering at Johns Hopkins University since 2022. Before joining Hopkins in 2016 as an assistant professor, he was a post-doctoral fellow at the Center for the Mathematics of Information at the California Institute of Technology from 2014 to 2016. He received his telecommunications engineering degree from Universidad ORT, Uruguay, in 2005 and his Ph.D. degree in electrical and computer engineering with a minor in applied mathematics from Cornell University in 2014. Dr. Mallada was awarded the Johns Hopkins Alumni Association Excellence in Teaching award in 2021, the NSF CAREER award in 2018, the ECE Director's Ph.D. Thesis Research Award for his dissertation in 2014, Cornell University's Jacobs Fellowship in 2011, and the Organization of American States scholarship from 2008 to 2010. His research interests lie in the areas of control, dynamical systems, optimization, and machine learning, with applications to infrastructure networks and autonomous systems.

Center for Secure Information Systems

Securing the World's Cyber Infrastructure

George Mason University
