This paper introduces a minimal benchmark for testing whether an RL agent can learn a permanent safety constraint from a single catastrophic event.
The protocol uses standard MiniGrid LavaCrossing environments, fixed seeds, and forbids any training or gradient updates after the first failure. The key metric is whether the agent ever steps into lava again on unseen layouts.
A public benchmark harness is included so others can test their own agents under the same constraints.
This paper introduces a minimal benchmark for testing whether an RL agent can learn a permanent safety constraint from a single catastrophic event.
The protocol uses standard MiniGrid LavaCrossing environments, fixed seeds, and forbids any training or gradient updates after the first failure. The key metric is whether the agent ever steps into lava again on unseen layouts.
A public benchmark harness is included so others can test their own agents under the same constraints.