EMPERROR: A Flexible Generative Perception Error Model for Probing Self-Driving Planners

Abstract

To handle the complexities of real-world traffic, learning planners for self-driving from data is a promising direction. While recent approaches have shown great progress, they typically assume a setting in which the ground-truth world state is available as input. However, when deployed, planning needs to be robust to the long tail of errors incurred by a noisy perception system, which is often neglected in evaluation. To address this, previous work has proposed drawing adversarial samples from a perception error model (PEM) mimicking the noise characteristics of a target object detector. However, these methods use simple PEMs that fail to accurately capture all failure modes of detection. In this paper, we present EMPERROR, a novel transformer-based generative PEM, apply it to stress-test an imitation learning (IL)-based planner, and show that it imitates modern detectors more faithfully than previous work. Furthermore, it is able to produce realistic noisy inputs that increase the planner’s collision rate by up to 85%, demonstrating its utility as a valuable tool for a more complete evaluation of self-driving planners.

IL-based planning is brittle to small, plausible perception errors

By drawing samples from EMPERROR in an adversarial fashion, we can stress-test a given planning module. Here we highlight the brittleness of an imitation learning-based planner by searching for proxy detection results that induce risky plans ending in collision. The ground-truth state is shown in red and the proxy detection results in blue. Each example contrasts the nominal scene with its adversarial counterpart.
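The idea of searching over PEM samples for inputs that push a planner toward collision can be illustrated with a toy example. The sketch below is not the EMPERROR model or the search procedure used in the paper: `sample_proxy_detections` is a hypothetical stand-in PEM (Gaussian position jitter plus random misses), `toy_planner` is a hypothetical rule-based planner, and the search is a plain best-of-K random search chosen only for simplicity. All names and parameter values are assumptions.

```python
import numpy as np


def sample_proxy_detections(gt_boxes, rng, pos_sigma=0.3, miss_prob=0.1):
    """Hypothetical stand-in PEM: jitter object centers and randomly drop objects."""
    noise = rng.normal(0.0, pos_sigma, size=gt_boxes.shape)
    keep = rng.random(len(gt_boxes)) >= miss_prob
    return (gt_boxes + noise)[keep]


def toy_planner(detections, horizon=10, step=1.0, lane_half_width=1.0, stop_margin=2.0):
    """Hypothetical planner: drive straight along +x and stop short of the
    nearest detected object inside the ego lane."""
    max_x = horizon * step
    in_lane = detections[
        (np.abs(detections[:, 1]) < lane_half_width) & (detections[:, 0] > 0.0)
    ]
    if len(in_lane) > 0:
        max_x = min(max_x, max(in_lane[:, 0].min() - stop_margin, 0.0))
    xs = np.minimum(np.arange(1, horizon + 1) * step, max_x)
    return np.stack([xs, np.zeros_like(xs)], axis=1)


def min_distance_to_objects(plan, gt_boxes):
    """Smallest distance between any planned waypoint and any true object center."""
    diffs = plan[:, None, :] - gt_boxes[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).min())


def adversarial_search(gt_boxes, num_samples=200, seed=0):
    """Best-of-K random search: keep the proxy detections whose induced plan
    comes closest to a ground-truth object. Starts from the nominal scene."""
    rng = np.random.default_rng(seed)
    best_det = gt_boxes
    best_dist = min_distance_to_objects(toy_planner(gt_boxes), gt_boxes)
    for _ in range(num_samples):
        det = sample_proxy_detections(gt_boxes, rng)
        # The plan is computed from noisy input but scored against the true scene.
        dist = min_distance_to_objects(toy_planner(det), gt_boxes)
        if dist < best_dist:
            best_det, best_dist = det, dist
    return best_det, best_dist


# Two objects given as (x, y) centers: one in the ego lane, one off to the side.
gt = np.array([[5.0, 0.0], [8.0, 3.0]])
nominal_dist = min_distance_to_objects(toy_planner(gt), gt)
adv_det, adv_dist = adversarial_search(gt)
print(f"nominal clearance: {nominal_dist:.2f}, adversarial clearance: {adv_dist:.2f}")
```

The key design point carried over from the text is that the planner only ever sees the proxy detections, while the resulting plan is evaluated against the ground-truth state.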

EMPERROR can faithfully imitate modern 3D object detectors

To further highlight the plausibility of proxy detections generated by EMPERROR, we visualize its results over time for three different target detectors and compare to a PEM predicting a per-object Gaussian error distribution using an MLP. Even when drawing independent samples at each timestep, the error patterns produced by EMPERROR are consistent across the sequence and closely match the target object detector. This is not the case for the baseline.
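To make the baseline concrete, here is a minimal sketch of a PEM that predicts a per-object Gaussian error distribution with an MLP. It is an illustration under stated assumptions, not the baseline's actual implementation: the feature layout `(x, y, length, width)`, the layer sizes, and the class name `GaussianMLPPEM` are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


class GaussianMLPPEM:
    """Toy per-object Gaussian PEM: an MLP maps object features to the mean
    and standard deviation of a 2D position error (untrained, random weights)."""

    def __init__(self, in_dim=4, hidden=16, out_dim=2, seed=0):
        rng = np.random.default_rng(seed)
        self.out_dim = out_dim
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 2 * out_dim))
        self.b2 = np.zeros(2 * out_dim)

    def predict(self, feats):
        """Return (mean, std) of the error for each object, each of shape (N, out_dim)."""
        h = np.maximum(feats @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        out = h @ self.w2 + self.b2
        mean, log_std = out[:, : self.out_dim], out[:, self.out_dim:]
        return mean, np.exp(log_std)  # exp keeps the std strictly positive

    def sample(self, feats, rng):
        """Draw one error sample per object, independently for each object."""
        mean, std = self.predict(feats)
        return mean + std * rng.normal(size=mean.shape)


# Hypothetical features per object: (x, y, length, width).
feats = np.array([[5.0, 0.0, 4.5, 2.0], [8.0, 3.0, 4.0, 1.8]])
pem = GaussianMLPPEM()
rng = np.random.default_rng(1)

# Independent draws at two consecutive timesteps for the same static scene.
err_t0 = pem.sample(feats, rng)
err_t1 = pem.sample(feats, rng)
print("error at t0:\n", err_t0)
print("error at t1:\n", err_t1)
```

Because each object's error is drawn independently at every timestep, the noise in this sketch carries no memory across the sequence, so repeated draws for the same scene generally differ from frame to frame.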

BibTeX citation

@Article{hanselmann2025emperror,
  title = {EMPERROR: A Flexible Generative Perception Error Model for Probing Self-Driving Planners},
  author = {Hanselmann, Niklas and Doll, Simon and Cordts, Marius and Lensch, Hendrik PA and Geiger, Andreas},
  journal = {IEEE Robotics and Automation Letters (RA-L)},
  year = {2025},
  doi = {10.1109/LRA.2025.3562789}
}