“`html
The challenges of phantom jams
A stop-and-go wave moving backwards through highway traffic.
Stop-and-go waves are frustrating, seemingly inexplicable traffic slowdowns that can lead to congestion and significant energy waste. They often arise from small fluctuations in driving behavior that are amplified as they propagate through the flow of traffic. Traditional approaches such as ramp metering and variable speed limits are limited by their need for costly infrastructure and centralized coordination. Autonomous vehicles (AVs) equipped with reinforcement learning (RL) controllers offer a more scalable solution, because each AV can dynamically adjust its driving behavior to smooth out these waves.
Reinforcement learning for wave-smoothing AVs
RL is a powerful control approach where an agent learns to maximize a reward signal through interactions with an environment. In our case, the environment is a mixed-autonomy traffic scenario, where AVs learn driving strategies to dampen stop-and-go waves and reduce fuel consumption for both themselves and nearby human-driven vehicles.
To train these RL agents, we needed fast simulations that could replicate highway stop-and-go behavior. We used experimental data collected on Interstate 24 near Nashville, Tennessee, and built simulations where vehicles replayed highway trajectories to create unstable traffic conditions. The AVs learn to smooth out these waves based on local measurements of their speed and the speed of the vehicle in front.
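The replay idea can be sketched in a few lines. This is only an illustration under stated assumptions: the kinematic update, the time step, the `simple_follower` stand-in for a trained policy, and the synthetic lead trace are all hypothetical and are not taken from the I-24 data or the actual simulator.

```python
def replay_simulation(lead_speeds, controller, dt=0.1, initial_gap=30.0):
    """Replay a recorded lead-vehicle speed trace and roll an AV forward behind it.

    lead_speeds: lead vehicle speeds in m/s, one per time step (the replayed trajectory).
    controller:  callable (av_speed, lead_speed, gap) -> acceleration in m/s^2.
    """
    av_speed = lead_speeds[0]  # assumption: the AV starts at the lead vehicle's speed
    gap = initial_gap          # assumption: initial space gap in metres
    history = []
    for lead_speed in lead_speeds:
        accel = controller(av_speed, lead_speed, gap)
        av_speed = max(0.0, av_speed + accel * dt)  # vehicles do not reverse
        gap += (lead_speed - av_speed) * dt         # gap grows when the lead is faster
        history.append((av_speed, gap))
    return history


def simple_follower(av_speed, lead_speed, gap, headway=2.0):
    """Hypothetical hand-written controller standing in for a trained RL policy."""
    desired_gap = 5.0 + headway * av_speed
    accel = 0.5 * (lead_speed - av_speed) + 0.1 * (gap - desired_gap)
    return max(-3.0, min(1.5, accel))  # clip to assumed comfort limits


# Synthetic stop-and-go lead trace (not real highway data): cruise, slow down, recover.
lead_trace = [20.0] * 100 + [5.0] * 100 + [20.0] * 100
trajectory = replay_simulation(lead_trace, simple_follower)
```

In a training setup, the controller argument would be the RL policy, and the recorded speed trace would come from the highway dataset rather than a hand-built list.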
Simulation replaying a highway trajectory that exhibits several stop-and-go waves.
We designed the AVs to operate using only basic sensor information about themselves and the vehicle in front. The observations consist of the AV’s speed, the speed of the leading vehicle, and the space gap between them. Based on these inputs, the RL agent prescribes either an instantaneous acceleration or a desired speed for the AV. This approach allows the controllers to be deployed on most modern vehicles in a decentralized manner without requiring additional infrastructure.
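As a rough sketch of this interface, the observation can be modeled as three numbers and the two action types related by a simple conversion. The class name, field names, time step, and acceleration limits below are assumptions made for illustration, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Local measurements available to the AV (names are hypothetical)."""
    av_speed: float    # m/s
    lead_speed: float  # m/s
    space_gap: float   # m


def accel_from_desired_speed(obs, desired_speed, dt=0.1, max_accel=1.5, max_decel=3.0):
    """Convert a desired-speed action into a bounded acceleration command.

    Assumes the speed should be reached within one time step dt, then clips
    the result to assumed comfort limits.
    """
    accel = (desired_speed - obs.av_speed) / dt
    return max(-max_decel, min(max_accel, accel))


obs = Observation(av_speed=10.0, lead_speed=12.0, space_gap=25.0)
command = accel_from_desired_speed(obs, desired_speed=30.0)  # clipped to max_accel
```

Because every input is measured on board, nothing in this interface depends on communication with other vehicles or roadside equipment.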
Reward design
The reward function is crucial as it guides the agents towards achieving our objectives:
- Wave smoothing: Reduce stop-and-go oscillations.
- Energy efficiency: Lower fuel consumption for all vehicles, not just AVs.
- Safety: Ensure reasonable following distances and avoid abrupt braking.
- Driving comfort: Avoid aggressive accelerations and decelerations.
- Adherence to human driving norms: Maintain a “normal” driving behavior that doesn’t make surrounding drivers uncomfortable.
We balanced these objectives by introducing dynamic minimum and maximum gap thresholds, which help ensure safe and reasonable behavior while optimizing fuel efficiency. We also penalized the fuel consumption of human-driven vehicles behind the AV to discourage selfish behaviors. This approach aims to strike a balance between energy savings and maintaining a safe driving environment.
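One way such a reward could be assembled is sketched below. The headway-based form of the thresholds, the weights, and every function and parameter name are assumptions chosen for illustration; the source does not give the actual formula.

```python
def gap_thresholds(av_speed, min_headway=1.0, max_headway=6.0,
                   min_gap_floor=5.0, max_gap_cap=120.0):
    """Hypothetical speed-dependent (dynamic) minimum and maximum gaps, in metres."""
    min_gap = max(min_gap_floor, min_headway * av_speed)
    max_gap = min(max_gap_cap, max(min_gap + 10.0, max_headway * av_speed))
    return min_gap, max_gap


def reward(av_speed, gap, accel, av_fuel_rate, follower_fuel_rates,
           w_fuel=1.0, w_accel=0.1, w_gap=5.0):
    """Illustrative reward combining fuel, comfort, and gap terms."""
    min_gap, max_gap = gap_thresholds(av_speed)
    # Penalize fuel use of the AV and of the human-driven vehicles behind it,
    # so the agent is not rewarded for saving fuel at its followers' expense.
    total_fuel = av_fuel_rate + sum(follower_fuel_rates)
    value = -w_fuel * total_fuel - w_accel * accel ** 2
    # Flat penalty whenever the gap leaves the allowed band.
    if gap < min_gap or gap > max_gap:
        value -= w_gap
    return value


inside = reward(20.0, 50.0, 0.0, 1.0, [1.0, 1.0])   # gap within the band
outside = reward(20.0, 10.0, 0.0, 1.0, [1.0, 1.0])  # gap below the minimum
```

Inside the band the agent is free to trade speed against gap to save fuel; the penalty only activates when the gap becomes unsafe or unreasonably large.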
Simulation results
Illustration of the dynamic minimum and maximum gap thresholds, within which the AV can operate freely to smooth traffic as efficiently as possible.
In simulation, this approach yielded fuel savings of up to 20% across all road users in the most congested scenarios. The RL controllers were tested on standard consumer cars equipped with smart adaptive cruise control (ACC), so no special vehicles were needed. Because the controller acts through the car's existing cruise control, deployment required no additional infrastructure.
100 AV field test: deploying RL at scale


Our 100 cars parked at our operational center during the experiment week.
Before deploying RL controllers in the field, we trained them in simulation and validated them extensively, first in simulation and then on hardware.
The deployment steps involved:
- Training in data-driven simulations: We used highway traffic data from I-24 to create a training environment with realistic wave dynamics, then evaluated the agent’s performance in various new traffic scenarios.
- Deployment on hardware: After being validated in robotics software, the trained controller was uploaded onto the car and operated through its onboard cruise control.
- Modular control framework: One key challenge during the test was that we did not have access to the sensors providing leading-vehicle information. To overcome this, we integrated the RL controller into a hierarchical system called the MegaController, which combined a speed planner that considered downstream traffic conditions with the RL controller as the final decision maker.
- Validation on hardware: The RL agents were tested under careful human supervision to ensure they could adapt to unpredictable behavior. This involved driving the RL-controlled vehicles on the road and making adjustments based on feedback from human operators.
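The hierarchical arrangement described in the steps above can be sketched as a planner feeding a final-stage controller. Everything here is a hypothetical stand-in: the averaging planner, the rate-limited `rl_controller` placeholder, and all names and constants are assumptions, not the MegaController's real implementation.

```python
def speed_planner(downstream_speeds, free_flow_speed=30.0):
    """Hypothetical planner: target the mean downstream speed, capped at free flow."""
    if not downstream_speeds:
        return free_flow_speed  # no downstream information: assume free flow
    return min(free_flow_speed, sum(downstream_speeds) / len(downstream_speeds))


def rl_controller(av_speed, planned_speed, max_delta=1.0):
    """Placeholder for the trained policy: step toward the plan at a limited rate."""
    delta = planned_speed - av_speed
    return av_speed + max(-max_delta, min(max_delta, delta))


def mega_controller(av_speed, downstream_speeds):
    """The planner proposes a target; the RL stage makes the final decision."""
    planned = speed_planner(downstream_speeds)
    return rl_controller(av_speed, planned)  # desired speed passed to cruise control


desired = mega_controller(25.0, [10.0, 12.0, 14.0])  # slow traffic ahead
```

Keeping the planner and the final controller separate means either stage can be validated or replaced on its own, which is the practical benefit of a modular design.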
Key takeaways
- RL controllers can be deployed on standard consumer cars: The trained controllers are designed to operate in a decentralized manner using only basic sensor information, allowing them to be implemented on most modern vehicles.
- Significant fuel savings across all road users: In simulation, the RL controllers achieved fuel savings of up to 20% across all vehicles in the most congested scenarios.
- Robust deployment framework: A modular control framework was developed that allowed us to validate and deploy RL controllers on hardware without requiring additional infrastructure.
“`