Long-Horizon Planning with World Models: GRASP Makes It Practical

Introduction

Learned world models have become remarkably powerful. They can predict long sequences of future observations in high-dimensional visual spaces and generalize across tasks in ways that seemed impossible just a few years ago. As these models scale, they transition from task-specific predictors into general-purpose simulators. However, possessing a powerful predictive model does not automatically grant the ability to use it effectively for control, learning, or planning. In practice, long-horizon planning with modern world models remains fragile: optimization turns ill-conditioned, non-greedy structures create problematic local minima, and high-dimensional latent spaces introduce subtle failure modes. This article introduces GRASP, a new gradient-based planner that addresses these challenges directly, making long-horizon planning robust and practical.

(Figure. Source: bair.berkeley.edu)

The Challenge of Long-Horizon Planning

Planning over many time steps with learned dynamics models is a stress test. The path from an initial state to a desired goal is long, and the landscape of possible action sequences is fraught with difficulties. Let's examine the core issues.

Ill-Conditioned Optimization

When optimizing an action sequence by backpropagating through the model's recurrent rollout, the gradients are products of many per-step Jacobians, so they can vanish or explode as the horizon grows. This ill-conditioning makes it hard for standard optimization algorithms to make meaningful progress, especially as the horizon lengthens.
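A toy linear rollout makes the effect concrete. In the sketch below (purely illustrative; the dynamics, dimensions, and function names are made up, not from GRASP), the gradient of the final state with respect to the first action is the Jacobian product A^(T-1)·B, whose norm shrinks geometrically once the dynamics are contractive:

```python
import numpy as np

rng = np.random.default_rng(0)

def first_action_grad_norm(spectral_radius, horizon, dim=8):
    # Random linear dynamics s_{t+1} = A s_t + B a_t, with A rescaled so
    # its largest eigenvalue magnitude equals spectral_radius.
    A = rng.standard_normal((dim, dim))
    A *= spectral_radius / np.abs(np.linalg.eigvals(A)).max()
    B = rng.standard_normal((dim, 1))
    J = B
    for _ in range(horizon - 1):
        J = A @ J  # chain rule through the rollout: d s_T / d a_0 = A^(T-1) B
    return np.linalg.norm(J)

g_short = first_action_grad_norm(0.9, horizon=5)
g_long = first_action_grad_norm(0.9, horizon=200)
# With contractive dynamics the long-horizon gradient all but vanishes;
# with an expansive A it would explode instead.
```

The same multiplication of Jacobians is what makes the loss landscape ill-conditioned: some directions in action space carry enormous curvature while others carry almost none.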

Non-Greedy Local Minima

World models often contain complex dependencies where an action taken early in the sequence has little effect until much later. This non-greedy structure creates many local minima that trap gradient-based planners, leading to suboptimal trajectories.

High-Dimensional Pitfalls

Modern world models operate in high-dimensional latent spaces, such as those learned by vision models. Computing gradients through these spaces can be brittle: the signal reaching the actions gets diluted or distorted, especially when the model's predictions depend on intricate interactions between state and action inputs.

GRASP: A New Gradient-Based Planner

GRASP (Gradient-based Recurrent Action Sequence Planner) tackles these issues head-on with three key innovations that work together to stabilize and accelerate long-horizon planning.

Virtual States for Parallel Optimization

Instead of treating the trajectory as a single sequence, GRASP lifts the planning problem into a set of virtual states that are optimized in parallel across time steps. This decouples the temporal dependencies during the optimization, allowing gradients to flow more freely and reducing the ill-conditioning that plagues sequential approaches.
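The idea can be pictured as a collocation-style optimization: rather than rolling the model out step by step, both the virtual states and the actions are free variables, and a consistency penalty ties adjacent states to the dynamics. The toy problem below (scalar linear dynamics with hand-derived gradients; all names and constants are illustrative, not GRASP's actual formulation) updates every time step in parallel:

```python
import numpy as np

def f(s, a):
    # Toy known dynamics, applied elementwise across time: s' = 0.8 s + a.
    return 0.8 * s + a

T, lr, goal = 8, 0.05, 1.0
s = np.zeros(T + 1)   # virtual states; s[0] is the fixed start state
a = np.zeros(T)       # one action per step

for _ in range(8000):
    # Residuals r_t = s_{t+1} - f(s_t, a_t); cost = sum r_t^2 + (s_T - goal)^2.
    r = s[1:] - f(s[:-1], a)
    grad_s = np.zeros_like(s)
    grad_s[1:] += 2 * r               # d(r_t^2)/d s_{t+1}
    grad_s[:-1] += -1.6 * r           # d(r_t^2)/d s_t  (= -2 * 0.8 * r)
    grad_s[-1] += 2 * (s[-1] - goal)  # terminal goal cost
    grad_a = -2 * r                   # d(r_t^2)/d a_t
    s[1:] -= lr * grad_s[1:]          # all time steps update in parallel
    a -= lr * grad_a
```

At convergence the residuals vanish, so the virtual trajectory is an actual rollout of the dynamics and the terminal state sits at the goal. Because no long products of Jacobians appear, each update involves only local, per-step gradients, which is what keeps the optimization well-conditioned as the horizon grows.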


Stochasticity for Exploration

To escape bad local minima, GRASP injects stochasticity directly into the state iterates. This random perturbation acts like a form of exploration, helping the planner discover promising regions of the action space that would otherwise be missed. The noise is carefully controlled to balance exploration and convergence.
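A one-dimensional toy problem shows why perturbing the iterates helps. The objective below has a shallow minimum near x ≈ +1 and a deeper one near x ≈ -1; plain gradient descent stalls in the shallow basin, while annealed Gaussian noise on the iterate (a generic schedule chosen for illustration, not GRASP's specific one) lets the search hop out:

```python
import numpy as np

rng = np.random.default_rng(0)

# f has a shallow local minimum near x = +1 and a deeper one near x = -1.
f = lambda x: (x**2 - 1) ** 2 + 0.3 * x
df = lambda x: 4 * x * (x**2 - 1) + 0.3

def descend(x, steps=500, lr=0.02):
    # Plain, noise-free gradient descent.
    for _ in range(steps):
        x -= lr * df(x)
    return x

# 1) Deterministic descent from x = 1.5 gets trapped in the shallow basin.
plain = descend(1.5)

# 2) Noisy iterates: gradient step plus annealed Gaussian perturbation,
#    tracking the best point seen so far.
x, best = 1.5, 1.5
for k in range(2000):
    x = np.clip(x - 0.01 * df(x) + 1.2 * 0.999**k * rng.standard_normal(), -3, 3)
    if f(x) < f(best):
        best = x
noisy = descend(best)  # noise-free polish from the best iterate

f_plain, f_noisy = f(plain), f(noisy)
```

The anneal is the "carefully controlled" part: large early noise explores across basins, while the decay lets the final polish converge cleanly.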

Gradient Reshaping for Clean Signals

A central challenge is that gradients through high-dimensional vision models can be brittle, especially when they mix state and input gradients. GRASP reshapes the gradient flow so that actions receive clean, informative signals. By avoiding the fragile state-input gradients, the planner can make reliable updates even when using rich perceptual inputs.
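One way to picture gradient reshaping (a simplified stand-in for illustration, not necessarily the paper's exact rule) is to renormalize the state-state Jacobian before chaining it through time, so that action gradients keep a useful direction without their magnitude blowing up:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, T = 6, 40

# Unstable linear rollout s_{t+1} = A s_t + B a_t (spectral radius 1.3).
A = rng.standard_normal((dim, dim))
A *= 1.3 / np.abs(np.linalg.eigvals(A)).max()
B = rng.standard_normal((dim, 2))

# Reshaped Jacobian: rescaled to unit spectral norm before chaining.
A_hat = A / np.linalg.norm(A, 2)

def action_grad_norm(J_step):
    # Norm of the gradient of s_T w.r.t. the first action, i.e. J_step^(T-1) B.
    J = B
    for _ in range(T - 1):
        J = J_step @ J
    return np.linalg.norm(J)

g_raw = action_grad_norm(A)        # explodes with the horizon
g_shaped = action_grad_norm(A_hat) # stays bounded by the norm of B
```

The raw chained gradient grows like 1.3^T and swamps any optimizer step size, while the reshaped one stays bounded, which is the kind of clean, stable signal the actions need when the rollout passes through a rich perceptual model.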

The Team and Future Directions

GRASP is the result of collaborative work by Mike Rabbat, Aditi Krishnapriyan, Yann LeCun, and Amir Bar (with equal advisorship). The project opens up new possibilities for using large world models in control tasks that require foresight over many steps. Future work may extend GRASP to non-differentiable models, incorporate more structured exploration, or apply it to real-world robotic systems. As world models continue to evolve, planners like GRASP will be essential to unlock their full potential as general-purpose simulators.

Conclusion

Long-horizon planning remains a fundamental challenge for learned world models. GRASP addresses the core issues of ill-conditioned optimization, local minima, and brittle gradients through virtual state parallelization, stochastic exploration, and gradient reshaping. By making gradient-based planning robust over longer horizons, GRASP brings us closer to fully harnessing the power of modern world models for complex decision-making tasks.
