Understanding and Improving Hyperbolic Deep Reinforcement Learning
Key Summary
- Reinforcement learning agents often see the world in straight, flat space (Euclidean), but many decision problems look more like branching trees that fit curved, hyperbolic space better.
- Past hyperbolic RL agents kept crashing during training because small math parts exploded or vanished, especially near the edges of the hyperbolic space and under PPO's changing data.
- The paper analyzes where gradients blow up in both the Poincaré Ball and the Hyperboloid models and shows that large feature norms are the main culprit.
- HYPER++ fixes this with three pieces: RMSNorm to keep features calm, a learned feature-scaling gate to safely use more of the space, and a categorical value loss that matches the geometry.
- Using the Hyperboloid model avoids the Poincaré Ball's conformal-factor headaches and leads to steadier learning.
- On ProcGen with PPO, HYPER++ learns more stably, scores higher, and trains about 30% faster in wall-clock time than prior hyperbolic agents.
- On Atari-5 with Double DQN, HYPER++ also beats strong Euclidean and hyperbolic baselines, showing the idea generalizes beyond PPO.
- Ablations show all three parts matter: removing RMSNorm or scaling leads to failures, and swapping the categorical loss for MSE usually hurts in the hyperbolic setting.
- The work focuses on optimization stability, not yet on which tasks benefit most from hyperbolic geometry or how representations look inside.
- The authors release code, aiming to make hyperbolic deep RL more practical and reproducible.
Why This Research Matters
Smarter, steadier learning lets game agents master new levels faster and generalize better, which is key for real-world tools that face fresh situations every day. By matching the space to the problem’s shape (trees and hierarchies), we can reduce wasted effort and improve reliability. Stabilizing training means fewer crashes and reruns, saving time and compute for labs and companies. The same ideas extend beyond games: robots planning sequences of actions, recommender systems understanding branching user journeys, and assistants making step-by-step choices can all benefit. With released code and a simple recipe, more teams can practically adopt hyperbolic deep RL. Over time, this could lead to AI that is both more efficient and more robust under change.
Detailed Explanation
01 Background & Problem Definition
🍞 Top Bread (Hook): You know how family trees keep splitting into branches as you go down the generations? The farther you look, the more people there are. That shape isn’t flat like a sheet of paper; it spreads out fast like a growing web.
🥬 Filling (The Actual Concept):
- What it is: Many decision problems in games and robots grow like trees, where each move splits into many future possibilities.
- How it works: 1) At any moment, you can choose different actions; 2) Each action creates a new branch; 3) After a few steps, there are tons of branches; 4) This branching grows exponentially.
- Why it matters: Flat (Euclidean) space doesn’t have enough “room” to place all these branches without squishing distances, making it hard for AI to learn clean patterns.
🍞 Bottom Bread (Anchor): In chess, each move opens many possible games. Trying to lay those states on flat paper makes connections warped; using a space that expands faster (hyperbolic) better matches the tree.
🍞 Top Bread (Hook): Imagine you’re riding a scooter on a safe path that says, “Don’t turn the handle too much each second.” That rule keeps you from crashing.
🥬 Filling (The Actual Concept):
- What it is: PPO (Proximal Policy Optimization) is a popular RL training rule that gently updates decisions so they don’t change too much at once (a trust region).
- How it works: 1) Collect a batch of experiences; 2) Compare new policy to old using a ratio; 3) “Clip” that ratio if change is too big; 4) Update only within safe bounds.
- Why it matters: Without this, the policy can zig-zag wildly and stop learning.
🍞 Bottom Bread (Anchor): It’s like practicing basketball shots while only adjusting your aim a little each time; big swings can make you miss more.
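To make the ratio-and-clip rule concrete, here is a minimal sketch of PPO's clipped surrogate loss in PyTorch-flavored Python. The tensor names (`logp_new`, `logp_old`, `advantages`) and the clip range of 0.2 are illustrative assumptions, not details taken from this paper.

```python
import torch

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective: keep the new policy close to the old one."""
    ratio = torch.exp(logp_new - logp_old)                 # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Taking the elementwise minimum means overly large policy changes earn no extra
    # credit, which is what keeps updates inside the "safe bounds" described above.
    return -torch.min(unclipped, clipped).mean()
```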
🍞 Top Bread (Hook): Picture sticking pins into a rubber sheet (flat) versus into a stretchy trampoline (curved). The trampoline can fit more pins without crowding.
🥬 Filling (The Actual Concept):
- What it is: Hyperbolic geometry is a kind of curved space where area grows fast with distance, perfect for tree-shaped data.
- How it works: 1) Points farther from the center have more room around them; 2) Distances reflect hierarchy naturally; 3) Two common models are the Poincaré Ball and the Hyperboloid; 4) They’re different views of the same space (isometric).
- Why it matters: Matching the space to the data’s shape makes learning simpler and more efficient.
🍞 Bottom Bread (Anchor): In BIGFISH (a ProcGen game), the fish's growth can't be undone, which creates a natural one-way ordering of states; hyperbolic space can arrange these branching states neatly.
🍞 Top Bread (Hook): Ever try to draw tiny details near the edge of a balloon? Small moves can stretch into big changes.
🥬 Filling (The Actual Concept):
- What it is: Training instability in hyperbolic RL often happens because gradients can explode or vanish, especially near model boundaries.
- How it works: 1) The Poincaré Ball has a “conformal factor” that blows up near its edge; 2) The Hyperboloid avoids that but still has an exponential map whose Jacobian can grow fast when features get big; 3) In PPO, updates are only constrained on sampled states; 4) Big unseen changes can slip through.
- Why it matters: Instability breaks PPO’s “don’t change too much” promise and crashes learning.
🍞 Bottom Bread (Anchor): In experiments, unregularized hyperbolic agents showed spiking KL-divergence, high clip fraction, and entropy collapse—signs the scooter handle turned too hard.
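For readers who want the formula: the Poincaré Ball's conformal factor is a standard textbook quantity (not something introduced by this paper), and it makes the "edge of the balloon" problem explicit:

```latex
% Conformal factor of the Poincaré Ball with curvature -c (c > 0):
\lambda_x^c \;=\; \frac{2}{1 - c\,\lVert x \rVert^2},
\qquad
\lambda_x^c \;\longrightarrow\; \infty
\quad \text{as} \quad \lVert x \rVert \to \tfrac{1}{\sqrt{c}}.
% Distances, exponential/logarithmic maps, and MLR logits on the ball all involve
% \lambda_x^c, so these operations become ill-conditioned near the boundary.
```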
🍞 Top Bread (Hook): Think of a backpack getting heavier as you walk. If it gets too heavy, every step feels unstable.
🥬 Filling (The Actual Concept):
- What it is: Large Euclidean feature norms (the size of the vectors before mapping into hyperbolic space) are the heavy backpack causing gradient trouble.
- How it works: 1) Encoder makes features; 2) Exponential map sends them into hyperbolic space; 3) If features are large, hyperbolic math (like the conformal factor or hyperbolic sines/cosines) magnifies gradients; 4) Instability follows.
- Why it matters: Without controlling feature size, even clever algorithms wobble.
🍞 Bottom Bread (Anchor): The paper’s plots show bigger norms matched with unstable gradients and higher PPO clipping—clear signs of “too heavy features.”
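As a toy numerical illustration (our own numbers, not a figure from the paper) of why large feature norms are dangerous: the exponential map involves hyperbolic sines and cosines of the feature norm, and those grow exponentially.

```python
import math

# cosh(r) (and sinh(r)) appear in the hyperboloid exponential map and its Jacobian,
# so gradient magnitudes can be amplified by factors of roughly this size when the
# Euclidean feature norm r is large.
for r in [1.0, 2.0, 5.0, 10.0, 20.0]:
    print(f"feature norm r = {r:5.1f}  ->  cosh(r) ≈ {math.cosh(r):.3e}")
```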
🍞 Top Bread (Hook): Imagine we tried band-aids before: smaller steps, smoothing, or guardrails all over the track.
🥬 Filling (The Actual Concept):
- What it is: Past fixes like SpectralNorm-everywhere (S-RYM) add speed bumps to all layers to control smoothness.
- How it works: 1) Normalize layer weights by their largest singular value; 2) Make the whole network Lipschitz-smooth; 3) Reduce worst-case jumps.
- Why it matters: It helps, but it can slow training, reduce flexibility, and still leave some weak spots (like the Poincaré conformal factor).
🍞 Bottom Bread (Anchor): In ProcGen, SpectralNorm helped compared to nothing, but HYPER++ still achieved better stability and about 30% faster wall-clock.
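For context, applying SpectralNorm everywhere looks roughly like the sketch below, using PyTorch's built-in utility; the layer sizes are placeholders and this is a generic illustration of the technique, not S-RYM's exact recipe.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Each wrapped layer divides its weight matrix by an estimate of its largest
# singular value (computed by power iteration every forward pass), bounding how
# much any layer can amplify its input, at the cost of extra per-step compute.
encoder = nn.Sequential(
    spectral_norm(nn.Linear(512, 256)),
    nn.ReLU(),
    spectral_norm(nn.Linear(256, 256)),
)
```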
The Gap and Real Stakes: Before this paper, we knew hyperbolic spaces fit hierarchical RL problems, but training was shaky. The missing piece was a principled, simple way to keep features and gradients in check without choking the whole network. That matters for everyday uses: smarter game agents that generalize across levels, robots that plan reliably, or assistants that make step-by-step choices without flipping moods. If training keeps crashing, none of those work in the real world. This paper provides a clear diagnosis (large feature norms + nasty Jacobians) and a clean treatment (RMSNorm + learned scaling + categorical value loss + Hyperboloid), turning hyperbolic deep RL from fragile to practical.
02 Core Idea
🍞 Top Bread (Hook): Imagine building a treehouse on a sturdy tree that matches the shape of your design, and using the right tools so your screws don’t strip. The right shape and the right tools make the build safe and fast.
🥬 Filling (The Actual Concept):
- What it is (one sentence): The key insight is that stabilizing hyperbolic RL means taming feature sizes and choosing geometry- and loss-friendly tools, so the math doesn’t explode and PPO’s small-step promise holds.
- How it works: 1) Use the Hyperboloid model to avoid the Poincaré Ball’s conformal-factor blow-ups; 2) Keep Euclidean features well-behaved with RMSNorm right before the hyperbolic mapping; 3) Add a learned scaling gate to safely use more of the space without hitting dangerous edges; 4) Train the critic with a categorical loss that matches hyperbolic logits (signed distances) instead of real-number regression; 5) Combine these in PPO (and DDQN) to get steady, strong learning.
- Why it matters: Without these pieces, gradients misbehave, PPO’s trust region leaks, and learning collapses.
🍞 Bottom Bread (Anchor): With HYPER++, ProcGen scores rise and training time drops about 30% versus prior hyperbolic agents, and Atari-5 gets a solid boost too.
Multiple Analogies (three ways):
- Road and Car: Hyperbolic space is the right road for tree-shaped trips. RMSNorm is the shock absorber, learned scaling is the speed governor, and the categorical loss is smoother steering; together they prevent skids.
- Kitchen and Oven: The Hyperboloid is an oven that heats evenly (fewer hot spots than Poincaré). RMSNorm measures ingredients precisely. Learned scaling adjusts the flame. The categorical loss is a recipe that matches the oven—no more burnt edges.
- Backpacking Map: The world is a branching canyon (hyperbolic). RMSNorm keeps your backpack light. Learned scaling lets you carry a bit more safely. The categorical loss is a clearer legend for your map. You travel farther without getting lost.
Before vs After:
- Before: Hyperbolic RL promised better structure but stumbled: exploding/vanishing gradients, entropy collapse, high clipping, and KL spikes—especially near Poincaré’s boundary.
- After: With HYPER++, features stay bounded, updates stay within PPO’s comfort zone, and the critic’s target is steadier. Learning becomes smooth and strong across ProcGen and Atari-5.
Why It Works (intuition):
- Big Euclidean features are the spark that lights gradient explosions through exponential maps and curvature-sensitive terms. RMSNorm snuffs that spark by normalizing just where it matters (final encoder output), keeping expressivity elsewhere.
- Learned scaling is a safe “expansion valve”: it lets the model use more hyperbolic volume without touching the dangerous outer rim.
- The Hyperboloid avoids the Poincaré conformal-factor minefield, removing a major source of instability.
- The critic’s categorical loss turns wobbly regression into sturdy classification over value bins, matching hyperbolic MLR’s distance-based outputs.
Building Blocks (each as a sandwich):
- 🍞 Hook: You know how holding a ruler at the middle is steadier than at the tip? 🥬 Concept: RMSNorm at the final encoder layer. What: normalize the root-mean-square of features. How: rescale the vector by its RMS so magnitudes stay reasonable. Why: keeps feature norms bounded, preventing gradient blow-ups. 🍞 Anchor: The last feature vector stops swinging wildly, so PPO updates don't yank the policy around.
- 🍞 Hook: Like a faucet handle that sets max flow even if you turn it too far. 🥬 Concept: Learned feature scaling. What: a single trainable gate multiplies features to expand usable space but caps the max safely. How: apply a sigmoid gate and a preset maximum so mapped points never reach risky zones. Why: avoids the "edge of the world" where math explodes. 🍞 Anchor: The agent can explore more of the hyperbolic ball without running into the cliff at the boundary.
- 🍞 Hook: Multiple-choice tests are sometimes easier to grade than open-ended essays. 🥬 Concept: Categorical value loss (HL-Gauss/C51-style idea). What: predict a distribution over value bins, not a single number. How: turn the critic into a classifier using hyperbolic distances as logits. Why: reduces instability from nonstationary targets and fits the hyperbolic MLR output better than MSE. 🍞 Anchor: The critic stops oscillating, and the actor gets steadier advantages.
- 🍞 Hook: Choose hiking boots made for rocky trails, not sandals. 🥬 Concept: Hyperboloid model. What: a hyperbolic model with no conformal factor, more numerically stable. How: map Euclidean features into its tangent space and use the exponential map to land on the manifold. Why: avoids Poincaré's edge blow-ups and makes training calmer. 🍞 Anchor: Training curves smooth out and beat prior hyperbolic and Euclidean baselines.
03 Methodology
At a high level: Observations → Euclidean Encoder → RMSNorm → Learned Scaling → Exponential Map to Hyperboloid → Hyperbolic Actor & Critic (MLR layers) → PPO (or DDQN) updates with categorical value loss for the critic.
Step 1: Input and Euclidean Encoder
- What happens: The agent receives images (like 64×64 RGB frames in ProcGen or 84×84 gray frames in Atari). A standard CNN (Impala-ResNet for ProcGen, NatureCNN for Atari) extracts features. These are still in flat, Euclidean space.
- Why this exists: CNNs are great at turning pixels into meaningful vectors (edges, shapes, objects), giving the agent a compact summary of “what it sees.”
- Example: In BIGFISH, the CNN might produce a 512-dim feature capturing fish positions and size.
Step 2: RMSNorm at the Final Encoder Layer
- What happens: Right before going hyperbolic, apply RMSNorm to the final Euclidean feature vector. This scales the vector by its root-mean-square, keeping its overall magnitude controlled.
- Why this exists: Large Euclidean norms lead to unstable hyperbolic math (via exponential maps and curvature-sensitive terms). RMSNorm tames feature sizes right where they matter most without limiting the whole encoder.
- Example: A feature vector whose magnitude sometimes spikes is rescaled to a fixed, controlled size, preventing later gradients from exploding.
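A minimal sketch of RMSNorm applied to the final encoder output, assuming a 512-dimensional feature vector and a learnable per-feature gain (both assumptions for illustration, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization over the last feature dimension."""
    def __init__(self, dim: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        self.gain = nn.Parameter(torch.ones(dim))   # learnable per-feature scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).sqrt()
        return self.gain * x / (rms + self.eps)     # every row now has RMS ~= 1 (times the gain)

features = torch.randn(32, 512) * 20.0   # encoder output whose magnitude has drifted upward
calm = RMSNorm(512)(features)            # magnitude is now controlled before the hyperbolic map
```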
Step 3: Learned Feature Scaling (Safety Valve)
- What happens: After RMSNorm, pass the features through a single learned gate (a scalar between 0 and a safe cap). This lets the network expand usable hyperbolic space but never approach dangerous boundaries.
- Why this exists: Hyperbolic volume grows fast. We want to use more of it for richer representations, but safely. The gate ensures points stay comfortably away from the rim where math blows up.
- Example: With a cap tuned to allow up to ~95% of the safe radius, the agent gets vastly more volume in higher dimensions without crossing the red line.
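One plausible form of the learned scaling gate, written as a hedged sketch: the sigmoid parameterization and the `max_scale` cap are assumptions inferred from the description above, not the paper's published code.

```python
import torch
import torch.nn as nn

class LearnedScale(nn.Module):
    """Single trainable gate that rescales normalized features before the exponential map.

    sigmoid(s) stays in (0, 1), so the effective multiplier is capped at max_scale and
    the mapped points can never be pushed into the numerically risky outer region.
    """
    def __init__(self, max_scale: float = 5.0):   # cap value is an illustrative assumption
        super().__init__()
        self.max_scale = max_scale
        self.s = nn.Parameter(torch.zeros(1))     # raw, unconstrained gate parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.s) * self.max_scale * x
```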
Step 4: Exponential Map to Hyperboloid
- What happens: The (scaled) Euclidean vector becomes a tangent vector at the Hyperboloid’s origin. The exponential map then places it onto the curved manifold. The result is a hyperbolic embedding that respects hierarchical distances.
- Why this exists: To benefit from hyperbolic geometry, we must live on the manifold. The exponential map is the standard, smooth way to do that.
- Example: Two similar game states end up close on the Hyperboloid; branching futures can be represented farther out where there’s more room.
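The exponential map at the origin of the Hyperboloid (Lorentz) model with curvature -1 is a standard formula; below is a small PyTorch sketch of it, not the paper's implementation.

```python
import torch

def expmap0_hyperboloid(v: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Map tangent vectors v at the origin onto the hyperboloid (curvature -1).

    v has shape (..., n); the output has shape (..., n + 1) and satisfies the
    Lorentz constraint -x_0^2 + ||x_{1:}||^2 = -1.
    """
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    x0 = torch.cosh(norm)                # "time-like" first coordinate
    xs = torch.sinh(norm) * v / norm     # spatial coordinates along the input direction
    return torch.cat([x0, xs], dim=-1)

points = expmap0_hyperboloid(torch.randn(4, 512))   # hyperbolic embeddings of 4 feature vectors
```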
Step 5: Hyperbolic Actor and Critic via MLR
- What happens: Both policy (actor) and value head (critic) are Hyperboloid multinomial logistic regression (MLR) layers. They compute logits from signed distances to learned hyperbolic hyperplanes (one per action or per value bin).
- Why this exists: In hyperbolic space, representing decisions as margins to curved hyperplanes aligns with the geometry and avoids extra factors (like Poincaré’s conformal term).
- Example: For actions left/right/up/down, each action’s hyperplane yields a distance logit; softmax turns those into action probabilities.
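As a simplified sketch of the "signed distance to a hyperplane" idea: in the Lorentz model, a geodesic hyperplane can be described by a spacelike normal vector w, and arcsinh of the Minkowski inner product (normalized by w's Lorentz norm) gives a signed distance to it. This is a common parameterization from the hyperbolic-classification literature; the paper's MLR head may differ in its details, so treat the code as illustrative.

```python
import torch

def minkowski_inner(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Lorentzian inner product <x, y>_L = -x_0 y_0 + sum_i x_i y_i."""
    prod = x * y
    return -prod[..., 0] + prod[..., 1:].sum(dim=-1)

def hyperplane_logits(x: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """Signed-distance logits of points x (batch, n+1) w.r.t. K hyperplanes.

    W (K, n+1) holds one spacelike normal per class (action or value bin);
    logit_k = asinh(<x, w_k>_L / ||w_k||_L) is the signed geodesic distance
    from x to the hyperplane {y : <y, w_k>_L = 0}.
    """
    ip = -x[..., :1] @ W[..., :1].T + x[..., 1:] @ W[..., 1:].T   # (batch, K) Lorentz inner products
    w_norm = torch.sqrt(minkowski_inner(W, W).clamp_min(1e-7))    # assumes each w_k is spacelike
    return torch.asinh(ip / w_norm)
```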
Step 6: Critic with Categorical Value Loss
- What happens: Instead of predicting one number with MSE, the critic predicts a distribution over value bins (HL-Gauss/C51-style). Targets are constructed from returns and projected onto bins.
- Why this exists: RL targets move over time (nonstationary), making regression unstable. Classification over bins is steadier and matches the critic’s distance-based logits in hyperbolic space.
- Example: If the true return is about 7.2, the target softly lights up bins near 7; the critic learns to place mass there.
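A minimal sketch of the "project the return onto value bins" idea, using a simple two-hot target; the bin range and count are made-up placeholders, and the paper's HL-Gauss-style variant spreads the target with a Gaussian rather than this linear two-hot split.

```python
import torch
import torch.nn.functional as F

def two_hot_target(returns: torch.Tensor, bins: torch.Tensor) -> torch.Tensor:
    """Split each scalar return between its two neighboring bin centers.

    returns: (batch,); bins: (num_bins,) sorted bin centers.
    Output: (batch, num_bins) soft target distribution (rows sum to 1).
    """
    returns = returns.clamp(bins[0].item(), bins[-1].item())
    idx = torch.searchsorted(bins, returns, right=True).clamp(1, len(bins) - 1)
    lo, hi = bins[idx - 1], bins[idx]
    w_hi = (returns - lo) / (hi - lo)                    # interpolation weight toward the upper bin
    target = torch.zeros(returns.shape[0], len(bins))
    target.scatter_(1, (idx - 1).unsqueeze(1), (1 - w_hi).unsqueeze(1))
    target.scatter_(1, idx.unsqueeze(1), w_hi.unsqueeze(1))
    return target

bins = torch.linspace(-10.0, 10.0, 51)      # 51 bins over an assumed value range
logits = torch.randn(32, 51)                 # critic logits (e.g. hyperbolic MLR distances)
returns = torch.randn(32) * 5.0
loss = F.cross_entropy(logits, two_hot_target(returns, bins))   # classification-style value loss
```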
Step 7: PPO (or DDQN) Optimization
- What happens (PPO): Use the clipped objective to update the actor within a safe change window and train the critic with the categorical loss. An entropy bonus encourages exploration. Gradients flow back through the hyperbolic layers, the exponential map, the learned scaling, and RMSNorm into the encoder.
- Why this exists: PPO's small, safe steps prevent wild policy swings. The categorical critic provides steadier advantages, reducing actor-critic tug-of-war.
- Example: On BIGFISH, update KL and clip fraction stay low and stable, meaning fewer trust-region violations.
- What happens (DDQN variant): For Atari-5, replace PPO with Double DQN (value-based); a minimal target sketch follows after this list. Keep the same hyperbolic representation, RMSNorm, and learned scaling. Use DDQN targets to reduce overestimation bias.
- Why this exists: To show the method isn't PPO-only. Stability from feature control and Hyperboloid geometry helps value-based learning too.
- Example: On Q*BERT and NAMETHISGAME, HYPER++ learns faster and reaches higher scores than Euclidean and prior hyperbolic baselines.
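For the DDQN variant, the target computation is the standard Double DQN rule; a minimal sketch follows, where `q_online` and `q_target` are placeholder networks (with the hyperbolic representation inside them) and `gamma` is an assumed discount factor.

```python
import torch

@torch.no_grad()
def double_dqn_target(q_online, q_target, next_obs, rewards, dones, gamma=0.99):
    """Double DQN: the online net selects the action, the target net evaluates it."""
    next_actions = q_online(next_obs).argmax(dim=1, keepdim=True)      # action selection
    next_q = q_target(next_obs).gather(1, next_actions).squeeze(1)     # action evaluation
    return rewards + gamma * (1.0 - dones.float()) * next_q            # zero bootstrap at episode end
```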
The Secret Sauce (why these steps together):
- RMSNorm + learned scaling = bounded, robust features → stable hyperbolic mappings and gradients.
- Hyperboloid MLR = avoids Poincaré conformal-factor spikes → calmer, geometry-aligned logits.
- Categorical critic loss = steady targets matching hyperbolic distances → smoother actor updates.
- Combined inside PPO or DDQN, they reduce entropy collapse, clipping spikes, and KL jumps, leading to faster, more reliable learning.
Sandwich mini-explanations for new terms used above:
- 🍞 Hook: Like promising to only take baby steps when learning to skateboard. 🥬 Concept: Trust region in PPO. What: a rule that keeps updates small. How: clip the policy ratio to limit change. Why: avoids wiping out what you've learned. 🍞 Anchor: Lower "clip fraction" means the agent is respecting the baby-steps rule.
- 🍞 Hook: Think of a magnifying glass that gets too strong near the edge. 🥬 Concept: Conformal factor (Poincaré). What: a scaling of lengths that explodes at the boundary. How: as points get close to the rim, gradients blow up. Why: makes training unstable. 🍞 Anchor: HYPER++ switches to the Hyperboloid to remove this problem.
- 🍞 Hook: Folding a map neatly so streets line up. 🥬 Concept: Exponential map. What: sends a tangent vector into the curved space along the shortest path. How: start at origin, walk in that direction with curved geometry. Why: it places features correctly on the manifold. 🍞 Anchor: Two similar frames map to neighboring points in hyperbolic space.
04 Experiments & Results
The Test: The authors evaluated whether HYPER++ truly stabilizes learning and improves scores.
- What they measured: episode rewards, trust-region signals (KL divergence, clip fraction), entropy and its variance, gradient norms in the encoder, and wall-clock time per step.
- Why: If the method works, we should see higher rewards, smoother updates (lower KL and clipping), healthier exploration (entropy), smaller/steadier gradients, and faster training.
The Competition: HYPER++ was compared to
- Euclidean agents (standard flat geometry),
- A prior hyperbolic PPO agent with SpectralNorm-based regularization (Hyper+S-RYM),
- An unregularized hyperbolic PPO agent (Hyper),
- Variants of HYPER++ where parts were removed or swapped (ablations),
- On Atari-5 with DDQN, the same family of baselines.
The Scoreboard (with context):
- ProcGen (PPO): Using normalized test rewards and robust aggregates (IQM, median, optimality gap), HYPER++ consistently outperformed all baselines. Think of it as getting an A while others get B or C. The wall-clock time improved by about 30% over the previous hyperbolic approach, like finishing the same homework faster and better.
- Per-game results: Hyperbolic agents shine on BIGFISH and DODGEBALL (more hierarchical structure), while on STARPILOT and FRUITBOT all methods bunched near the top. HYPER++ won head-to-head against Hyper+S-RYM in most train and test games.
- PPO stability metrics: Unregularized agents saw entropy collapse (losing exploration), high clip fractions (bumping against trust-region walls), and rising update KL (big policy jumps). HYPER++ kept these in check, indicating smoother, safer learning.
- Atari-5 (DDQN): On NAMETHISGAME, Q*BERT, BATTLEZONE, PHOENIX, and DOUBLE DUNK, HYPER++ strongly beat Euclidean and hyperbolic baselines in human-normalized return. This shows the approach isn’t tied to PPO; it helps value-based learning, too.
Surprising/Notable Findings:
- RMSNorm is crucial: Removing it (or replacing with SpectralNorm in limited spots) caused learning to fail—feature norms grew, and encoder gradients vanished or exploded. This confirms the core diagnosis: control feature size right before the manifold.
- Learned scaling matters: Without it, even with RMSNorm, performance dropped—showing the need to safely expand usable hyperbolic volume.
- Categorical value loss is geometry-friendly: In hyperbolic agents, swapping it for MSE usually hurt. But in Euclidean agents, MSE remained competitive or better, highlighting that the categorical target fits the hyperbolic MLR’s distance-based logits especially well.
- Hyperboloid vs Poincaré: Switching to the Hyperboloid gave a stability edge by removing conformal-factor issues. The two are isometric, but numerically the Hyperboloid behaved better under training.
- Off-batch PPO metrics: Measuring KL and clipping on fresh, on-policy states looked similar to the in-batch metrics, suggesting the improvements generalize beyond just the sampled batch.
Concrete Illustrations:
- Trust-region health: Lower clip fraction and lower update KL in HYPER++ means fewer “too big” changes per update—akin to smoother steering.
- Entropy behavior: HYPER++ maintained healthy exploration longer, avoiding early collapse that traps agents in bad habits.
- Wall-clock: By avoiding SpectralNorm’s repeated power iterations and training more stably, HYPER++ sped up training. Faster and better is a practical win.
Ablations (what breaks when we remove parts):
- No RMSNorm: training collapses (heavy features, unstable gradients).
- No learned scaling: worse performance; RMSNorm alone isn’t enough to use the space efficiently.
- MSE instead of categorical: critic gets shakier in hyperbolic geometry, hurting scores.
- Poincaré instead of Hyperboloid: modest drop, consistent with the theory and stability diagnosis.
- SpectralNorm variants: applying SpectralNorm only to some layers didn’t stabilize; full-encoder SpectralNorm reduced expressivity and still underperformed RMSNorm + scaling.
Big Picture: The experiments show that the three-part recipe—RMSNorm, learned scaling, and categorical critic—paired with the Hyperboloid model converts hyperbolic PPO (and DDQN) from fragile to robust, with strong scores and meaningful speedups.
05 Discussion & Limitations
Limitations:
- Scope of analysis: The paper zeroes in on optimization stability—why training breaks and how to fix it—rather than probing what structures the hyperbolic embeddings learn or when hyperbolic geometry helps most.
- Environment fit: While BIGFISH-like tasks seem to benefit (hierarchical/tree-like), a full map of which RL problems gain the most isn’t provided.
- Algorithm coverage: PPO and DDQN are tested; other families (e.g., actor-critic in continuous control, model-based RL, large-scale distributed systems) remain to be explored.
- Numerical corners: Even the Hyperboloid can become ill-conditioned far from the origin; the method mitigates this but doesn’t eliminate all geometric pitfalls.
Required Resources:
- Hardware: Training used modern GPUs (e.g., A100s). While not extreme by today’s standards, hyperbolic operations add some overhead versus plain Euclidean nets.
- Software: Implementations of hyperbolic layers, exponential maps, and MLR heads. The provided code helps, but users should expect some learning curve.
- Data/Compute: ProcGen (25M steps) and Atari-5 (10M steps) are standard but still nontrivial runs.
When NOT to Use:
- Flat/simple tasks: If the environment doesn’t exhibit hierarchical or branching structure, Euclidean agents may be simpler and equally strong.
- Ultra-tight compute budgets: If you must run on very small devices with minimal math overhead, hyperbolic ops might be a stretch.
- Nonstationary regimes without careful tuning: While categorical loss helps, highly volatile reward scales or sparse-reward setups might still require additional stabilization tricks.
Open Questions:
- Representation probing: What patterns do hyperbolic embeddings capture during RL? Can we visualize or measure hierarchy alignment over time?
- Task taxonomy: Which RL problem families benefit most from hyperbolic geometry, and can we predict that upfront?
- Algorithmic breadth: How do these ideas translate to model-based RL, offline RL, or large-scale distributed actors-learners (e.g., IMPALA-style)?
- Adaptive curvature and gating: Can curvature and scaling caps be learned per task or per layer for even better stability and performance?
- Safety and robustness: Do hyperbolic agents resist distribution shifts or adversarial perturbations better than Euclidean ones due to their geometric bias?
06 Conclusion & Future Work
Three-sentence summary:
- 1) Many RL problems look like branching trees, which fit hyperbolic geometry better than flat space, but training in hyperbolic space used to be unstable. 2) By analyzing where gradients go wrong, this paper shows that large feature norms and specific hyperbolic operations (like the Poincaré conformal factor and exponential maps) cause PPO's trust region to fail. 3) HYPER++ combines RMSNorm, a learned scaling gate, a categorical value loss, and the Hyperboloid model to stabilize training, improve scores, and cut wall-clock time.
Main achievement: Turning hyperbolic deep RL from a promising-but-fragile idea into a practical, strong performer across ProcGen (with PPO) and Atari-5 (with DDQN), through a clear diagnosis and a simple, effective remedy.
Future directions:
- Probe what structures the embeddings learn and when hyperbolic geometry pays off most.
- Extend to more RL families (continuous control, offline RL, model-based planning) and larger scales.
- Explore adaptive curvature, smarter scaling schedules, and hybrid loss designs.
Why remember this: It’s a blueprint for matching geometry to problem shape and aligning training tools to the math. With three focused components—RMSNorm, learned scaling, and categorical critic—hyperbolic RL stops wobbling and starts winning, opening the door to agents that learn faster, generalize better, and spend their compute wisely.
Practical Applications
- Train game-playing agents that generalize to unseen levels by representing branching futures more naturally.
- Improve robot planners that must choose multi-step actions with irreversible consequences (e.g., assembly lines).
- Enhance recommender systems to model user choice trees and long-term preferences using stable hyperbolic features.
- Build tutoring systems that map learning paths (prerequisites branching) and adapt recommendations reliably.
- Speed up research workflows by reducing training crashes and wall-clock time in RL experiments.
- Stabilize off-policy value learning (e.g., DDQN) for tasks with large discrete action spaces.
- Design safer exploration strategies in hierarchical environments via steadier entropy and trust-region metrics.
- Develop hierarchical memory or option-discovery modules that store skills in hyperbolic embeddings.
- Deploy RL policies on resource-limited hardware by leveraging faster, more stable training to shrink tuning cycles.
- Create interpretable maps of decision landscapes where distance reflects progression along skill trees.