This paper shows how to safely make a neural network wider in the middle of training without destabilizing it — the loss keeps going down instead of spiking.
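One classic way to widen a network without changing what it computes is function-preserving widening in the style of Net2Net: copy an existing hidden unit and split its outgoing weights among the copies. This is a minimal NumPy sketch of that general idea, not the paper's actual method; all names are illustrative.

```python
import numpy as np

def widen_hidden(W1, b1, W2, new_width, rng):
    """Widen the hidden layer of a 2-layer MLP  x -> W2 @ relu(W1 @ x + b1)."""
    old_width = W1.shape[0]
    # Pick which existing units get duplicated into the new slots.
    idx = rng.integers(0, old_width, size=new_width - old_width)
    W1_new = np.vstack([W1, W1[idx]])            # copies share incoming weights
    b1_new = np.concatenate([b1, b1[idx]])
    # Count copies of each original unit, then split its outgoing weight
    # evenly across them so the overall function is unchanged.
    counts = np.ones(old_width)
    for i in idx:
        counts[i] += 1
    W2_scaled = W2 / counts
    W2_new = np.concatenate([W2_scaled, W2_scaled[:, idx]], axis=1)
    return W1_new, b1_new, W2_new

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3)); b1 = rng.normal(size=4)
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)
W1n, b1n, W2n = widen_hidden(W1, b1, W2, 6, rng)
y_old = W2 @ np.maximum(W1 @ x + b1, 0)
y_new = W2n @ np.maximum(W1n @ x + b1n, 0)   # identical output, wider net
```

Because the widened network starts out computing exactly the same function, training can continue from where it left off rather than restarting.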
Loop-ViT is a vision model that reuses the same layers in a loop, so it can take more steps on hard inputs and stop early on easy ones.
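The "loop with early stopping" idea can be sketched in a few lines: apply one shared block repeatedly and accumulate a learned halting score, in the spirit of Adaptive Computation Time. This is a toy illustration with random weights, not Loop-ViT's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d)) / np.sqrt(d)   # one shared (looped) block
w_halt = rng.normal(size=d)                # halting head (illustrative)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def looped_forward(x, max_steps=12, threshold=0.99):
    halted = 0.0
    steps = 0
    for _ in range(max_steps):
        x = np.tanh(W @ x)                 # one more pass of the same block
        halted += sigmoid(w_halt @ x)      # accumulate halting probability
        steps += 1
        if halted >= threshold:            # easy inputs cross this early
            break
    return x, steps
```

The per-input `steps` count is what gives adaptive compute: the same weights, applied a different number of times depending on the input.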
The paper shows that big language models often end up with weight magnitudes (norms) set by training hyperparameters rather than by the data, which quietly hurts performance.
Reinforcement learning agents usually represent the world in flat Euclidean space, but many decision problems branch out like trees, and trees fit curved, hyperbolic space much better.
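The "trees fit hyperbolic space" intuition comes from how distances behave there: volume grows exponentially toward the boundary, so exponentially-branching trees embed with low distortion. A hedged sketch of the standard Poincaré-ball distance (not necessarily the formulation this paper uses):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball; requires ||u||, ||v|| < 1."""
    uu = np.dot(u - v, u - v)
    du = 1.0 - np.dot(u, u)
    dv = 1.0 - np.dot(v, v)
    return np.arccosh(1.0 + 2.0 * uu / (du * dv + eps))

# Near the boundary, points that look close in Euclidean terms are far
# apart hyperbolically -- leaving room for exponentially many branches.
a = np.array([0.0, 0.0])
b = np.array([0.5, 0.0])
c = np.array([0.95, 0.0])
d = np.array([0.95, 0.05])
```

Here `c` and `d` are only 0.05 apart in Euclidean distance but much farther apart hyperbolically, which is exactly the extra "room" tree-like state spaces need.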
This paper shows that we can remove normalization layers from Transformers and still train them well by using a simple pointwise function called Derf, applied to each value independently.
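To make "pointwise replacement for normalization" concrete, here is a hedged stand-in: a learnable-scale erf squashing, by analogy with Dynamic Tanh (DyT). The paper's exact parameterization of Derf is not given here; `alpha`, `gamma`, and `beta` are assumed, illustrative parameter names.

```python
import math

def derf_like(x, alpha=1.0, gamma=1.0, beta=0.0):
    # No batch or token statistics are computed: each value is squashed
    # independently, which is what makes it a drop-in, cheap replacement
    # for a normalization layer (a sketch, not the paper's exact formula).
    return [gamma * math.erf(alpha * v) + beta for v in x]
```

Like LayerNorm, this keeps activations in a bounded range (erf saturates at ±1), but unlike LayerNorm it needs no mean or variance over the other values.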