Robots often act like goldfish with short memories; HiF-VLA fixes this by letting them use motion to remember the past and predict the future.
This paper shows that a single attention layer is enough to turn big, smart vision features into a small, easy-to-use code for image generation.