RadarGen is a tool that learns to generate realistic car radar point clouds just from multiple camera views.
This paper introduces NEPA, a very simple way to teach vision models by having them predict the next patch’s embedding in an image sequence, just like language models predict the next word.
Pixels are the raw stuff of images, and this paper shows you can learn great vision skills by predicting pixels directly, not by comparing fancy hidden features.
LitePT is a new AI backbone for 3D point clouds that uses convolutions in early layers and attention in later layers to be both fast and accurate.
COOPER is a single AI model that both “looks better” (perceives depth and object boundaries) and “thinks smarter” (reasons step by step) to answer spatial questions about images.