Time-series data are numbers tracked over time, like temperature each hour or traffic each day, and turning them into clear written summaries usually requires experts.
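As a toy illustration of the task (my own sketch, not any paper's method), even a hand-written template can turn a short temperature series into a sentence; the hard part the research tackles is doing this well for arbitrary data:

```python
# Toy illustration (not a paper's method): summarize an hourly
# temperature series in one sentence using simple statistics.
temps = [18.2, 18.9, 20.1, 22.4, 23.0, 21.5]  # hypothetical readings

def summarize(series):
    lo, hi = min(series), max(series)
    trend = "rose" if series[-1] > series[0] else "fell"
    return f"Temperature {trend} overall, ranging from {lo}°C to {hi}°C."

print(summarize(temps))
# → Temperature rose overall, ranging from 18.2°C to 23.0°C.
```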
LLM judges are cheap but biased; without calibration they can completely flip which model looks best.
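One well-known source of judge bias is answer order, and a simple generic mitigation (a hedged sketch of the general idea, not this paper's calibration method) is to query the judge with both orders and only trust consistent verdicts:

```python
# Generic position-bias mitigation for an LLM judge (illustrative
# sketch, not a specific paper's method): ask twice with the two
# answers swapped, and count a win only if the verdicts agree.
def debiased_verdict(judge, a, b):
    first = judge(a, b)               # verdict with original order
    second = judge(b, a)              # verdict with swapped order
    swapped = {"A": "B", "B": "A", "tie": "tie"}[second]
    return first if first == swapped else "tie"

# A maximally biased judge that always prefers the first answer
# would otherwise flip every comparison; here it yields a tie:
always_first = lambda x, y: "A"
print(debiased_verdict(always_first, "answer 1", "answer 2"))  # → tie
```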
Fast-FoundationStereo is a stereo vision system that sees depth from two cameras in real time while still working well on brand‑new scenes it was never trained on.
StereoSpace turns a single photo into a full 3D-style stereo pair without ever estimating a depth map.
Omni-Attribute is a new image encoder that learns just the parts of a picture you ask for (like hairstyle or lighting) and ignores the rest.
Normalizing Flows are models that learn how to turn real images into simple noise and then back again.
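The core property behind that "there and back again" claim is invertibility. A minimal scalar sketch (my illustration of the general normalizing-flow idea, not the paper's model, which stacks many learned invertible layers and tracks log-determinants) looks like this:

```python
# Minimal normalizing-flow sketch (illustrative): an invertible
# transform maps data toward a noise-like latent, and its inverse
# recovers the data exactly, so no information is lost.
def forward(x, scale=2.0, shift=1.0):
    # data -> latent ("noise") space
    return (x - shift) / scale

def inverse(z, scale=2.0, shift=1.0):
    # latent -> data, exactly undoing forward
    return z * scale + shift

x = 5.0
z = forward(x)            # latent code for x
assert inverse(z) == x    # perfect reconstruction
```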
This paper asks whether reinforcement learning (RL) can improve how models turn text into 3D shapes, and shows the answer is yes, provided the training and rewards are designed carefully.
This paper shows that we can remove normalization layers from Transformers and still train them well by using a simple point‑by‑point function called Derf.
FoundationMotion is a fully automatic pipeline that turns raw videos into detailed motion data, captions, and quizzes about how things move.
DuetSVG is a new AI that learns to make SVG graphics by generating an image and the matching SVG code together, like sketching first and then tracing neatly.
MoCapAnything is a system that turns a single regular video into a 3D animation that can drive any rigged character, not just humans or one animal type.
The paper defines Microscopic Spatial Intelligence (MiSI) as the skill AI needs to understand tiny 3D things like molecules from 2D pictures and text, just like scientists do.