Metric Anything is a new way to teach AI real, ruler-like distances (metric depth) from very mixed and noisy 3D data.
PLANING is a new way to build 3D worlds from a moving single camera by combining two kinds of pieces: sharp triangles for shape and soft Gaussians for looks.
CAR-bench is a new 'driving test' for AI assistants that checks if they can stay careful, honest, and consistent during real back-and-forth conversations in a car.
Robots used to copy actions from videos without truly understanding how the world changes, so they often messed up long, multi-step jobs.
This paper builds a safe science “playground” called DeR that fairly tests how AI finds facts (retrieval) and how it thinks with those facts (reasoning) without mixing them up.
MMFineReason is a huge, open dataset (1.8 million examples, 5.1 billion solution tokens) that teaches AIs to think step by step about pictures and text together.
Big language models are great at words but waste lots of time and energy when they try random actions in non-language games like Sudoku, Sokoban, 2048, FrozenLake, and Rubik’s Cube.
DreamActor-M2 is a new way to make a still picture move by copying motion from a video while keeping the character’s look the same.
OCRVerse is a new AI model that can read both plain text in documents and the visual structures in charts, webpages, and science plots, all in one system.
The paper shows how to make AI think faster and smarter by planning in a hidden space instead of writing long step-by-step sentences.
The paper shows a fast, training-free way to boost an LLM’s step-by-step reasoning by smartly reusing the model’s own probabilities.
Hyper-Connections (HC) make the usual single shortcut in neural networks wider by creating several parallel streams and letting the model mix them, but this can become unstable when stacked deep.