Training big AI models uses lots of memory because most methods still keep a hidden full-precision copy of the weights, called master weights, alongside the low-precision copy that does the actual math.
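To make the master-weights idea concrete, here is a minimal PyTorch sketch of standard mixed-precision training (the names and sizes are illustrative, not from the paper): the optimizer keeps an fp32 shadow of every bf16 compute weight, and that shadow is where the extra memory goes.

```python
import torch

# Compute weights live in bf16 (cheap to store and multiply), but the
# optimizer keeps an fp32 "master" copy of every tensor -- that copy is
# the memory cost the summary is talking about.
compute_w = torch.randn(4096, 4096, dtype=torch.bfloat16)
master_w = compute_w.to(torch.float32)   # full-precision shadow copy

lr = 1e-3
for step in range(3):
    # Forward/backward would run in bf16; fake a gradient here.
    grad = torch.randn_like(compute_w)
    # Update the fp32 master so tiny steps aren't rounded away in bf16.
    master_w -= lr * grad.to(torch.float32)
    # Re-quantize the master back down for the next forward pass.
    compute_w = master_w.to(torch.bfloat16)

bytes_per_param = compute_w.element_size() + master_w.element_size()
print(f"{bytes_per_param} bytes per parameter for weights alone (2 bf16 + 4 fp32)")
```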
This paper introduces GANPO, a new way to train language models from human preferences by guiding the model using its hidden thoughts (latent space) instead of just its visible words (token space).
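GANPO's actual objective isn't spelled out here, but a heavily hedged sketch can show what a latent-space preference signal looks like in general: score the hidden states of preferred vs. rejected answers with a Bradley-Terry loss instead of comparing token log-probabilities. The `score_head`, shapes, and loss form below are assumptions, not the paper's design.

```python
import torch
import torch.nn.functional as F

def latent_preference_loss(h_chosen, h_rejected, score_head):
    """Bradley-Terry preference loss computed on hidden states rather
    than token log-probs. Illustrative only: score_head and the loss
    form are assumptions, not GANPO's published objective."""
    margin = score_head(h_chosen) - score_head(h_rejected)
    return -F.logsigmoid(margin).mean()

# Toy usage: 4 preference pairs, 768-dim final hidden states.
score_head = torch.nn.Linear(768, 1)
h_chosen, h_rejected = torch.randn(4, 768), torch.randn(4, 768)
print(latent_preference_loss(h_chosen, h_rejected, score_head))
```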
This paper shows a new way to help AI think through long problems faster by turning earlier text steps into small pictures the AI can reread.
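As a toy illustration of "earlier text steps become small pictures", the sketch below renders reasoning steps into one thumbnail with PIL; `steps_to_thumbnail` is a made-up helper, and the paper's real rendering and re-encoding pipeline will be more involved.

```python
from PIL import Image, ImageDraw

def steps_to_thumbnail(steps, size=(256, 256)):
    """Render earlier reasoning steps as one small image the model can
    re-attend to, instead of re-reading all the tokens. Hypothetical
    helper; the paper's rendering pipeline will differ."""
    img = Image.new("RGB", (512, 16 * len(steps) + 16), "white")
    draw = ImageDraw.Draw(img)
    for i, step in enumerate(steps):
        draw.text((8, 8 + 16 * i), f"{i + 1}. {step}", fill="black")
    return img.resize(size)  # many tokens shrink into a few image patches

thumb = steps_to_thumbnail(["Let x = 3", "Then 2x + 1 = 7", "So f(7) = 49"])
thumb.save("steps.png")
```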
The paper tackles a real problem: one-shot image or text searches often miss the right evidence (low hit-rate), especially in noisy, cluttered pictures.
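"Hit-rate" here is the standard hit@k retrieval metric: the share of queries whose gold evidence shows up in the top k results. A minimal version (function name and inputs are illustrative):

```python
def hit_rate_at_k(ranked_results, gold_items, k=5):
    """Fraction of queries whose gold evidence appears in the top-k
    retrieved items -- the number that drops in noisy, cluttered scenes."""
    hits = sum(gold in ranked[:k]
               for ranked, gold in zip(ranked_results, gold_items))
    return hits / len(gold_items)

# Toy check: 2 of 3 queries find their evidence in the top 2.
print(hit_rate_at_k([["a", "b"], ["c", "d"], ["e", "f"]],
                    ["b", "x", "e"], k=2))  # 0.666...
```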
Metric Anything is a new way to teach AI real, ruler-like distances (metric depth) from very mixed and noisy 3D data.
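"Ruler-like" is the key word: relative-depth models get their outputs rescaled per image at evaluation time, while a metric model must output meters directly. The standard scale-and-shift alignment that metric depth does away with looks like this (a generic evaluation trick, not the paper's training method):

```python
import numpy as np

def align_scale_shift(pred, gt):
    """Least-squares scale/shift that maps a relative-depth map onto
    ground truth. Relative models need this per-image fix-up at test
    time; a truly metric model should not."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt, rcond=None)
    return s * pred + t

pred = np.array([0.2, 0.5, 0.9])    # unitless relative depths
gt = np.array([1.1, 2.4, 4.0])      # meters
print(align_scale_shift(pred, gt))  # near gt once rescaled and shifted
```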
PLANING is a new way to build 3D worlds from a single moving camera by combining two kinds of pieces: sharp triangles for shape and soft Gaussians for looks.
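As a purely illustrative container (not PLANING's actual data structures), a hybrid scene just carries both primitive types side by side: triangles pin down exact surfaces, Gaussians carry soft appearance.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Triangle:           # sharp geometry: exact, watertight surfaces
    vertices: np.ndarray  # (3, 3) xyz corners

@dataclass
class Gaussian:           # soft appearance: fuzzy, view-dependent color
    mean: np.ndarray      # (3,) center
    cov: np.ndarray       # (3, 3) spread and orientation
    color: np.ndarray     # (3,) rgb

@dataclass
class HybridScene:        # both kinds of pieces, side by side
    triangles: list = field(default_factory=list)
    gaussians: list = field(default_factory=list)

scene = HybridScene()
scene.triangles.append(Triangle(vertices=np.zeros((3, 3))))
scene.gaussians.append(Gaussian(mean=np.zeros(3), cov=np.eye(3) * 0.01,
                                color=np.array([0.8, 0.2, 0.2])))
```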
CAR-bench is a new 'driving test' for AI assistants that checks if they can stay careful, honest, and consistent during real back-and-forth conversations in a car.
Robots used to copy actions from videos without truly understanding how the world changes, so they often messed up long, multi-step jobs.
This paper presents PaddleOCR-VL-1.5, an upgraded version of a small but mighty vision-language model, which reads and understands real-world, messy documents better than any model before it.
This paper builds a safe science “playground” called DeR that fairly tests how AI finds facts (retrieval) and how it thinks with those facts (reasoning) without mixing them up.
MMFineReason is a huge, open dataset (1.8 million examples, 5.1 billion solution tokens) that teaches AIs to think step by step about pictures and text together.
Big language models are great at words but waste lots of time and energy when they try random actions in non-language games like Sudoku, Sokoban, 2048, FrozenLake, and Rubik’s Cube.
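To see the waste the summary points at, a purely random baseline on FrozenLake (via the gymnasium package) burns hundreds of steps for a handful of wins; the counting script below is an illustration, not the paper's setup.

```python
import gymnasium as gym

# Count how many steps a purely random policy burns on FrozenLake
# before piling up a few wins.
env = gym.make("FrozenLake-v1", is_slippery=True)
steps, wins, episodes = 0, 0, 200
for _ in range(episodes):
    obs, _ = env.reset()
    done = False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(env.action_space.sample())
        done = terminated or truncated
        steps += 1
        wins += int(reward > 0)
print(f"{wins}/{episodes} wins after {steps} random steps")
```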