The paper asks a simple question: if a language model gets better at step-by-step reasoning through reinforcement learning with verifiable rewards (RLVR), do its text embeddings also improve? The short answer is no.
This paper introduces CGPT, a method that helps computers find the right tables by building smarter mini-versions of each table and training with tough practice questions.
Action100M is a gigantic video dataset with about 100 million labeled action moments, built automatically from 1.2 million instructional videos.
CPPO is a new way to fine-tune vision-language models so they see pictures more accurately before they start to reason.
Most image-similarity tools notice only how things look (color, shape, object class) and miss deeper, human-like connections.