BABE is a new benchmark that tests whether AI models can read real biology papers and reason from the experiments they report, as a scientist would, rather than just recall facts.
Chain-of-Thought (CoT) prompting makes AI models reason step by step, but it is slow because the model must generate many tokens one at a time.
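To make the latency point concrete, here is a toy sketch of autoregressive decoding. It assumes a hypothetical `next_token` function that stands in for one full model forward pass. It is not taken from any particular paper or library. It only illustrates that each generated token costs one sequential call, so a longer chain of thought needs proportionally more calls.

```python
from typing import Callable, List, Tuple


def generate(prompt: List[int],
             next_token: Callable[[List[int]], int],
             max_new_tokens: int,
             eos_id: int) -> Tuple[List[int], int]:
    """Toy autoregressive decoding loop.

    Each new token needs one call to `next_token` (a hypothetical stand-in for
    a model forward pass) on the sequence so far. Step t depends on the token
    produced at step t-1, so these calls cannot run in parallel.
    """
    tokens = list(prompt)
    forward_passes = 0
    for _ in range(max_new_tokens):
        tok = next_token(tokens)  # one sequential model call per new token
        forward_passes += 1
        tokens.append(tok)
        if tok == eos_id:  # stop once the end-of-sequence token appears
            break
    return tokens[len(prompt):], forward_passes


def dummy_next_token(seq: List[int]) -> int:
    # Hypothetical "model": emits the next integer until the sequence
    # holds 8 tokens, then emits EOS (id 0).
    return 0 if len(seq) >= 8 else len(seq) + 1


# A short answer versus a longer "reasoning" output: cost grows with length.
short_out, short_calls = generate([1, 2, 3], dummy_next_token, max_new_tokens=2, eos_id=0)
long_out, long_calls = generate([1, 2, 3], dummy_next_token, max_new_tokens=50, eos_id=0)
print(short_calls, long_calls)
```

The number of forward passes always equals the number of generated tokens, which is why methods that shorten or parallelize the reasoning trace target this loop.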
CPPO is a new method for fine-tuning vision-language models so that they perceive images more accurately before they start to reason.