This survey explains how to make AI agents not just smart, but also efficient with their time, memory, and tool use.
This paper turns rebuttal writing from ‘just write some text’ into ‘make a plan with proof, then write.’
RoboBrain 2.5 teaches robots to see depth precisely and to keep track of time-aware progress, so plans turn into safe, accurate actions.
Putting the reading passage (context) before the question and answer options (CQO) makes language models much more accurate than putting it after them (QOC), by about 15 percentage points on average.
TwinBrainVLA is a robot brain with two halves: a frozen generalist that keeps world knowledge safe and a trainable specialist that learns to move precisely.
Numina-Lean-Agent is a new open system that uses a general coding agent to write and check exact math proofs in Lean without special training.
This survey turns model understanding into a step-by-step repair toolkit called Locate, Steer, and Improve.
FantasyVLN teaches a robot to follow language instructions while navigating by sight, using step-by-step reasoning during training but skipping it at test time.
AgentEHR is a new, realistic test that asks AI agents to read messy hospital records and make full clinical decisions, not just look up facts.
FutureOmni is the first benchmark that tests whether multimodal AI models can predict what happens next from both sound and video, not just explain what already happened.
DARC teaches big language models to get smarter by splitting training into two calm, well-organized steps instead of one chaotic loop.
ChartVerse is a new way to make lots of tricky, realistic charts and perfectly checked questions so AI can learn to read charts better.
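
To make the CQO versus QOC ordering from the prompt-order summary above concrete, here is a minimal sketch of how the two layouts could be assembled. The function names (`format_options`, `build_prompt`), the section labels ("Passage:", "Question:", "Options:"), and the trailing "Answer:" cue are illustrative assumptions, not the paper's actual prompt templates.

```python
def format_options(options):
    # Label the answer choices A, B, C, ... one per line (hypothetical format).
    return "\n".join(f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options))


def build_prompt(context, question, options, order="CQO"):
    """Assemble a multiple-choice prompt in the given component order.

    order is a permutation of the letters C (context), Q (question),
    and O (options), for example "CQO" or "QOC".
    """
    if sorted(order) != ["C", "O", "Q"]:
        raise ValueError("order must be a permutation of 'C', 'Q', 'O'")
    parts = {
        "C": f"Passage:\n{context}",
        "Q": f"Question: {question}",
        "O": f"Options:\n{format_options(options)}",
    }
    # Join the three blocks in the requested order, then add the answer cue.
    return "\n\n".join(parts[key] for key in order) + "\n\nAnswer:"


passage = "The sky is blue."
question = "What color is the sky?"
choices = ["Red", "Blue"]

cqo_prompt = build_prompt(passage, question, choices, order="CQO")
qoc_prompt = build_prompt(passage, question, choices, order="QOC")
```

The two prompts contain identical text and differ only in where the passage sits, which is the single variable the summary describes; any accuracy gap would have to be measured on a real model and dataset.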