SwimBird is a multimodal AI that can switch how it thinks: only in text, only in vision (with hidden picture-like thoughts), or a mix of both.
BABE is a new benchmark that tests whether AI can read real biology papers and reason from experiments like a scientist, not just recall facts.
This paper teaches AI to pay attention better by training where it focuses directly, not just the words it produces.
The paper shows how to train a language model with special extra hints (privileged information) during practice so it can still do well later without any hints.
This paper builds ID-MoCQA, a new two-step (multi-hop) quiz set about Indonesian culture that makes AI connect clues before answering.
Chain-of-Thought (CoT) makes AI think step by step, but it is slow because it writes many tokens one by one.
Render-of-Thought (RoT) turns the model’s step-by-step thinking from long text into slim images so the model can think faster with fewer tokens.
Reinforcement learning (RL) for large language models is slow because the rollout (text generation) stage can take more than 70% of training time, especially for long, step-by-step answers.
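To see why the rollout share matters so much, here is a minimal sketch that applies Amdahl's law to the 70% figure above. It is a back-of-the-envelope illustration, not a method from the paper: the function name `overall_speedup` and the sample speedup factors are assumptions chosen for the example, and 0.7 is used as a round value for "more than 70%".

```python
def overall_speedup(rollout_fraction: float, rollout_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the rollout stage gets faster.

    rollout_fraction: share of total training time spent in rollout (0..1).
    rollout_speedup:  how many times faster the rollout stage becomes (> 0).
    """
    if not 0.0 <= rollout_fraction <= 1.0:
        raise ValueError("rollout_fraction must be between 0 and 1")
    if rollout_speedup <= 0:
        raise ValueError("rollout_speedup must be positive")
    # Time left after the change, as a fraction of the original total time:
    # the untouched stages plus the shrunken rollout stage.
    remaining = (1.0 - rollout_fraction) + rollout_fraction / rollout_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    # Hypothetical rollout speedups, with rollout assumed to be 70% of the total.
    for s in (1.5, 2.0, 4.0):
        total = overall_speedup(0.7, s)
        print(f"rollout {s:.1f}x faster -> training {total:.2f}x faster")
```

Under this assumption, even an infinitely fast rollout stage would cap the end-to-end gain at 1 / 0.3, roughly 3.3x, because the remaining 30% of the time is untouched.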
FantasyVLN teaches a robot to follow language instructions while looking around, using step-by-step reasoning during training but skipping it at test time.
Most text-to-image models act like word-to-pixel copy machines and miss the hidden meaning in our prompts.
Large language models usually get only a final thumbs-up or thumbs-down at the end of their answer, which is too late to fix mistakes made in the middle.
This paper introduces CLINSQL, a 633-task benchmark that turns real clinician-style questions into SQL challenges over the MIMIC-IV v3.1 hospital database.
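To make "clinician-style question turned into SQL" concrete, here is a toy sketch using Python's built-in `sqlite3`. Everything in it is an assumption for illustration: the question, the query, the three made-up rows, and the cut-down `admissions` table (loosely modelled on the MIMIC-IV table of that name) are not taken from the benchmark, and SQLite stands in for whatever database engine the benchmark actually uses.

```python
import sqlite3

# Hypothetical question, not a CLINSQL task.
QUESTION = "How many admissions lasted longer than 7 days?"

# One possible SQL answer; julianday() is SQLite's date-to-day-number function.
SQL = """
SELECT COUNT(*)
FROM admissions
WHERE julianday(dischtime) - julianday(admittime) > ?
"""


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with a tiny, made-up admissions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE admissions ("
        "subject_id INTEGER, hadm_id INTEGER PRIMARY KEY, "
        "admittime TEXT, dischtime TEXT)"
    )
    rows = [  # invented rows: stays of 9, 2, and 8.5 days
        (1, 100, "2150-01-01 00:00:00", "2150-01-10 00:00:00"),
        (2, 101, "2150-02-01 00:00:00", "2150-02-03 00:00:00"),
        (3, 102, "2150-03-01 00:00:00", "2150-03-09 12:00:00"),
    ]
    conn.executemany("INSERT INTO admissions VALUES (?, ?, ?, ?)", rows)
    return conn


def count_long_stays(conn: sqlite3.Connection, min_days: float) -> int:
    """Run the query and return the number of stays longer than min_days."""
    return conn.execute(SQL, (min_days,)).fetchone()[0]


if __name__ == "__main__":
    conn = build_toy_db()
    print(QUESTION, "->", count_long_stays(conn, 7))
```

A model under test would receive only the question and the schema, and would have to produce a query like `SQL` itself.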