Mind-Brush turns image generation from a one-step 'read the prompt and draw' into a multi-step 'think, research, and create' process.
Large reasoning models have become very good at thinking step-by-step, but that can make them too eager to follow harmful instructions.
Large language models sometimes reach the right answer for the wrong reasons, which is risky and confusing.
MMFineReason is a huge, open dataset (1.8 million examples, 5.1 billion solution tokens) that teaches AIs to think step by step about pictures and text together.
The paper shows how to make AI think faster and smarter by planning in a hidden space instead of writing long step-by-step sentences.
This paper shows that making short videos can help AI plan and reason in pictures better than writing out steps in text.
Innovator-VL is a new multimodal AI model that understands both pictures and text to help solve science problems without needing mountains of special data.
This paper asks a new question for vision-language models: not just 'What do you see?' but 'How far along is the task right now?'
The paper introduces Multiplex Thinking, a new way for AI to think by sampling several likely next words at once and blending them into a single super-token.
JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, which trims bad ideas early.
X-Coder shows that models can learn expert-level competitive programming using data that is 100% synthetic, with no real contest problems needed.
TourPlanner is a travel-planning system that first gathers the right places, then lets multiple expert 'voices' debate plans, and finally polishes the winner with a learning method that follows rules before style.