Solar Open is a giant bilingual AI model (102 billion parameters) that focuses on helping underserved languages like Korean catch up with English-level AI quality.
X-Coder shows that models can learn expert-level competitive programming using data that is 100% synthetic—no real contest problems needed.
VideoDR is a new benchmark that tests if AI can watch a video, pull out key visual clues, search the open web, and chain the clues together to find one verifiable answer.
ET-Agent is a training framework that teaches AI agents to use tools (like search and code) more wisely, not just to get the right answer.
This paper introduces Laser, a new way for vision-language models to think in their hidden space before speaking, so they see the whole “forest” before picking out the “trees.”
This paper teaches AI models not just how to solve problems but also how to tell when their own answers might be wrong.
The paper shows that friendly, people-pleasing language can trick even advanced language models into agreeing with wrong answers.
BabyVision is a new test that checks if AI can handle the same basic picture puzzles that young children can do, without leaning on language tricks.
Real instructions often contain logical structure, such as "and," "first-then," and "if-else," and this paper teaches models to notice and obey that logic.
This paper builds BizFinBench.v2, a big bilingual (Chinese–English) test that checks how well AI models really handle finance using real business data from China and the U.S.
The paper fixes a big problem in training web-searching AI: rewarding only the final answer makes agents cut corners and sometimes hallucinate.
This paper says long chain-of-thought (Long CoT) works best when it follows a 'molecular' pattern with three kinds of thinking bonds: Deep-Reasoning, Self-Reflection, and Self-Exploration.