Traditional supervised fine-tuning (SFT) trains a model to imitate a single reference answer token by token, which can cause overfitting to that answer's exact wording instead of the underlying idea.
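A toy illustration (not from the paper) of why exact-token objectives miss meaning: below, a crude stand-in for token-level loss scores a correct paraphrase as badly as, or worse than, an answer that is wrong in meaning. The function and example strings are hypothetical.

```python
def token_mismatch_rate(reference, candidate):
    """Fraction of positions where the candidate token differs from the
    reference token. A toy stand-in for token-level cross-entropy: it
    only rewards copying the exact wording."""
    ref, cand = reference.split(), candidate.split()
    n = max(len(ref), len(cand))
    mismatches = sum(
        1 for i in range(n)
        if i >= len(ref) or i >= len(cand) or ref[i] != cand[i]
    )
    return mismatches / n

reference  = "the cat sat on the mat"
paraphrase = "a cat was sitting on a mat"   # same idea, different wording
nonsense   = "blue ideas sleep on the mat"  # wrong idea, partly same wording

# The paraphrase is penalized at least as heavily as the nonsense answer,
# even though only one of them is wrong in meaning.
```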
Similarity tells you if two models seem to think about things the same way, but it doesn’t tell you if that thinking is sturdy when the world wiggles.
World Craft lets anyone turn a short text description into a playable, visual game world without coding.
EvasionBench is a new, very large dataset that helps computers spot when company leaders dodge questions during earnings call Q&A.
SkinFlow is a 7B-parameter vision–language model that diagnoses skin conditions by sending the most useful visual information to the language brain, instead of just getting bigger.
This survey asks how close AI memory systems are to human memory and organizes the answer into three parts: implicit memory (inside the model), explicit memory (outside storage you can look up), and agentic memory (what an AI agent keeps over time to plan and act).
The paper introduces DASD-4B-Thinking, a small (4B) open-source reasoning model that scores like much larger models on hard math, science, and coding tests.
OpenDecoder teaches large language models (LLMs) to pay more attention to the more relevant retrieved documents during Retrieval-Augmented Generation (RAG).
TranslateGemma is a family of open machine translation models fine-tuned from Gemma 3 to translate many languages more accurately.
The paper introduces Entropy Sentinel, a simple way to watch how accurate an AI is by reading its “uncertainty heartbeat” during generation.
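The paper's exact signal isn't spelled out here, but a minimal sketch of such an "uncertainty heartbeat" might compute per-step Shannon entropy over the model's next-token distribution and flag spikes; the function names and threshold below are illustrative assumptions, not the paper's method.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one generation step's
    next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_trace(step_probs, threshold=1.0):
    """Compute per-step entropies and flag high-uncertainty steps.
    `threshold` is an illustrative cutoff, not a tuned value."""
    entropies = [token_entropy(p) for p in step_probs]
    flagged = [i for i, h in enumerate(entropies) if h > threshold]
    return entropies, flagged

# Toy trace: one confident step, then one maximally uncertain step.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
ents, flags = entropy_trace([confident, uncertain])
```

In a real monitor, the distributions would come from the model's logits at each decoding step, and a run of flagged steps would be the signal that accuracy may be degrading.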
Agents often act like tourists without a map: they react to what they see now and miss long-term consequences.
3AM is a new way to track and segment the same object across a whole video, even when the camera view changes a lot.