QuCo-RAG is a new way to decide when an AI should look things up while it writes, using facts from its training data instead of its own shaky confidence.
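To make that concrete, here is a minimal sketch of the idea (the function name, the toy index, and the threshold are all invented for illustration; they are not QuCo-RAG's actual method): trigger a lookup when the thing being written about is rare in the pretraining data, no matter how confident the model sounds.

```python
# Illustrative sketch only: decide to retrieve from corpus statistics,
# not from model confidence. `toy_index` stands in for a real lookup
# over pretraining data (e.g., an entity or n-gram index), and the
# threshold of 100 is an arbitrary illustrative value.

RARE_THRESHOLD = 100  # entities seen fewer times than this count as long-tail

def should_retrieve(entities: list[str], index: dict[str, int]) -> bool:
    """Retrieve iff any entity in the draft is rare in pretraining data."""
    return any(index.get(e, 0) < RARE_THRESHOLD for e in entities)

toy_index = {"Paris": 2_000_000, "QuCo-RAG": 3}
print(should_retrieve(["Paris"], toy_index))     # False: well-known, no lookup
print(should_retrieve(["QuCo-RAG"], toy_index))  # True: long-tail, go look it up
```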
DramaBench is a new test that checks how well AI continues drama scripts across six separate skills instead of one big score.
This paper asks a simple question with big impact: Can AI tell which test questions are hard for humans?
This paper asks if large language models (LLMs) can act like "world models" that predict what happens next in text-based environments, not just the next word in a sentence.
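One way to picture the setup (the prompt format, the game, and `fake_llm` are invented for illustration, not taken from the paper): feed the model the current state of a text environment plus an action, and ask it to predict the resulting state.

```python
# Illustrative sketch of using an LLM as a text world model: predict the
# next *state*, not just the next word. `fake_llm` is a stand-in so the
# example runs; a real experiment would call an actual model here.

def fake_llm(prompt: str) -> str:
    return "You step through the door into a torch-lit hallway."

def predict_next_state(llm, state: str, action: str) -> str:
    prompt = (
        "You are simulating a text adventure game.\n"
        f"Current state: {state}\n"
        f"Action taken: {action}\n"
        "Next state:"
    )
    return llm(prompt)

state = "You are in a dark room. A closed door is to the north."
print(predict_next_state(fake_llm, state, "go north"))
```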
MemEvolve teaches AI agents not only to remember past experiences but also to improve the way they remember, like a student who upgrades their study habits over time.
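To make the "upgrading study habits" simile concrete, here is a toy sketch (the strategy names and the reweighting rule are an invented illustration, not MemEvolve's actual design): a memory that tracks which of its recall strategies actually helped and leans harder on those over time.

```python
# Toy illustration of evolving *how* to remember: a memory store with two
# competing recall strategies whose weights shift toward whichever one
# turned out to help. Entirely invented; not MemEvolve's real mechanism.
import random

class EvolvingMemory:
    def __init__(self):
        self.entries: list[str] = []
        self.weights = {"recent": 1.0, "overlap": 1.0}  # strategy preferences

    def store(self, text: str) -> None:
        self.entries.append(text)

    def recall(self, query: str) -> tuple[str, str]:
        """Pick a strategy by weight, return (memory, strategy used)."""
        strategy = random.choices(list(self.weights),
                                  weights=list(self.weights.values()))[0]
        if strategy == "recent" or not self.entries:
            return (self.entries[-1] if self.entries else ""), strategy
        best = max(self.entries,
                   key=lambda e: len(set(e.split()) & set(query.split())))
        return best, strategy

    def feedback(self, strategy: str, helped: bool) -> None:
        # The "evolve" step: reinforce strategies that actually paid off.
        self.weights[strategy] *= 1.1 if helped else 0.9

mem = EvolvingMemory()
mem.store("The API key lives in config.yaml")
answer, used = mem.recall("where is the API key?")
mem.feedback(used, helped="API" in answer)
```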
This paper builds a tough new test called O3-BENCH to check if AI can truly think with images, not just spot objects.
Capitalization tie-out checks if a company’s ownership table (its cap table, listing who holds how many shares) truly matches what its legal documents say.
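A toy sketch of what "tying out" means (all names and numbers are invented): total up the shares each legal document grants, then flag any holder whose cap-table entry disagrees.

```python
# Toy capitalization tie-out: reconcile a cap table against share counts
# pulled from legal documents. All data here is invented for illustration.
from collections import defaultdict

cap_table = {"Founder A": 4_000_000, "Founder B": 3_000_000, "Seed Fund": 1_000_000}

legal_docs = [                 # (holder, shares) as stated in each agreement
    ("Founder A", 4_000_000),
    ("Founder B", 3_000_000),
    ("Seed Fund", 1_500_000),  # disagrees with the cap table above
]

def tie_out(cap_table: dict, legal_docs: list) -> dict:
    """Return holders whose cap-table shares differ from the documents."""
    from_docs = defaultdict(int)
    for holder, shares in legal_docs:
        from_docs[holder] += shares
    holders = set(cap_table) | set(from_docs)
    return {h: (cap_table.get(h, 0), from_docs[h])
            for h in holders if cap_table.get(h, 0) != from_docs[h]}

print(tie_out(cap_table, legal_docs))  # {'Seed Fund': (1000000, 1500000)}
```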
SWE-EVO is a new test (benchmark) that checks if AI coding agents can upgrade real software projects over many steps, not just fix one small bug.
MatSpray turns 2D guesses about what materials look like (color, shininess, how metallic they are) into a clean 3D model you can relight realistically.
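A rough sketch of the general flavor of such a pipeline (the shapes, visibility masks, and plain averaging are invented for illustration; this is not MatSpray's actual algorithm): fuse each view's 2D material guesses onto shared 3D points.

```python
# Illustrative multi-view fusion: average per-view 2-D material guesses
# (RGB albedo, roughness, metallic) onto 3-D points, weighting by which
# views actually see each point. Invented stand-in, not MatSpray itself.
import numpy as np

n_views, n_points = 8, 1000
pred = np.random.rand(n_views, n_points, 5)        # albedo RGB + rough + metal
visible = np.random.rand(n_views, n_points) > 0.3  # per-view visibility mask

w = visible[..., None].astype(float)
fused = (pred * w).sum(axis=0) / np.clip(w.sum(axis=0), 1e-8, None)
albedo, roughness, metallic = fused[:, :3], fused[:, 3], fused[:, 4]
print(albedo.shape, roughness.shape)  # (1000, 3) (1000,)
```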
Flow Matching is like teaching arrows to push points from a simple cloud (source) to real pictures (target); most methods start from a Gaussian cloud because it is easy to sample and spreads the same in every direction.
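A minimal training sketch on 2D toy data (PyTorch; the straight-line path, tiny network, and ring-shaped target are standard textbook choices, not tied to any particular paper): interpolate between a Gaussian sample and a target sample at a random time t, and train the network to predict the velocity that carries one to the other.

```python
# Minimal conditional flow matching on 2-D toy data with a straight-line
# path. A standard recipe sketch, not any specific paper's method.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def sample_target(n: int) -> torch.Tensor:
    """Toy target: a ring of radius 2 (stand-in for 'real pictures')."""
    theta = torch.rand(n, 1) * 2 * torch.pi
    return torch.cat([2 * torch.cos(theta), 2 * torch.sin(theta)], dim=1)

for step in range(1000):
    x0 = torch.randn(256, 2)        # the Gaussian source cloud
    x1 = sample_target(256)         # samples from the target
    t = torch.rand(256, 1)          # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1      # a point on the straight path
    v_target = x1 - x0              # the "arrow": source-to-target velocity
    v_pred = net(torch.cat([xt, t], dim=1))
    loss = ((v_pred - v_target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At generation time you push fresh Gaussian samples to the target by integrating the learned velocity from t = 0 to t = 1, for example with a few Euler steps.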
SAM Audio is a new AI that can pull out exactly the sound you want from a noisy mix using text, clicks on a video, and time ranges, either together or separately.
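To show what "together or separately" could look like in practice, here is a hypothetical prompt structure (this is not SAM Audio's real API, just an illustration of combinable cues):

```python
# Hypothetical interface sketch (NOT SAM Audio's actual API): the point
# is only that three kinds of cues can be mixed in a single request.
from dataclasses import dataclass

@dataclass
class SeparationPrompt:
    text: str | None = None                                # "the barking dog"
    video_clicks: list[tuple[float, float]] | None = None  # (x, y) on a frame
    time_ranges: list[tuple[float, float]] | None = None   # (start_s, end_s)

# Text plus a time range, no clicks: cues can be combined or used alone.
prompt = SeparationPrompt(text="the barking dog", time_ranges=[(2.0, 5.5)])
# A model consuming this would return only the audio matching all given cues.
print(prompt)
```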
This paper shows that great image understanding features alone are not enough for making great images; you also need strong pixel-level detail.