This paper builds a live challenge that tests how well Deep Research Agents (DRAs) can write expert-level Wikipedia-style articles.
This paper teaches talking avatars not just to speak, but to look around their scene and handle nearby objects exactly as a text instruction says.
PolySAE is a new kind of sparse autoencoder that keeps a simple linear encoder for finding features but adds a decoder that can multiply features together to capture their interactions.
LatentLens is a simple, training-free way to translate what a model "sees" in image patches into clear words and phrases.
PaperBanana is a team of AI helpers that turns a paper’s method text and caption into a clean, accurate, publication-ready figure.
Most reinforcement learning agents get only a simple pass/fail reward, which hides how good or bad each attempt really was.
This paper shows how to make a whole picture in one go, directly in pixels, without using a hidden “latent” space or many tiny steps.
This paper introduces PaddleOCR-VL-1.5, an upgrade of a small but capable vision-language model that is reported to read and understand messy, real-world documents better than earlier models.
ConceptMoE teaches a language model to group easy, similar tokens into bigger ideas called concepts, so it spends more brainpower on the hard parts.
This paper shows that many reasoning failures in AI stem from just a few distracting words in the prompt, not from the problems being too hard.
This paper introduces Foundation-Sec-8B-Reasoning, a small (8-billion-parameter) AI model trained to “think out loud” before answering cybersecurity questions.
DeepSearchQA is a new test with 900 real-world-style questions that checks whether AI agents can find complete lists of answers, not just one fact.