DARE is a new way for AI assistants to find the right R functions by also looking at what the data looks like, not just the words in the question.
Legal RAG Bench is a new, end-to-end test that checks how well legal AI systems find information and use it to answer tough, real-world legal questions.
SciDER is a team of smart AI helpers that can run almost the whole research process: think of ideas, read raw data, write and run code, and improve itself with feedback.
NanoKnow is a new benchmark that checks whether a language model’s answers come from what it saw during training or from extra text we give it at question time.
Panini is a way for AI to keep learning new facts without changing its brain by storing them as tiny linked Q&A facts in an external memory.
This paper teaches small, local AI models to write deep, insightful research reports by letting writing and planning work together instead of staying separate.
This paper builds a Google-for-theorems: a semantic search engine that finds exact theorems, lemmas, and propositions instead of just entire papers.
This paper teaches AI to look things up on the web and fix its own mistakes mid-thought instead of starting over from scratch.
MemSkill turns memory operations for AI agents into learnable skills instead of fixed, hand-made rules.
Mind-Brush turns image generation from a one-step 'read the prompt and draw' into a multi-step 'think, research, and create' process.
PaperBanana is a team of AI helpers that turns a paper’s method text and caption into a clean, accurate, publication-ready figure.
This paper builds a safe science “playground” called DeR that fairly tests how AI finds facts (retrieval) and how it thinks with those facts (reasoning) without mixing them up.