SAGE is a new test for how well AI research agents find scientific papers when questions require multi-step reasoning.
AACR-Bench is a new test set that checks how well AI can review code using the whole project, not just one file.
This paper teaches a language-model agent to look up facts in millions of scientific paper summaries and answer questions that each have one clear answer.
SimpleMem is a new memory system that helps AI remember long conversations without wasting space or tokens.