AgentLongBench is a new test that checks how well AI agents can reason over very long histories made of their own actions and the world's replies, not just static documents.
This paper teaches a language-model agent to look up facts in millions of scientific paper summaries and answer clear, single-answer questions.
SAGE is a two-agent system that automatically writes tough, multi-step search questions and checks them by actually trying to solve them.
Typhoon-S is a simple, open recipe that turns a basic language model into a helpful assistant and then teaches it important local skills, all on small budgets.
DRPG is a four-step AI helper that writes strong academic rebuttals by first breaking a review into parts, then fetching evidence, planning a strategy, and finally writing the response.
This paper says modern video generators are starting to act like tiny "world simulators," not just pretty video painters.
Academic rebuttals are not just about being polite; they are about smart, strategic persuasion under hidden information.
This survey explains how to make AI agents not just smart, but also efficient with their time, memory, and tool use.
The paper introduces M^4olGen, a two-stage system that designs new molecules to match exact numbers for several properties (like QED, LogP, MW, HOMO, LUMO) at the same time.
This paper introduces MATTRL, a way for multiple AI agents to learn from their own conversations at test time using short, reusable text notes instead of retraining their weights.
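The core mechanism, replacing weight updates with short reusable text notes, can be sketched in a few lines. The note format and word-overlap retrieval rule below are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch of MATTRL-style test-time memory: agents append short
# text notes after each episode and retrieve relevant ones for new tasks,
# leaving the model's weights untouched.

class TextMemory:
    def __init__(self):
        self.notes: list[str] = []

    def add_note(self, note: str) -> None:
        """Store a short lesson distilled from a finished conversation."""
        self.notes.append(note)

    def retrieve(self, task: str, k: int = 3) -> list[str]:
        """Return the k notes sharing the most words with the new task."""
        task_words = set(task.lower().split())
        def overlap(note: str) -> int:
            return len(set(note.lower().split()) & task_words)
        return sorted(self.notes, key=overlap, reverse=True)[:k]

def build_prompt(task: str, memory: TextMemory) -> str:
    """Prepend retrieved lessons to the task prompt."""
    lessons = "\n".join(f"- {n}" for n in memory.retrieve(task))
    return f"Lessons from past episodes:\n{lessons}\n\nTask: {task}"
```

Because the "learning" lives entirely in this retrievable text, the same notes can be shared across agents without any retraining.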
This survey asks how close AI memory systems are to human memory and organizes the answer into three parts: implicit memory (inside the model), explicit memory (outside storage you can look up), and agentic memory (what an AI agent keeps over time to plan and act).
The paper shows that when we give AI lots of extra text, even harmless extra text, it can get badly confused, sometimes losing up to 80% of its accuracy.