RIVER Bench is a new test that checks how well AI can watch a video stream and talk with you in real time.
MemSifter is a smart helper that picks the right memories for a big AI so the big AI doesn’t have to read everything.
MemGUI-Bench is a new test that checks how well phone-controlling AI agents can remember important information both during a task and across different tries.
This paper argues that true world models are not about sprinkling facts into single tasks, but about building a unified system that can see, think, remember, act, and generate across many situations.
This paper builds MemoryRewardBench, a big test that checks whether reward models (AI judges) can fairly grade how other AIs manage long-term memory, not just whether their final answers are right.
RealMem is a new benchmark that tests how well AI assistants remember and manage long, ongoing projects across many conversations.
This paper introduces the Confucius Code Agent (CCA), a coding helper built to handle huge real-world codebases with long tasks and many tools.