Molmo2 is a family of vision-language models, released with fully open weights, data, and code, that can watch videos, understand them, and point to or track things over time.
This paper introduces CLINSQL, a 633-task benchmark that turns real clinician-style questions into SQL challenges over the MIMIC-IV v3.1 hospital database.
RealMem is a new benchmark that tests how well AI assistants remember and manage long, ongoing projects across many conversations.
KnowMe-Bench is a new test that checks whether AI helpers truly understand a person rather than just remembering facts about them.
This paper turns an AI agent’s memory from a flat list of notes into a logic map of events connected by cause-and-time links.
Memory-T1 teaches conversational AI agents to keep track of when things happened across many conversations.
This paper asks a simple question: do video AI models trained only on 2D videos secretly learn about 3D worlds?
Robots often act like goldfish with short memories; HiF-VLA fixes this by letting them use motion to remember the past and predict the future.
VideoCoF is a new way to edit videos that first figures out WHERE to edit and then does the edit, like thinking before acting.