Papers (11)

#temporal reasoning

MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning

Jiachun Li, Shaoping Huang et al. · Mar 2 · arXiv

MMR-Life is a new benchmark that checks how well AI understands everyday situations when it has to reason over several real photos at once.

#multimodal reasoning #multi-image understanding #real-life benchmark

Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments

Intermediate

Romain Froger, Pierre Andrews et al. · Feb 12 · arXiv

Gaia2 is a new test that measures how well AI agents handle real-life messiness like changing events, deadlines, and team coordination.

#Gaia2 #ARE platform #asynchronous environments

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Intermediate

Christopher Clark, Jieyu Zhang et al. · Jan 15 · arXiv

Molmo2 is a family of vision-language models that can watch videos, understand them, and point to or track things over time using fully open weights, data, and code.

#vision-language model #video grounding #pointing and tracking

Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL

Intermediate

Yifei Shen, Yilun Zhao et al. · Jan 14 · arXiv

This paper introduces CLINSQL, a 633-task benchmark that turns real clinician-style questions into SQL challenges over the MIMIC-IV v3.1 hospital database.

#clinical text-to-SQL #EHR #MIMIC-IV

RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction

Beginner

Haonan Bian, Zhiyuan Yao et al. · Jan 11 · arXiv

RealMem is a new benchmark that tests how well AI assistants remember and manage long, ongoing projects across many conversations.

#RealMem #long-term memory #project-oriented interactions

KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

Intermediate

Tingyu Wu, Zhisheng Chen et al. · Jan 8 · arXiv

KnowMe-Bench is a new test that checks if AI helpers truly understand a person, not just remember facts.

#person understanding #autobiographical narratives #cognitive stream

Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning

Intermediate

Yuyang Hu, Jiongnan Liu et al. · Jan 8 · arXiv

This paper turns an AI agent’s memory from a flat list of notes into a logic map of events connected by cause-and-time links.

#event-centric memory #Event Graph #logic-aware retrieval

Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents

Intermediate

Yiming Du, Baojun Wang et al. · Dec 23 · arXiv

Memory-T1 teaches chatty AI agents to keep track of when things happened across many conversations.

#temporal reasoning #multi-session dialogue #reinforcement learning

How Much 3D Do Video Foundation Models Encode?

Intermediate

Zixuan Huang, Xiang Li et al. · Dec 23 · arXiv

This paper asks a simple question: do video AI models trained only on 2D videos secretly learn about 3D worlds?

#video foundation models #3D awareness #temporal reasoning

HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

Intermediate

Minghui Lin, Pengxiang Ding et al. · Dec 10 · arXiv

Robots often act like goldfish with short memories; HiF-VLA fixes this by letting them use motion to remember the past and predict the future.

#Vision-Language-Action #motion vectors #temporal reasoning

Unified Video Editing with Temporal Reasoner

Intermediate

Xiangpeng Yang, Ji Xie et al. · Dec 8 · arXiv

VideoCoF is a new way to edit videos that first figures out WHERE to edit and then does the edit, like thinking before acting.

#video editing #diffusion transformer #chain-of-frames
