How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (9)

Filtered by tag: #temporal reasoning

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Intermediate
Christopher Clark, Jieyu Zhang et al. · Jan 15 · arXiv

Molmo2 is a family of vision-language models that can watch videos, understand them, and point to or track things over time, with fully open weights, data, and code.

#vision-language model · #video grounding · #pointing and tracking

Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL

Intermediate
Yifei Shen, Yilun Zhao et al. · Jan 14 · arXiv

This paper introduces CLINSQL, a 633-task benchmark that turns real clinician-style questions into SQL challenges over the MIMIC-IV v3.1 hospital database.

#clinical text-to-SQL · #EHR · #MIMIC-IV

RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction

Beginner
Haonan Bian, Zhiyuan Yao et al. · Jan 11 · arXiv

RealMem is a new benchmark that tests how well AI assistants remember and manage long, ongoing projects across many conversations.

#RealMem · #long-term memory · #project-oriented interactions

KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

Intermediate
Tingyu Wu, Zhisheng Chen et al. · Jan 8 · arXiv

KnowMe-Bench is a new test that checks if AI helpers truly understand a person, not just remember facts.

#person understanding · #autobiographical narratives · #cognitive stream

Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning

Intermediate
Yuyang Hu, Jiongnan Liu et al. · Jan 8 · arXiv

This paper turns an AI agent’s memory from a flat list of notes into a logic map of events connected by cause-and-time links.

#event-centric memory · #Event Graph · #logic-aware retrieval

Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents

Intermediate
Yiming Du, Baojun Wang et al. · Dec 23 · arXiv

Memory-T1 teaches chatty AI agents to keep track of when things happened across many conversations.

#temporal reasoning · #multi-session dialogue · #reinforcement learning

How Much 3D Do Video Foundation Models Encode?

Intermediate
Zixuan Huang, Xiang Li et al. · Dec 23 · arXiv

This paper asks a simple question: do video AI models trained only on 2D videos secretly learn about 3D worlds?

#video foundation models · #3D awareness · #temporal reasoning

HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

Intermediate
Minghui Lin, Pengxiang Ding et al. · Dec 10 · arXiv

Robots often act like goldfish with short memories; HiF-VLA fixes this by letting them use motion to remember the past and predict the future.

#Vision-Language-Action · #motion vectors · #temporal reasoning

Unified Video Editing with Temporal Reasoner

Intermediate
Xiangpeng Yang, Ji Xie et al. · Dec 8 · arXiv

VideoCoF is a new way to edit videos that first figures out WHERE to edit and then performs the edit, like thinking before acting.

#video editing · #diffusion transformer · #chain-of-frames