How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (925)


Transition Matching Distillation for Fast Video Generation

Intermediate
Weili Nie, Julius Berner et al. · Jan 14 · arXiv

Big video makers (diffusion models) create great videos but are too slow because they use hundreds of tiny clean-up steps.

#video diffusion #distillation #transition matching

Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL

Intermediate
Yifei Shen, Yilun Zhao et al. · Jan 14 · arXiv

This paper introduces CLINSQL, a 633-task benchmark that turns real clinician-style questions into SQL challenges over the MIMIC-IV v3.1 hospital database.

#clinical text-to-SQL #EHR #MIMIC-IV

Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

Intermediate
Chi-Pin Huang, Yunze Man et al. · Jan 14 · arXiv

Fast-ThinkAct teaches a robot to plan with a few tiny hidden "thought tokens" instead of long paragraphs, making it much faster while staying smart.

#Vision-Language-Action #latent reasoning #verbalizable planning

Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering

Intermediate
Jieying Chen, Jeffrey Hu et al. · Jan 14 · arXiv

This paper shows how to make long, camera-controlled videos much faster by generating only a few smart keyframes with diffusion, then filling in the rest using a 3D scene and rendering.

#camera-controlled video generation #sparse keyframes #3D reconstruction

DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

Intermediate
Yibo Wang, Lei Wang et al. · Jan 14 · arXiv

The paper introduces DeepResearchEval, a fully automated way to build realistic deep research tasks and to grade long research reports from AI systems.

#deep research agents #agentic evaluation #persona-driven tasks

STEP3-VL-10B Technical Report

Beginner
Ailin Huang, Chengyuan Yao et al. · Jan 14 · arXiv

STEP3-VL-10B is a small (10-billion-parameter) open multimodal model that sees images and reads text, yet scores like much larger models.

#multimodal foundation model #unified pre-training #perception encoder

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Intermediate
Zhiyuan Hu, Yunhai Hu et al. · Jan 14 · arXiv

This paper introduces MATTRL, a way for multiple AI agents to learn from their own conversations at test time using short, reusable text notes instead of retraining their weights.

#multi-agent systems #test-time reinforcement learning #experience retrieval

PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records

Intermediate
Yibo Lyu, Gongwei Chen et al. · Jan 14 · arXiv

The paper tackles a real-life problem: people often give phone agents short, vague instructions, so the agents must infer the missing details from what they know about the user.

#personalized GUI agent #implicit intent #preference modeling

OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding

Intermediate
Sheng-Yu Huang, Jaesung Choe et al. · Jan 14 · arXiv

OpenVoxel is a training-free way to understand 3D scenes by grouping tiny 3D blocks (voxels) into objects and giving each object a clear caption.

#OpenVoxel #Sparse Voxel Rasterization #training-free 3D understanding

Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning

Intermediate
Dongjie Cheng, Yongqi Li et al. · Jan 14 · arXiv

Omni-R1 teaches AI to think with pictures and words at the same time by drawing helpful mini-images while reasoning.

#multimodal reasoning #interleaved generation #functional image generation

V-DPM: 4D Video Reconstruction with Dynamic Point Maps

Intermediate
Edgar Sucar, Eldar Insafutdinov et al. · Jan 14 · arXiv

V-DPM is a new way for AI to turn a short video into a moving 3D world, capturing both the shape and the motion of everything in it.

#Dynamic Point Maps #4D reconstruction #scene flow

EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines

Intermediate
Shuo Zhang, Chaofa Yuan et al. · Jan 14 · arXiv

EvoFSM is a way for AI agents to improve themselves safely by editing a clear flowchart (an FSM) instead of rewriting everything blindly.

#Finite State Machine #Structured Self-Evolution #Atomic Operations