This paper introduces CLINSQL, a 633-task benchmark that turns real clinician-style questions into SQL challenges over the MIMIC-IV v3.1 hospital database.
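To make the idea concrete, here is a minimal sketch of a clinician-style question turned into SQL. The schema below is a hypothetical, heavily simplified stand-in for an admissions table; the real benchmark runs over the full MIMIC-IV v3.1 database.

```python
import sqlite3

# Hypothetical mini-schema inspired by MIMIC-IV's admissions table
# (column names illustrative only, not the benchmark's actual setup).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admissions (
    subject_id INTEGER,
    hadm_id INTEGER,
    admittime TEXT,
    dischtime TEXT,
    admission_type TEXT
);
INSERT INTO admissions VALUES
    (1, 100, '2180-01-01 10:00:00', '2180-01-05 10:00:00', 'EMERGENCY'),
    (2, 101, '2180-02-01 08:00:00', '2180-02-03 08:00:00', 'ELECTIVE'),
    (3, 102, '2180-03-10 12:00:00', '2180-03-17 12:00:00', 'EMERGENCY');
""")

# Clinician-style question: "What is the average length of stay,
# in days, for emergency admissions?"
row = conn.execute("""
    SELECT AVG(julianday(dischtime) - julianday(admittime))
    FROM admissions
    WHERE admission_type = 'EMERGENCY'
""").fetchone()
print(round(row[0], 1))  # prints 5.5 (average of a 4-day and a 7-day stay)
```

The point of the benchmark is that even a question this plainly worded requires the model to pick the right table, filter, and date arithmetic.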
Fast-ThinkAct teaches a robot to plan with a few tiny hidden "thought tokens" instead of long paragraphs, making it much faster while staying smart.
This paper shows how to make long, camera-controlled videos much faster by generating only a few smart keyframes with diffusion, then filling in the in-between frames by reconstructing a 3D scene and rendering from it.
The paper introduces DeepResearchEval, a fully automated way to build realistic deep research tasks and to grade long research reports from AI systems.
STEP3-VL-10B is a small (10 billion parameters) open multimodal model that sees images and reads text, yet scores like much larger models.
This paper introduces MATTRL, a way for multiple AI agents to learn from their own conversations at test time using short, reusable text notes instead of retraining their weights.
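As a rough illustration (not the paper's implementation), the core mechanic can be sketched as a note store: agents distill short reusable text notes from past conversations and prepend matching notes to future prompts, with no weight updates at all. All names here are hypothetical.

```python
class NoteMemory:
    """Stores short text notes distilled at test time from past dialogues."""

    def __init__(self):
        self.notes = []  # list of (topic, note) pairs

    def add(self, topic, note):
        self.notes.append((topic, note))

    def augment_prompt(self, topic, prompt):
        # Prepend any notes matching the topic; weights are never touched.
        relevant = [n for t, n in self.notes if t == topic]
        if not relevant:
            return prompt
        header = "Notes from earlier attempts:\n- " + "\n- ".join(relevant)
        return header + "\n\n" + prompt

memory = NoteMemory()
memory.add("unit-conversion",
           "Always convert pounds to kilograms before dosing math.")
print(memory.augment_prompt("unit-conversion",
                            "Compute the dose for a 150 lb patient."))
```

The learning signal lives entirely in the accumulated notes, which is why the agents can keep improving mid-deployment without any retraining.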
The paper tackles a real-life problem: people often give phones short, vague instructions, so agents must guess the missing details using what they know about the user.
OpenVoxel is a training-free way to understand 3D scenes by grouping tiny 3D blocks (voxels) into objects and giving each object a clear caption.
Omni-R1 teaches AI to think with pictures and words at the same time by drawing helpful mini-images while reasoning.
V-DPM is a new way for AI to turn a short video into a moving 3D world, capturing both the shape and the motion of everything in it.
EvoFSM is a way for AI agents to improve themselves safely by editing a clear flowchart (a finite state machine, or FSM) instead of rewriting everything blindly.
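A minimal sketch, with hypothetical names, of why editing an explicit FSM is safer than free-form self-rewriting: every proposed edit can be checked against the machine's structure before it is accepted.

```python
def is_valid(fsm):
    # A well-formed FSM: the start state and every transition endpoint
    # must be known states.
    states = set(fsm["states"])
    return (fsm["start"] in states and
            all(src in states and dst in states
                for (src, _event), dst in fsm["transitions"].items()))

def apply_edit(fsm, src, event, dst):
    """Return a new FSM with one transition added/changed, only if it stays valid."""
    candidate = {**fsm, "transitions": {**fsm["transitions"], (src, event): dst}}
    return candidate if is_valid(candidate) else fsm  # reject unsafe edits

agent = {
    "states": ["plan", "act", "reflect"],
    "start": "plan",
    "transitions": {("plan", "ready"): "act", ("act", "done"): "reflect"},
}
agent = apply_edit(agent, "reflect", "retry", "plan")        # accepted
agent = apply_edit(agent, "reflect", "oops", "nonexistent")  # rejected: unknown state
print(sorted(agent["transitions"].values()))  # prints ['act', 'plan', 'reflect']
```

Because each edit is a small, checkable change to a transparent structure, the agent can evolve its own behavior without the failure modes of blindly rewriting its whole policy.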
MAXS is a new way for AI agents to think a few steps ahead while using tools like search and code, so they make smarter choices.
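The "think a few steps ahead" idea can be sketched as a tiny depth-limited lookahead over candidate tool calls (the toy environment and scoring below are hypothetical, not the paper's algorithm): simulate each short action sequence, then commit only to the best first action.

```python
def lookahead(state, actions, simulate, score, depth):
    """Return (best value, best first action) over all depth-step action sequences."""
    if depth == 0:
        return score(state), None
    best_value, best_action = float("-inf"), None
    for a in actions:
        value, _ = lookahead(simulate(state, a), actions, simulate, score, depth - 1)
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action

# Toy environment: state counts how much useful evidence has been gathered,
# and each tool contributes a fixed (made-up) amount.
simulate = lambda s, a: s + {"search": 2, "code": 3, "answer": 0}[a]
score = lambda s: s

value, first = lookahead(0, ["search", "code", "answer"], simulate, score, depth=2)
print(first, value)  # prints: code 6  (two "code" steps beat everything else)
```

In a real agent, `simulate` and `score` would be learned or model-based rather than fixed tables, but the commit-after-lookahead loop is the same shape.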