Papers (1055)

OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding

Sheng-Yu Huang, Jaesung Choe et al. · Jan 14 · arXiv

OpenVoxel is a training-free way to understand 3D scenes by grouping tiny 3D blocks (voxels) into objects and giving each object a clear caption.

#OpenVoxel#Sparse Voxel Rasterization#training-free 3D understanding

Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning

Intermediate

Dongjie Cheng, Yongqi Li et al. · Jan 14 · arXiv

Omni-R1 teaches AI to think with pictures and words at the same time by drawing helpful mini-images while reasoning.

#multimodal reasoning#interleaved generation#functional image generation

V-DPM: 4D Video Reconstruction with Dynamic Point Maps

Intermediate

Edgar Sucar, Eldar Insafutdinov et al. · Jan 14 · arXiv

V-DPM is a new way for AI to turn a short video into a moving 3D world, capturing both the shape and the motion of everything in it.

#Dynamic Point Maps#4D reconstruction#scene flow

EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines

Intermediate

Shuo Zhang, Chaofa Yuan et al. · Jan 14 · arXiv

EvoFSM is a way for AI agents to improve themselves safely by editing a clear flowchart (an FSM) instead of rewriting everything blindly.

#Finite State Machine#Structured Self-Evolution#Atomic Operations

MAXS: Meta-Adaptive Exploration with LLM Agents

Intermediate

Jian Zhang, Zhiyuan Wang et al. · Jan 14 · arXiv

MAXS is a new way for AI agents to think a few steps ahead while using tools like search and code, so they make smarter choices.

#LLM agents#tool-augmented reasoning#lookahead

Geometric Stability: The Missing Axis of Representations

Intermediate

Prashant C. Raju · Jan 14 · arXiv

Similarity tells you if two models seem to think about things the same way, but it doesn’t tell you if that thinking is sturdy when the world wiggles.

#geometric stability#representation similarity#CKA

World Craft: Agentic Framework to Create Visualizable Worlds via Text

Intermediate

Jianwen Sun, Yukang Feng et al. · Jan 14 · arXiv

World Craft lets anyone turn a short text description into a playable, visual game world without coding.

#AI Town#multi-agent framework#layout generation

EvasionBench: A Large-Scale Benchmark for Detecting Managerial Evasion in Earnings Call Q&A

Intermediate

Shijian Ma, Yan Lin et al. · Jan 14 · arXiv

EvasionBench is a new, very large dataset that helps computers spot when company leaders dodge questions during earnings call Q&A.

#evasion detection#earnings call Q&A#financial NLP

SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL

Intermediate

Lijun Liu, Linwei Chen et al. · Jan 14 · arXiv

SkinFlow is a 7B-parameter vision–language model that diagnoses skin conditions by passing the most useful visual information to its language model, instead of simply scaling up in size.

#dermatology AI#vision-language model#Dynamic Visual Encoding

The AI Hippocampus: How Far are We From Human Memory?

Intermediate

Zixia Jia, Jiaqi Li et al. · Jan 14 · arXiv

This survey asks how close AI memory systems are to human memory and organizes the answer into three parts: implicit memory (stored inside the model), explicit memory (external storage the model can look up), and agentic memory (what an AI agent keeps over time to plan and act).

#LLM memory#implicit memory#explicit memory

Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

Intermediate

Shaotian Yan, Kaiyuan Liu et al. · Jan 14 · arXiv

The paper introduces DASD-4B-Thinking, a small (4B) open-source reasoning model that scores like much larger models on hard math, science, and coding tests.

#sequence-level distillation#divergence-aware sampling#temperature-scheduled learning

OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG

Intermediate

Fengran Mo, Zhan Su et al. · Jan 13 · arXiv

OpenDecoder teaches large language models (LLMs) to pay more attention to better documents during Retrieval-Augmented Generation (RAG).

#Retrieval-Augmented Generation#LLM Decoding#Attention Modulation
