This paper proposes ReSID, a new way to turn items into short token codes (Semantic IDs) that are much easier for a recommender to predict.
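ReSID's actual construction isn't shown here, but the general idea behind a Semantic ID, squeezing an item's embedding down to a few discrete tokens, is often done with residual quantization. A minimal sketch under that assumption (the codebooks, sizes, and names below are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_quantize(embedding, codebooks):
    """Greedy residual quantization: at each level, pick the nearest
    codebook vector, subtract it, and quantize what's left over."""
    code, residual = [], embedding.copy()
    for cb in codebooks:                    # cb has shape (num_codes, dim)
        idx = np.argmin(np.linalg.norm(cb - residual, axis=1))
        code.append(int(idx))
        residual = residual - cb[idx]
    return code                             # a short list of small integers

# Toy setup: 3 levels x 32 codes over 8-dim item embeddings.
codebooks = [rng.normal(size=(32, 8)) for _ in range(3)]
item_embedding = rng.normal(size=8)
semantic_id = residual_quantize(item_embedding, codebooks)
print(semantic_id)   # the item's short token code (Semantic ID)
```

The payoff is that a recommender only has to predict three tokens from small vocabularies instead of one item ID out of millions.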
LatentMorph teaches an image-making AI to quietly think in its head while it draws, instead of stopping to write out its thoughts in words.
The paper fixes a hidden mistake many fast video generators were making when turning a "see-everything" model into a "see-past-only" model.
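The "see-everything" vs. "see-past-only" distinction is the standard bidirectional vs. causal attention mask. This sketch shows what flips between the two (it is a generic illustration, not the paper's specific fix):

```python
import numpy as np

def attention(q, k, v, causal=False):
    """Scaled dot-product attention. With causal=True, future positions
    are masked out, so each frame can only look at itself and the past."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)   # hide the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(1)
q = k = v = rng.normal(size=(4, 8))        # 4 "frames", 8-dim features
bi = attention(q, k, v)                    # see-everything
ca = attention(q, k, v, causal=True)       # see-past-only
```

Under the causal mask, the first frame can only attend to itself, so its output is exactly its own value vector; naively reusing weights trained without this mask is where such conversions can go subtly wrong.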
This paper studies how AI agents get better while they are working, not just whether they finish the job.
ECHO-2 is a new way to train AI with reinforcement learning that keeps a small, central trainer busy while farming out the plentiful, parallel work (generating rollouts) to many low-cost computers spread around the world.
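The trainer-plus-remote-workers pattern can be sketched in miniature with a thread pool standing in for the worldwide machines (this is a generic asynchronous-RL skeleton, not ECHO-2's protocol; the rollouts here just return fake rewards):

```python
import random
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def rollout(policy_version, seed):
    """The cheap, parallel work a remote machine would do: run episodes
    with the current policy and send back results (fake rewards here)."""
    rng = random.Random(seed)
    return policy_version, sum(rng.random() for _ in range(8))

def train(num_updates=5, num_workers=4):
    """Central trainer: keep every worker busy and fold finished
    rollouts into an update as soon as each one arrives."""
    version, avg_reward = 0, 0.0
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = {pool.submit(rollout, version, s) for s in range(num_workers)}
        while version < num_updates:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                _, reward = fut.result()
                # Asynchronous: accept slightly stale rollouts rather than
                # stalling the trainer until every worker catches up.
                avg_reward = 0.9 * avg_reward + 0.1 * reward
                version += 1
                pending.add(pool.submit(rollout, version, version))
    return version, avg_reward

final_version, final_reward = train()
```

The key design choice is the last comment: the trainer never blocks waiting for stragglers, which is what keeps it busy even when the workers are slow, far away, and unreliable.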
The paper introduces VDR-Bench, a new test with 2,000 carefully built questions that truly require both seeing (images) and reading (web text) to find answers.
This paper fixes a common problem in reasoning AIs called Lazy Reasoning, where the model rambles instead of making a good plan.
Loop-ViT is a vision model that thinks in loops, so it can take more steps on hard puzzles and stop early on easy ones.
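"Thinking in loops" usually means applying the same block repeatedly with a learned signal that says when to stop, as in adaptive computation time. A toy sketch of that control flow (the step and confidence functions below are stand-ins, not Loop-ViT's architecture):

```python
import numpy as np

def looped_forward(x, step_fn, confidence_fn, max_steps=12, threshold=0.9):
    """Apply the same block over and over, halting early once a
    confidence score says the current answer is good enough."""
    state = x
    for step in range(1, max_steps + 1):
        state = step_fn(state)
        if confidence_fn(state) >= threshold:
            break                      # easy input: stop early
    return state, step                 # hard inputs burn more steps

# Toy stand-ins: "solving" means shrinking the state's norm below ~0.11.
step_fn = lambda s: 0.5 * s                          # shared weights per loop
confidence_fn = lambda s: 1.0 / (1.0 + np.linalg.norm(s))

easy = np.ones(4) * 0.1
hard = np.ones(4) * 100.0
_, easy_steps = looped_forward(easy, step_fn, confidence_fn)
_, hard_steps = looped_forward(hard, step_fn, confidence_fn)
print(easy_steps, hard_steps)   # the easy input halts much sooner
```

Because every loop reuses the same weights, extra "thinking" on hard puzzles costs compute at inference time but no extra parameters.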
This paper shows that many AI models that both read images and write images are not truly unified inside: they often understand well but fail to generate (or the other way around).
This paper finds a precise way to describe and fix the Modality Gap, where image and text features that mean the same thing still sit in different regions of the AI's feature space.
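A common way to see the gap is that each modality's embeddings cluster around their own mean, offset from the other modality's. This synthetic sketch shows the phenomenon and the simplest possible fix, per-modality centering (the paper's actual characterization and fix are surely more sophisticated):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic CLIP-like features: matched image/text pairs share content,
# but the image side carries a constant modality offset, so features
# with the same meaning sit in different places.
content = rng.normal(size=(100, 16))
image_feats = content + np.array([2.0] + [0.0] * 15)   # modality offset
text_feats = content

gap_vector = image_feats.mean(axis=0) - text_feats.mean(axis=0)
print(np.linalg.norm(gap_vector))      # size of the modality gap

# Centering each modality removes the constant offset, so matched
# pairs line up again (in this toy, exactly).
image_centered = image_feats - image_feats.mean(axis=0)
text_centered = text_feats - text_feats.mean(axis=0)
```

In real models the offset is not a single clean constant, which is exactly why a precise description of the gap matters before trying to remove it.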
World models are AI tools that imagine the future so a robot can plan what to do next, but they are expensive to run many times in a row.
Large language models don’t map out a full step-by-step plan before they start thinking; they mostly plan just a little bit ahead.