Papers (776)

UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing

Dianyi Wang, Chaofan Ma et al. · Feb 2 · arXiv

UniReason is a single, unified model that plans with world knowledge before making an image and then edits its own result to fix mistakes, like a student drafting and revising an essay.

#unified multimodal model #world knowledge reasoning #text-to-image generation

SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference Optimization

Intermediate

Maksim Afanasyev, Illarion Iov · Feb 2 · arXiv

SLIME is a new way to train chatbots so they follow human preferences without forgetting how to write well.

#SLIME #preference optimization #DPO

Unified Personalized Reward Model for Vision Generation

Intermediate

Yibin Wang, Yuhang Zang et al. · Feb 2 · arXiv

The paper introduces UnifiedReward-Flex, a reward model that judges images and videos the way a thoughtful human would—by flexibly changing what it checks based on the prompt and the visual evidence.

#personalized reward model #multimodal reward #context-adaptive reasoning

SWE-Universe: Scale Real-World Verifiable Environments to Millions

Intermediate

Mouxiang Chen, Lei Zhang et al. · Feb 2 · arXiv

SWE-Universe is a factory-like system that turns real GitHub pull requests into safe, repeatable coding practice worlds with automatic checkers.

#SWE-Universe #software engineering agents #pull requests

Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics

Intermediate

Ziwen Xu, Chenyan Wu et al. · Feb 2 · arXiv

The paper shows that three popular ways to control language models—fine-tuning a few weights, LoRA, and activation steering—are actually the same kind of action: a dynamic weight update driven by a control knob.

#language model steering #dynamic weight updates #activation steering

Rethinking Generative Recommender Tokenizer: Recsys-Native Encoding and Semantic Quantization Beyond LLMs

Intermediate

Yu Liang, Zhongjin Zhang et al. · Feb 2 · arXiv

This paper proposes ReSID, a new way to turn items into short token codes (Semantic IDs) that are much easier for a recommender to predict.

#Semantic IDs #Generative Recommendation #Representation Learning

Show, Don't Tell: Morphing Latent Reasoning into Image Generation

Intermediate

Harold Haodong Chen, Xinxiang Yin et al. · Feb 2 · arXiv

LatentMorph teaches an image-making AI to quietly think in its head while it draws, instead of stopping to write out its thoughts in words.

#latent reasoning #text-to-image generation #autoregressive models

Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation

Intermediate

Hongzhou Zhu, Min Zhao et al. · Feb 2 · arXiv

The paper fixes a hidden mistake many fast video generators were making when turning a "see-everything" model into a "see-past-only" model.

#autoregressive video diffusion #causal attention #ODE distillation

TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents

Intermediate

Hang Yan, Xinyu Che et al. · Feb 2 · arXiv

This paper studies how AI agents get better while they are working, not just whether they finish the job.

#Test-Time Improvement #LLM agents #trajectory analysis

Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models

Intermediate

Yu Zeng, Wenxuan Huang et al. · Feb 2 · arXiv

The paper introduces VDR-Bench, a new test with 2,000 carefully built questions that truly require both seeing (images) and reading (web text) to find answers.

#multimodal large language model #visual question answering #vision deep research

D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

Intermediate

Bowen Xu, Shaoyu Wu et al. · Feb 2 · arXiv

This paper fixes a common problem in reasoning AIs called Lazy Reasoning, where the model rambles instead of making a good plan.

#task decomposition #tool use #large reasoning models

LoopViT: Scaling Visual ARC with Looped Transformers

Intermediate

Wen-Jie Shu, Xuerui Qiu et al. · Feb 2 · arXiv

Loop-ViT is a vision model that thinks in loops, so it can take more steps on hard puzzles and stop early on easy ones.

#ARC-AGI #visual reasoning #Looped Transformer
