Papers (1055)

T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization

Tunyu Zhang, Xinxi Zhang et al. | Feb 12 | arXiv

This paper shows how to make diffusion language models write high-quality text in just a few steps, which makes them much faster.

#diffusion language models #few-step decoding #trajectory self-distillation

ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images

Intermediate

Mathieu Sibue, Andres Muñoz Garza et al. | Feb 12 | arXiv

ExStrucTiny is a new test (benchmark) that checks if AI can pull many connected facts from all kinds of documents and neatly put them into JSON, even when the question style and schema change.

#structured information extraction #document understanding #vision-language models

Query-focused and Memory-aware Reranker for Long Context Processing

Intermediate

Yuqing Li, Jiangnan Li et al. | Feb 12 | arXiv

QRRanker is a lightweight way to sort many long text chunks by how helpful they are to a question, using the model’s own attention to score relevance.

#query-focused retrieval heads #attention-based reranking #listwise ranking

Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision

Intermediate

Xiaohan He, Shiyang Feng et al. | Feb 12 | arXiv

Sci-CoE is a two-stage training method that helps one language model learn to both solve science problems and check those solutions with very little labeled data.

#scientific reasoning #co-evolution #solver-verifier

DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation

Intermediate

Xu Guo, Fulong Ye et al. | Feb 12 | arXiv

DreamID-Omni is one model that can create, edit, and animate human-centered videos with matching voices, all in sync.

#audio-video generation #diffusion transformer #identity preservation

dVoting: Fast Voting for dLLMs

Intermediate

Sicheng Feng, Zigeng Chen et al. | Feb 12 | arXiv

Diffusion Large Language Models (dLLMs) can write many parts of an answer at once, not just left to right like usual chatbots.

#diffusion large language models #remasking #test-time scaling

P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling

Intermediate

Pinyi Zhang, Ting-En Lin et al. | Feb 12 | arXiv

This paper introduces P-GenRM, a personalized generative reward model that judges AI answers using a custom scorecard built just for each user and situation.

#personalized reward modeling #generative reward model #evaluation chain

The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context

Intermediate

Xiaoyuan Liu, Tian Liang et al. | Feb 12 | arXiv

This paper gives language models a 'wand' to manage their own memory, instead of relying on humans to stuff the prompt for them.

#Stateful language models #Pensieve paradigm #Context pruning

GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

Intermediate

GigaBrain Team, Boyuan Wang et al. | Feb 12 | arXiv

GigaBrain-0.5M* is a robot brain that sees, reads, and acts, and it gets smarter by imagining the future before moving.

#Vision-Language-Action #World Model #Reinforcement Learning

DeepSight: An All-in-One LM Safety Toolkit

Intermediate

Bo Zhang, Jiaxuan Guo et al. | Feb 12 | arXiv

DeepSight is a free, all-in-one safety toolkit that both tests how models behave (DeepSafe) and peeks inside how they think (DeepScan).

#LLM safety evaluation #multimodal safety #frontier AI risks

LawThinker: A Deep Research Legal Agent in Dynamic Environments

Intermediate

Xinyu Yang, Chenlong Deng et al. | Feb 12 | arXiv

LawThinker is a legal AI agent that double-checks every research step before using it, so small mistakes don’t snowball into big ones.

#Legal AI agent #Explore-Verify-Memorize #DeepVerifier

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

Intermediate

Xin Xu, Clive Bai et al. | Feb 12 | arXiv

This paper shows a simple way to turn many 'too-easy' questions into harder, still-checkable ones so that AI keeps learning instead of stalling.

#Reinforcement Learning with Verifiable Rewards #Compositional prompts #Sequential Prompt Composition
