Papers (776)

Scaling Multiagent Systems with Process Rewards

Intermediate

Ed Li, Junyu Ren et al., Jan 30, arXiv

This paper teaches AI teams to get better by scoring every move they make, not just the final answer.

#multiagent reinforcement learning #process rewards #AI feedback

Deep Search with Hierarchical Meta-Cognitive Monitoring Inspired by Cognitive Neuroscience

Intermediate

Zhongxiang Sun, Qipeng Wang et al., Jan 30, arXiv

Deep search agents can plan and browse the web in many steps, but they often fail because they don’t notice when their own thinking drifts off-track.

#deep search agents #metacognition #consistency monitoring

ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought

Intermediate

Fanmeng Wang, Haotian Liu et al., Jan 30, arXiv

Chain-of-Thought (CoT) makes AI think step by step, but it is slow because it writes out many tokens one at a time.

#Chain-of-Thought #Latent Reasoning #Variational Auto-Encoder

FourierSampler: Unlocking Non-Autoregressive Potential in Diffusion Language Models via Frequency-Guided Generation

Intermediate

Siyang He, Qiqi Wang et al., Jan 30, arXiv

Diffusion language models (dLLMs) can write text in any order, but common decoding methods still favor left-to-right generation, which wastes that flexibility.

#diffusion language models #non-autoregressive generation #frequency-domain analysis

DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding

Intermediate

Jiaming Zhou, Xuxin Cheng et al., Jan 30, arXiv

DIFFA-2 is a new audio AI that listens to speech, sounds, and music and answers questions about them using a diffusion-style language model instead of the usual step-by-step (autoregressive) method.

#Diffusion language models #Audio understanding #Large audio language model

THINKSAFE: Self-Generated Safety Alignment for Reasoning Models

Intermediate

Seanie Lee, Sangwoo Park et al., Jan 30, arXiv

Large reasoning models got very good at thinking step-by-step, but that sometimes made them too eager to follow harmful instructions.

#THINKSAFE #self-generated safety alignment #refusal steering

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Intermediate

Ximing Lu, David Acuna et al., Jan 30, arXiv

Golden Goose turns messy internet text into clean multiple-choice puzzles that computers can learn from and get automatic rewards for.

#Reinforcement Learning with Verifiable Rewards #Golden Goose #GooseReason-0.7M

Residual Context Diffusion Language Models

Intermediate

Yuezhou Hu, Harman Singh et al., Jan 30, arXiv

Diffusion language models (dLLMs) generate several tokens at once but usually throw away lots of helpful clues at each step; RCD keeps and reuses those clues.

#diffusion language models #residual context diffusion #soft tokens

DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation

Intermediate

Hun Chang, Byunghee Cha et al., Jan 30, arXiv

DINO-SAE is a new autoencoder that keeps both the meaning of an image (semantics) and tiny textures (fine details) at the same time.

#DINO-SAE #spherical manifold #cosine similarity alignment

MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering

Intermediate

Chuanzhe Guo, Jingjing Wu et al., Jan 30, arXiv

This paper builds a smart team of AI helpers, called MEnvAgent, that automatically sets up the right computer environments for code projects in many languages.

#environment construction #software engineering agents #Fail-to-Pass (F2P)

BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation

Intermediate

Jingwen Xu, Yiyang Lu et al., Jan 30, arXiv

BatCoder teaches a code model to write both code and its documentation by doing a round trip: from code to docs and back to code.

#back-translation #self-supervised learning #reinforcement learning for code

NativeTok: Native Visual Tokenization for Improved Image Generation

Intermediate

Bin Wu, Mengqi Huang et al., Jan 30, arXiv

This paper fixes a hidden mismatch in image generation: tokenizers make tokens without order, but generators need an order to predict the next token well.

#visual tokenization #autoregressive image generation #causal dependencies
