Papers (784)

DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding

Jiaming Zhou, Xuxin Cheng et al. · Jan 30 · arXiv

DIFFA-2 is a new audio AI that listens to speech, sounds, and music and answers questions about them using a diffusion-style language model instead of the usual step-by-step (autoregressive) method.

#Diffusion language models #Audio understanding #Large audio language model

THINKSAFE: Self-Generated Safety Alignment for Reasoning Models

Intermediate

Seanie Lee, Sangwoo Park et al. · Jan 30 · arXiv

Large reasoning models have become very good at thinking step by step, but this has sometimes made them too eager to follow harmful instructions.

#THINKSAFE #self-generated safety alignment #refusal steering

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Intermediate

Ximing Lu, David Acuna et al. · Jan 30 · arXiv

Golden Goose turns messy internet text into clean multiple-choice puzzles that computers can learn from and get automatic rewards for.

#Reinforcement Learning with Verifiable Rewards #Golden Goose #GooseReason-0.7M

Residual Context Diffusion Language Models

Intermediate

Yuezhou Hu, Harman Singh et al. · Jan 30 · arXiv

Diffusion language models (dLLMs) generate several tokens at once but usually throw away many helpful clues at each step; RCD keeps and reuses those clues.

#diffusion language models #residual context diffusion #soft tokens

DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation

Intermediate

Hun Chang, Byunghee Cha et al. · Jan 30 · arXiv

DINO-SAE is a new autoencoder that keeps both the meaning of an image (semantics) and tiny textures (fine details) at the same time.

#DINO-SAE #spherical manifold #cosine similarity alignment

MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering

Intermediate

Chuanzhe Guo, Jingjing Wu et al. · Jan 30 · arXiv

This paper builds a smart team of AI helpers, called MEnvAgent, that automatically sets up the right computer environments for code projects in many languages.

#environment construction #software engineering agents #Fail-to-Pass (F2P)

BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation

Intermediate

Jingwen Xu, Yiyang Lu et al. · Jan 30 · arXiv

BatCoder teaches a code model to write both code and its documentation by doing a round trip: from code to docs and back to code.

#back-translation #self-supervised learning #reinforcement learning for code

NativeTok: Native Visual Tokenization for Improved Image Generation

Intermediate

Bin Wu, Mengqi Huang et al. · Jan 30 · arXiv

This paper fixes a hidden mismatch in image generation: tokenizers make tokens without order, but generators need an order to predict the next token well.

#visual tokenization #autoregressive image generation #causal dependencies

VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration

Intermediate

Hanxun Yu, Wentong Li et al. · Jan 30 · arXiv

VisionTrim makes picture-and-text AI models run much faster by keeping only the most useful visual pieces (tokens) and smartly merging the rest.

#vision token compression #training-free acceleration #multimodal large language model

Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification

Intermediate

Chuxue Cao, Jinluan Yang et al. · Jan 30 · arXiv

Large language models sometimes reach the right answer for the wrong reasons, which is risky and confusing.

#formal logic verification #interleaved verification #neuro-symbolic reasoning

Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling

Intermediate

Mingqian Feng, Xiaodong Liu et al. · Jan 30 · arXiv

Real attackers can try many prompts in parallel until a model slips, so testing safety with only one try badly underestimates risk.

#Best-of-N sampling #Adversarial risk #Attack Success Rate (ASR)

TTCS: Test-Time Curriculum Synthesis for Self-Evolving

Intermediate

Chengyi Yang, Zhishang Xiang et al. · Jan 30 · arXiv

TTCS is a way for a model to teach itself at test time: it first generates easier practice questions similar to the hard target question and then learns from them.

#test-time training #test-time reinforcement learning #curriculum learning
