How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (906)


Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation

Advanced
Andrei Panferov, Erik Schultheis et al. · Jan 30 · arXiv

This paper shows how to train big language models faster and cheaper by using 4-bit numbers (NVFP4) without losing much accuracy.

#NVFP4 · #FP4 training · #quantization-aware training
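
For a rough feel of what 4-bit training touches, here is a minimal fake-quantization sketch on the FP4 (E2M1) grid with per-block scales, in the spirit of NVFP4. The paper's actual contribution, an improved unbiased gradient estimator, is not shown; `fake_quant_fp4` and the block size are illustrative.

```python
# Minimal sketch: round-to-nearest FP4 (E2M1) fake-quantization with one
# scale per 16-element block. NOT the paper's unbiased gradient estimator,
# just the basic quantize/dequantize step such training builds on.
import torch

FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def fake_quant_fp4(x: torch.Tensor, block: int = 16) -> torch.Tensor:
    flat = x.reshape(-1, block)
    scale = flat.abs().amax(dim=1, keepdim=True) / 6.0   # 6 = max FP4 magnitude
    scale = torch.where(scale == 0, torch.ones_like(scale), scale)
    scaled = flat / scale
    # Nearest-neighbour lookup on the FP4 grid; sign handled separately.
    idx = (scaled.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    q = FP4_GRID[idx] * scaled.sign()
    return (q * scale).reshape(x.shape)

x = torch.randn(4, 32)
print((x - fake_quant_fp4(x)).abs().mean())  # mean quantization error
```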

VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration

Intermediate
Hanxun Yu, Wentong Li et al. · Jan 30 · arXiv

VisionTrim makes picture-and-text AI models run much faster by keeping only the most useful visual pieces (tokens) and smartly merging the rest.

#vision token compression · #training-free acceleration · #multimodal large language model
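
A generic keep-and-merge sketch of the idea, not VisionTrim's actual scoring or merging rules: rank tokens by an importance score, keep the top-k, and fold each pruned token into its most similar kept token instead of discarding it. `compress_tokens` and the stand-in scores are hypothetical.

```python
# Keep the k most important visual tokens; average each pruned token into
# its most similar kept token (cosine similarity) so information survives.
import torch
import torch.nn.functional as F

def compress_tokens(tokens: torch.Tensor, scores: torch.Tensor, keep: int):
    """tokens: (N, D) visual tokens; scores: (N,) importance; keep: tokens kept."""
    order = scores.argsort(descending=True)
    kept, pruned = tokens[order[:keep]], tokens[order[keep:]]
    sim = F.normalize(pruned, dim=-1) @ F.normalize(kept, dim=-1).T
    assign = sim.argmax(dim=-1)          # nearest kept token for each pruned one
    counts = torch.ones(keep)
    for i, j in enumerate(assign):       # running average merge
        kept[j] = (kept[j] * counts[j] + pruned[i]) / (counts[j] + 1)
        counts[j] += 1
    return kept

tokens = torch.randn(576, 1024)          # e.g. a 24x24 ViT patch grid
scores = torch.rand(576)                 # stand-in importance scores
print(compress_tokens(tokens, scores, keep=144).shape)  # -> (144, 1024)
```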

Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification

Intermediate
Chuxue Cao, Jinluan Yang et al. · Jan 30 · arXiv

Large language models sometimes reach the right answer for the wrong reasons, which is risky and confusing.

#formal logic verification · #interleaved verification · #neuro-symbolic reasoning

Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling

Intermediate
Mingqian Feng, Xiaodong Liu et al. · Jan 30 · arXiv

Real attackers can try many prompts in parallel until a model slips, so testing safety with only one try badly underestimates risk.

#Best-of-N sampling · #Adversarial risk · #Attack Success Rate (ASR)
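
The core arithmetic is easy to check: if a single attempt succeeds with probability p, then N independent attempts succeed with probability 1 - (1 - p)^N. The toy numbers below are illustrative; the paper's estimator tackles the harder statistical problem of estimating this risk from finite samples.

```python
# Why single-try testing understates risk: even a tiny per-attempt success
# probability compounds quickly under Best-of-N sampling.
def asr_best_of_n(p: float, n: int) -> float:
    return 1.0 - (1.0 - p) ** n

for n in (1, 10, 100, 1000):
    print(f"N={n:>4}: ASR = {asr_best_of_n(0.005, n):.3f}")
# N=   1: ASR = 0.005
# N=  10: ASR = 0.049
# N= 100: ASR = 0.394
# N=1000: ASR = 0.993
```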

TTCS: Test-Time Curriculum Synthesis for Self-Evolving

Intermediate
Chengyi Yang, Zhishang Xiang et al. · Jan 30 · arXiv

TTCS lets a model teach itself at test time: it first generates easier practice questions similar to the real hard question, then learns from them before tackling it.

#test-time training · #test-time reinforcement learning · #curriculum learning
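
A sketch of the loop's shape, with hypothetical helpers (`generate_easier`, `solve_and_verify`, `update`) standing in for TTCS's actual question synthesis and reinforcement-learning update:

```python
# Shape of a test-time curriculum loop: make easier variants of the hard
# question, keep the ones the model can solve and verify, learn from them,
# then attempt the real question.
from typing import Callable, List, Optional, Tuple

def test_time_curriculum(
    model: Callable[[str], str],
    generate_easier: Callable[[str, int], List[str]],   # hypothetical helper
    solve_and_verify: Callable[[str], Optional[str]],   # hypothetical helper
    update: Callable[[List[Tuple[str, str]]], None],    # hypothetical helper
    hard_question: str,
    rounds: int = 3,
    k: int = 8,
) -> str:
    for _ in range(rounds):
        practice = generate_easier(hard_question, k)    # simpler, similar questions
        solved = [(q, a) for q in practice
                  if (a := solve_and_verify(q)) is not None]
        update(solved)                                  # e.g. a few fine-tuning steps
    return model(hard_question)
```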

Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry

Intermediate
Zhuochun Li, Yong Zhang et al. · Jan 30 · arXiv

Big models are often used to grade AI answers, but they are expensive, slow, and depend too much on tricky prompts.

#Representation-as-a-Judge · #Semantic Capacity Asymmetry · #LLM-as-a-Judge

SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization

Beginner
Jinyang Wu, Changpeng Yang et al. · Jan 30 · arXiv

Most reinforcement learning agents only get a simple pass/fail reward, which hides how good or bad their attempts really were.

#Sweet Spot Learning · #tiered rewards · #reinforcement learning with verifiable rewards
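
To see what "tiered" means in contrast to pass/fail, here is a toy reward with made-up tiers and weights; SSL's actual sweet-spot scheme for differentiated guidance is its own.

```python
# Toy tiered reward: partially working attempts still get signal, instead
# of the flat 0/1 that hides how close an attempt came.
def tiered_reward(attempt: dict) -> float:
    if attempt["correct"]:
        return 1.0
    if attempt["passes_some_tests"]:
        return 0.5   # partially correct beats merely well-formed
    if attempt["parses"]:
        return 0.1   # well-formed but wrong beats malformed output
    return 0.0

print(tiered_reward({"correct": False, "passes_some_tests": True, "parses": True}))  # 0.5
```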

RAPTOR: Ridge-Adaptive Logistic Probes

Intermediate
Ziqi Gao, Yaotian Zhu et al. · Jan 29 · arXiv

RAPTOR is a simple, fast way to find a direction (a concept vector) inside a frozen language model that points toward a concept like 'sarcasm' or 'positivity.'

#probing · #concept vectors · #activation steering
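
The basic probe is just L2-regularized logistic regression on frozen activations, with the learned weight vector as the concept direction; this sketch uses plain sklearn with a fixed penalty standing in for RAPTOR's adaptive ridge, and the synthetic data is illustrative.

```python
# Fit a logistic probe on (hidden state, concept label) pairs; the weight
# vector is the concept direction. Synthetic data stands in for real
# activations of a frozen language model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
H = rng.normal(size=(200, 768))            # stand-in hidden states (N, D)
w_true = rng.normal(size=768)
y = (H @ w_true > 0).astype(int)           # synthetic 'concept' labels

probe = LogisticRegression(C=1.0, max_iter=1000).fit(H, y)  # C ~ 1/ridge strength
concept_vector = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
print(concept_vector.shape)                # (768,) unit direction, usable for steering
```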

One-step Latent-free Image Generation with Pixel Mean Flows

Beginner
Yiyang Lu, Susie Lu et al. · Jan 29 · arXiv

This paper shows how to make a whole picture in one go, directly in pixels, without using a hidden “latent” space or many tiny steps.

#pixel MeanFlow · #one-step generation · #x-prediction
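
The one-step part can be sketched as a single Euler step with a network that predicts the average velocity over the whole noise-to-image path. `model` here is a dummy stand-in; the paper's x-prediction parameterization and pixel-space design are what make such a step viable without a latent space.

```python
# One-step sampling in a mean-flow style: the network predicts the average
# velocity over the full path, so one step from pure noise lands on an image.
import torch

def one_step_sample(model, shape=(1, 3, 256, 256)):
    z = torch.randn(shape)    # pure noise
    u = model(z)              # predicted average velocity over the whole path
    return z - u              # single Euler step: noise -> image

image = one_step_sample(lambda z: torch.zeros_like(z))  # dummy model, shape check only
print(image.shape)            # torch.Size([1, 3, 256, 256])
```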

Discovering Hidden Gems in Model Repositories

Intermediate
Jonathan Kahana, Eliahu Horwitz et al. · Jan 29 · arXiv

Millions of public AI models exist, but downloads are concentrated on a tiny set of “official” checkpoints, which are not always the best performers.

#hidden gems · #model repositories · #model trees

Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts

Intermediate
Yingfa Chen, Zhen Leng Thai et al. · Jan 29 · arXiv

This paper shows how to turn a big Transformer model into a faster hybrid model that mixes attention and RNN layers using far less training data (about 2.3B tokens).

#hybrid attention · #RNN attention hybrid · #linear attention
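
A skeleton of what such a hybrid stack looks like: most layers use an O(n) recurrent mixer, with full softmax attention kept only every few layers. The layer ratio is illustrative and `nn.GRU` is just a stand-in for the paper's linear-attention/RNN layers, not its distilled architecture.

```python
# Hybrid stack skeleton: cheap recurrent layers for most of the depth,
# full attention retained at a fixed interval for long-range precision.
import torch.nn as nn

def build_hybrid(depth=24, d_model=1024, n_heads=16, full_attn_every=6):
    layers = []
    for i in range(depth):
        if (i + 1) % full_attn_every == 0:
            layers.append(nn.MultiheadAttention(d_model, n_heads, batch_first=True))
        else:
            layers.append(nn.GRU(d_model, d_model, batch_first=True))  # RNN stand-in
    return nn.ModuleList(layers)

print(sum(isinstance(l, nn.MultiheadAttention) for l in build_hybrid()))  # 4 attention layers
```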

Exploring Reasoning Reward Model for Agents

Intermediate
Kaixuan Fan, Kaituo Feng et al. · Jan 29 · arXiv

The paper trains better AI agents by grading not just their final answers, but also how they reason and use tools along the way.

#Agentic Reinforcement Learning · #Reasoning Reward Model · #Process Supervision
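
A toy process-plus-outcome blend shows the general shape of such a reward; the weights and per-step scores below are made up, and the paper trains a reasoning reward model to produce these judgments rather than hand-coding them.

```python
# Blend per-step process scores with the final-answer outcome, so a
# trajectory with sound reasoning but a wrong answer still earns signal.
def trajectory_reward(step_scores: list[float], answer_correct: bool,
                      w_process: float = 0.4) -> float:
    process = sum(step_scores) / len(step_scores) if step_scores else 0.0
    outcome = 1.0 if answer_correct else 0.0
    return w_process * process + (1 - w_process) * outcome

print(trajectory_reward([1.0, 0.5, 1.0], answer_correct=False))  # ~0.333
```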