How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (14)

#Supervised Fine-Tuning

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Intermediate
Dylan Zhang, Yufeng Xu et al. · Feb 1 · arXiv

The paper shows that a model that looks great after supervised fine-tuning (SFT) can actually end up worse after the same reinforcement learning (RL) training than a model that looked weaker at SFT time.

#Supervised Fine-Tuning #Reinforcement Learning #Distribution Mismatch

Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry

Intermediate
Zhuochun Li, Yong Zhang et al. · Jan 30 · arXiv

Big models are often used to grade AI answers, but they are expensive, slow, and depend too much on tricky prompts.

#Representation-as-a-Judge #Semantic Capacity Asymmetry #LLM-as-a-Judge

Language-based Trial and Error Falls Behind in the Era of Experience

Intermediate
Haoyu Wang, Guozheng Ma et al. · Jan 29 · arXiv

Big language models are great at words but waste lots of time and energy when they try random actions in non-language games like Sudoku, Sokoban, 2048, FrozenLake, and Rubik’s Cube.

#SCOUT #Reinforcement Learning #Supervised Fine-Tuning

OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models

Intermediate
Yufeng Zhong, Lei Chen et al. · Jan 29 · arXiv

OCRVerse is a new AI model that can read both plain text in documents and the visual structures in charts, webpages, and science plots, all in one system.

#Holistic OCR #Vision-Language Model #Supervised Fine-Tuning

VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

Intermediate
Zirui Wang, Junyi Zhang et al. · Jan 23 · arXiv

VisGym is a playground of 17 very different visual tasks for testing and training AI models that see and talk (Vision–Language Models) as they act over many steps.

#VisGym #Vision–Language Models #Multimodal Agents

Agentic Reasoning for Large Language Models

Intermediate
Tianxin Wei, Ting-Wei Li et al. · Jan 18 · arXiv

This paper explains how to turn large language models (LLMs) from quiet students that only answer questions into active agents that can plan, act, and learn over time.

#Agentic Reasoning #LLM Agents #In-Context Learning

Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation

Intermediate
Pingzhi Tang, Yiding Wang et al. · Jan 16 · arXiv

Big language models can learn new facts with simple tutoring (SFT), but that doesn’t automatically teach them how to use those facts well.

#Parametric Skill Transfer #Skill Vector #Task Arithmetic
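The "skill vector" and "task arithmetic" tags refer to a standard recipe: subtract a base model's weights from a fine-tuned model's weights to get a reusable delta, then add a scaled copy of that delta to another model. A minimal sketch of that general idea (not necessarily the paper's exact method), with toy NumPy arrays standing in for parameter tensors:

```python
import numpy as np

def task_vector(base, finetuned):
    """Skill/task vector: the parameter delta from base to fine-tuned model."""
    return {name: finetuned[name] - base[name] for name in base}

def inject(target, vector, alpha=1.0):
    """Add a scaled task vector into another model's parameters."""
    return {name: target[name] + alpha * vector[name] for name in target}

# Toy example: one parameter tensor named "w".
base = {"w": np.array([1.0, 2.0])}
finetuned = {"w": np.array([3.0, 2.0])}
skill = task_vector(base, finetuned)          # delta is [2.0, 0.0]
adapted = inject({"w": np.zeros(2)}, skill, alpha=0.5)
```

Real uses operate on every tensor in a model's state dict; `alpha` controls how strongly the skill is injected.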

NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Intermediate
Jiayu Liu, Rui Wang et al. · Jan 16 · arXiv

The paper studies why large language models (LLMs) sound too sure of themselves when using retrieval-augmented generation (RAG) and how to fix it.

#Retrieval-Augmented Generation #Confidence Calibration #Expected Calibration Error
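Expected Calibration Error (ECE), named in the tags, is the standard way to measure "sounding too sure": bin predictions by stated confidence and compare each bin's average confidence to its actual accuracy. A minimal sketch of the metric itself (not the paper's calibration method):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; ECE is the size-weighted average
    gap between mean confidence and accuracy within each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples in bin
    return ece

# A model that says "95% sure" but is right only 1 time in 4 is badly calibrated.
expected_calibration_error([0.95, 0.95, 0.95, 0.95], [1, 0, 0, 0])  # 0.7
```

An ECE of 0 means stated confidence matches observed accuracy in every bin.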

SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

Intermediate
Chaofan Tao, Jierun Chen et al. · Jan 4 · arXiv

SWE-Lego shows that a simple training method called supervised fine-tuning (SFT), when done carefully, can teach AI to fix real software bugs very well.

#SWE-Lego #Supervised Fine-Tuning #Error Masking
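The "error masking" tag points at a common SFT trick: tokens you do not want the model to imitate (prompt text, or spans containing mistakes) are excluded from the loss by zeroing them out. A minimal sketch of that general idea on toy per-token log-probabilities (the function name `masked_nll` is illustrative, not from the paper):

```python
import numpy as np

def masked_nll(logprobs, labels, mask):
    """Negative log-likelihood averaged over unmasked target tokens only.
    logprobs: (seq_len, vocab) log-probabilities; labels: target token ids;
    mask: 1 to include a position in the loss, 0 to exclude it."""
    per_token = -logprobs[np.arange(len(labels)), labels]
    mask = np.asarray(mask, dtype=float)
    return (per_token * mask).sum() / max(mask.sum(), 1.0)

# Two positions; masking the second makes its token irrelevant to the loss.
logprobs = np.log(np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.8, 0.1]]))
masked_nll(logprobs, labels=[0, 1], mask=[1, 0])  # only the first token counts
```

In practice the same effect is achieved in training frameworks by setting masked positions to an ignore label before the cross-entropy loss.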

SpatialTree: How Spatial Abilities Branch Out in MLLMs

Intermediate
Yuxi Xiao, Longfei Li et al. · Dec 23 · arXiv

SpatialTree is a new, four-level "ability tree" that tests how multimodal AI models (that see and read) handle space: from basic seeing to acting in the world.

#Spatial Intelligence #Multimodal Large Language Models #Hierarchical Benchmark

Step-DeepResearch Technical Report

Intermediate
Chen Hu, Haikuo Du et al. · Dec 23 · arXiv

Search is not the same as research; real research needs planning, checking many sources, fixing mistakes, and writing a clear report.

#Deep Research #Atomic Capabilities #ReAct Agent

DiRL: An Efficient Post-Training Framework for Diffusion Language Models

Intermediate
Ying Zhu, Jiaxin Wan et al. · Dec 23 · arXiv

This paper builds DiRL, a fast and careful way to finish training diffusion language models so they reason better.

#Diffusion Language Model #Blockwise dLLM #Post-Training