Papers (34)

#supervised fine-tuning

Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models

Kunat Pipatanakul, Pittawat Taveekitworachai · Jan 26 · arXiv

Typhoon-S is a simple, open recipe that turns a basic language model into a helpful assistant and then teaches it important local skills, all on small budgets.

#Typhoon-S #on-policy distillation #full-logits distillation

Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

Intermediate

Yuxuan Wan, Tianqing Fang et al. · Jan 22 · arXiv

DeepVerifier is a plug-in checker that helps Deep Research Agents catch and fix their own mistakes while they are working, without retraining.

#Deep Research Agents #verification asymmetry #rubrics-based feedback

Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind

Beginner

Zhitao He, Zongwei Lyu et al. · Jan 22 · arXiv

Academic rebuttals are not just about being polite; they are about smart, strategic persuasion under hidden information.

#academic rebuttal #theory of mind #strategic persuasion

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

Intermediate

Yuming Yang, Mingyoung Lai et al. · Jan 20 · arXiv

The paper asks a simple question: Which step-by-step explanations from a teacher model actually help a student model learn to reason better?

#Rank-Surprisal Ratio #data-student suitability #chain-of-thought distillation

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

Beginner

Caihua Li, Lianghong Guo et al. · Jan 15 · arXiv

This paper presents itself as the first comprehensive survey of how AI can resolve real software issues, not just write short code snippets.

#SWE-bench #issue resolution #AI coding agents

TranslateGemma Technical Report

Intermediate

Mara Finkelstein, Isaac Caswell et al. · Jan 13 · arXiv

TranslateGemma is a family of open machine translation models fine-tuned from Gemma 3 to translate many languages more accurately.

#machine translation #TranslateGemma #Gemma 3

X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests

Intermediate

Jie Wu, Haoling Li et al. · Jan 11 · arXiv

X-Coder shows that models can learn expert-level competitive programming using data that is 100% synthetic—no real contest problems needed.

#competitive programming #synthetic data generation #feature-based synthesis

An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift

Intermediate

Constantinos Karouzos, Xingwei Tan et al. · Jan 9 · arXiv

Preference tuning teaches language models to act the way people like, but those habits can fall apart when the topic or style changes (domain shift).

#preference tuning #domain shift #supervised fine-tuning

EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

Intermediate

Xiaoshuai Song, Haofei Chang et al. · Jan 9 · arXiv

EnvScaler is an automatic factory that builds many safe, rule-following practice worlds where AI agents can talk to users and call tools, just like real apps.

#EnvScaler #tool-interactive environments #programmatic synthesis

Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction

Beginner

Muzhao Tian, Zisu Huang et al. · Jan 8 · arXiv

Long-term AI helpers remember past chats, but using all memories can trap them in old ideas (Memory Anchoring).

#steerable memory #memory anchoring #long-term agents

TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration

Intermediate

Jiuzhou Zhao, Chunrong Chen et al. · Jan 8 · arXiv

Multi-agent systems are like teams of expert helpers; the tricky part is choosing which helpers to ask for each question.

#multi-agent systems #routing #reasoning chain

Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

Intermediate

Muxi Diao, Lele Yang et al. · Jan 5 · arXiv

Supervised fine-tuning (SFT) often makes a model great at a new task but worse at its old skills; this paper explains a key reason why and proposes a way to fix it.

#Entropy-Adaptive Fine-Tuning #confident conflicts #token-level entropy
