Papers (2)

#LLM post-training

Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training

The paper studies a simple way to post-train large language models with reinforcement learning: replace a hard-to-compute term (the log-partition function) with an easy one, the mean reward.

#Policy Mirror Descent #KL regularization #chi-squared regularization
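To make the summary above concrete, here is a minimal sketch of the quantity being replaced. It assumes the standard KL-regularized mirror descent setting, where the log-partition function is log E[exp(r / tau)] over responses sampled from the current policy; the paper's exact objective is not given in this listing, so the function names, the temperature `tau`, and the reward values are all hypothetical. By Jensen's inequality the mean-reward surrogate never exceeds the log-partition value, and the gap narrows as `tau` grows.

```python
import math


def log_partition(rewards, tau):
    """Monte Carlo estimate of log E[exp(r / tau)], computed as a stable log-mean-exp."""
    scaled = [r / tau for r in rewards]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    return m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))


def mean_reward_surrogate(rewards, tau):
    """Mean reward divided by tau: the cheap stand-in for the log-partition term."""
    return sum(rewards) / (len(rewards) * tau)


# Hypothetical binary rewards for five sampled responses to one prompt.
rewards = [0.0, 1.0, 1.0, 0.0, 1.0]
tau = 1.0  # assumed regularization temperature

exact = log_partition(rewards, tau)
approx = mean_reward_surrogate(rewards, tau)
gap = exact - approx  # non-negative by Jensen's inequality

print(f"log-partition estimate: {exact:.4f}")
print(f"mean-reward surrogate:  {approx:.4f}")
print(f"gap:                    {gap:.4f}")
```

The surrogate needs only an average over sampled rewards, whereas the log-mean-exp estimate is dominated by the largest rewards and is therefore noisier when few responses are sampled.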

MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment

Intermediate

Mengxi Xiao, Kailai Yang et al. · Dec 10 · arXiv

MentraSuite is a complete toolkit that teaches large language models (LLMs) to reason about mental health step by step, not just sound caring.

#mental health reasoning #LLM post-training #supervised fine-tuning