Papers (2)

#Self-Reflection

The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning

Qiguang Chen, Yantao Du et al. · Jan 9 · arXiv

This paper argues that long chain-of-thought (Long CoT) reasoning is most effective when it follows a "molecular" structure built from three kinds of reasoning bonds: Deep-Reasoning, Self-Reflection, and Self-Exploration.

#Long Chain-of-Thought · #reasoning bonds · #Deep Reasoning


Meta-RL Induces Exploration in Language Agents

Intermediate

Yulun Jiang, Liangze Jiang et al. · Dec 18 · arXiv

This paper introduces LAMER, a meta-reinforcement learning (Meta-RL) training framework that teaches language agents to explore first and then use what they have learned to solve tasks faster.

#Meta-Reinforcement Learning · #Language Agents · #Exploration vs Exploitation
