How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (2)

#group relative policy optimization (GRPO)

Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

Intermediate
Jiecong Wang, Hao Peng et al., Jan 29, arXiv

This paper introduces PLaT, a way for AI to think silently in a hidden space (the brain) and only speak when needed (the mouth).

#latent chain-of-thought #planning in latent space #planner-decoder architecture

BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search

Intermediate
Shiyu Liu, Yongjing Yin et al., Jan 16, arXiv

RL-trained search agents often sound confident even when they don't know, which can mislead people.

#agentic search #reinforcement learning #boundary awareness