Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
Beginner · Yuqiao Tan, Minzheng Wang et al. · Dec 22 · arXiv
Large language models (LLMs) don't act as a single brain; inside, each layer and module quietly makes its own mini-decisions, called internal policies.
#Bottom-up Policy Optimization #internal layer policy #internal modular policy