Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers
Intermediate | Xiaotong Ji, Rasul Tutunov et al. | Feb 20 | arXiv
Decoding (how a language model picks the next word) isn't a bag of tricks; it's a clean optimisation problem over probabilities.
#decoding as optimisation  #probability simplex  #softmax sampling
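
To make the "optimisation over probabilities" reading concrete, here is a minimal sketch based on a standard textbook fact rather than on the paper's own formulation, which this summary does not spell out. Softmax sampling is the distribution on the probability simplex that maximises expected score plus temperature-weighted entropy. Top-K and Top-P can then be read as the same problem with the support restricted to a subset of tokens, which amounts to truncating and renormalising. The function names (`softmax`, `objective`, `top_k`, `top_p`) and the example scores are illustrative assumptions.

```python
import numpy as np


def softmax(scores, temperature=1.0):
    """Maximiser of <p, scores> + temperature * H(p) over the simplex."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # shift for numerical stability; does not change the result
    e = np.exp(z)
    return e / e.sum()


def objective(p, scores, temperature=1.0):
    """Expected score plus temperature-weighted Shannon entropy of p."""
    p = np.asarray(p, dtype=float)
    nz = p > 0  # treat 0 * log(0) as 0
    entropy = -np.sum(p[nz] * np.log(p[nz]))
    return float(np.dot(p, scores) + temperature * entropy)


def top_k(p, k):
    """Keep the k most probable tokens and renormalise."""
    p = np.asarray(p, dtype=float)
    keep = np.argsort(p)[::-1][:k]
    q = np.zeros_like(p)
    q[keep] = p[keep]
    return q / q.sum()


def top_p(p, threshold):
    """Keep the smallest set of top tokens whose mass reaches threshold."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)[::-1]
    cumulative = np.cumsum(p[order])
    # Index of the first token at which cumulative mass >= threshold.
    cutoff = int(np.searchsorted(cumulative, threshold)) + 1
    keep = order[:cutoff]
    q = np.zeros_like(p)
    q[keep] = p[keep]
    return q / q.sum()


if __name__ == "__main__":
    scores = np.array([2.0, 1.0, 0.5, -1.0])  # hypothetical logits
    p = softmax(scores)
    print("softmax:", p)
    print("objective at softmax:", objective(p, scores))
    print("top-k (k=2):", top_k(p, 2))
    print("top-p (0.8):", top_p(p, 0.8))
```

At temperature 1 the objective value at the softmax solution equals the log-sum-exp of the scores, and any other point on the simplex scores lower, which is one simple way to check the optimisation view numerically.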