RelayLLM: Efficient Reasoning via Collaborative Decoding
Intermediate | Chengsong Huang, Tong Zheng et al. | Jan 8 | arXiv
RelayLLM lets a small model do the talking and only asks a big model for help on a few, truly hard tokens.
#token-level collaboration #<call>n</call> command #collaborative decoding
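
The relay idea in the summary can be illustrated with a toy decoding loop. This is a minimal sketch, not the authors' implementation. It assumes that the `<call>n</call>` command named in the tags means "let the large model write the next n tokens", which this listing does not confirm. The names `relay_decode` and `scripted`, and the convention that a model is a function from the context to the next token, are hypothetical and exist only for illustration.

```python
import re
from typing import Callable, List, Tuple

# Assumed command format: the small model emits "<call>n</call>" as one token.
CALL_PATTERN = re.compile(r"<call>(\d+)</call>")

# Hypothetical model interface: tokens so far -> next token.
NextToken = Callable[[List[str]], str]


def relay_decode(
    small: NextToken,
    large: NextToken,
    prompt: List[str],
    max_tokens: int = 64,
    eos: str = "<eos>",
) -> Tuple[List[str], int]:
    """Generate with the small model, relaying to the large model on request.

    Returns the generated tokens and how many of them the large model wrote.
    """
    context = list(prompt)
    output: List[str] = []
    large_count = 0

    # Bounded outer loop, so a small model that only emits calls cannot spin forever.
    for _ in range(max_tokens):
        if len(output) >= max_tokens:
            break
        token = small(context)
        if token == eos:
            break

        match = CALL_PATTERN.fullmatch(token)
        if match is None:
            # Ordinary token: the small model keeps talking.
            context.append(token)
            output.append(token)
            continue

        # Relay: the command itself is not kept; the large model writes up to n tokens.
        budget = min(int(match.group(1)), max_tokens - len(output))
        for _ in range(budget):
            token = large(context)
            if token == eos:
                return output, large_count
            context.append(token)
            output.append(token)
            large_count += 1

    return output, large_count


def scripted(tokens: List[str]) -> NextToken:
    """Toy stand-in for a model: replays a fixed token sequence."""
    it = iter(tokens)
    return lambda context: next(it)


if __name__ == "__main__":
    small = scripted(["The", "answer", "is", "<call>2</call>", ".", "<eos>"])
    large = scripted(["forty", "two"])
    tokens, from_large = relay_decode(small, large, ["Q:"])
    print(tokens, from_large)
```

In this toy run the large model contributes two of the six generated tokens. Under the stated assumption, that ratio is the quantity a relay scheme tries to keep small.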