RelayLLM lets a small model do the talking and asks a big model for help only on a few truly hard tokens.
Big all-in-one language models are powerful but too expensive to run everywhere, while small specialists are cheaper but narrow.
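
To make the token-level hand-off concrete, here is a minimal sketch of the idea, not RelayLLM's actual implementation. The text above does not say how "hard" tokens are detected, so the confidence threshold used here is an assumption, as are the names `relay_generate`, `toy_small`, and `toy_large` and the toy scripted "models" that stand in for real language models.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a "model" maps a token prefix to (next_token, confidence).
Model = Callable[[List[str]], Tuple[str, float]]


def relay_generate(
    small: Model,
    large: Model,
    prompt: List[str],
    max_new_tokens: int = 16,
    threshold: float = 0.5,  # assumed hardness criterion, not from the source
    eos: str = "<eos>",
) -> Tuple[List[str], int]:
    """Generate with the small model, deferring single tokens to the large one."""
    tokens = list(prompt)
    large_calls = 0
    for _ in range(max_new_tokens):
        token, confidence = small(tokens)
        if confidence < threshold:
            # The small model is unsure: ask the large model for this token only.
            token, _ = large(tokens)
            large_calls += 1
        if token == eos:
            break
        tokens.append(token)
    return tokens[len(prompt):], large_calls


# Toy stand-ins: scripted outputs indexed by how many tokens follow a 1-token prompt.
def toy_small(prefix: List[str]) -> Tuple[str, float]:
    script = [("the", 0.9), ("capital", 0.9), ("is", 0.9), ("Lyon", 0.2), ("<eos>", 0.9)]
    return script[min(len(prefix) - 1, len(script) - 1)]


def toy_large(prefix: List[str]) -> Tuple[str, float]:
    script = ["the", "capital", "is", "Paris", "<eos>"]
    return script[min(len(prefix) - 1, len(script) - 1)], 1.0


if __name__ == "__main__":
    output, calls = relay_generate(toy_small, toy_large, ["Q"])
    print(" ".join(output), "| large-model calls:", calls)
```

In this toy run the small model produces every token except the one it is unsure about, so the large model is called once. The cost saving in a scheme like this comes from keeping that call count low relative to the total number of generated tokens.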