RelayGen is a training-free method that switches between a large model and a small model during the generation of a single answer.
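To make the idea of mid-answer switching concrete, here is a minimal sketch of a decoding loop that picks one of two models at every step. It is an illustration only: the summary above does not say how RelayGen decides when to switch, so `use_large`, `relay_generate`, and the toy models are all hypothetical names, and the switching rule shown is a placeholder rather than the paper's criterion.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in: a "model" maps the tokens so far to the next token.
Model = Callable[[List[str]], str]


def relay_generate(
    large: Model,
    small: Model,
    use_large: Callable[[List[str]], bool],  # assumed switching rule, not from the source
    prompt: List[str],
    max_tokens: int,
    stop: str = "<eos>",
) -> Tuple[List[str], List[str]]:
    """Generate one answer, choosing the large or small model at each step."""
    tokens = list(prompt)
    trace: List[str] = []  # records which model produced each step
    for _ in range(max_tokens):
        name, model = ("large", large) if use_large(tokens) else ("small", small)
        next_token = model(tokens)
        trace.append(name)
        if next_token == stop:
            break
        tokens.append(next_token)
    return tokens[len(prompt):], trace


# Toy models so the sketch runs without any ML library.
def toy_large(tokens: List[str]) -> str:
    return "L"


def toy_small(tokens: List[str]) -> str:
    return "<eos>" if len(tokens) >= 5 else "S"


# Placeholder rule: the large model handles the first two generated tokens.
def toy_rule(tokens: List[str]) -> bool:
    return len(tokens) < 3


output, trace = relay_generate(toy_large, toy_small, toy_rule, ["q"], max_tokens=10)
```

Because no weights are updated anywhere in the loop, the sketch is training-free in the same sense as the summary: only the choice of which model runs each step changes.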
This paper addresses a common failure mode in reasoning models called Lazy Reasoning, where the model rambles instead of forming a good plan.