TimeBill: Time-Budgeted Inference for Large Language Models
Intermediate · Qi Fan, An Zou et al. · Dec 26 · arXiv
TimeBill helps large language models finish their answers within a time budget without sacrificing answer quality.
#time-budgeted inference #response length prediction #execution time estimation