TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
Intermediate · Shirui Chen, Cole Harrison et al. · Feb 22 · arXiv
Robots learn better when they get small hints at every step instead of only a final thumbs-up or thumbs-down.
#TOPReward · #token probabilities · #logits
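The summary contrasts small hints at every step (dense rewards) with a single end-of-episode verdict (sparse rewards). The sketch below illustrates that contrast under loudly stated assumptions. It assumes a model that, at each step, answers a yes/no question such as "Is the task progressing?" and exposes logits for the candidate tokens "Yes" and "No". The per-step reward is taken to be the softmax probability of "Yes". The function names (`token_prob`, `dense_rewards`, `sparse_rewards`), the prompt framing, and the two-token setup are hypothetical illustrations, not the exact TOPReward formulation.

```python
import math


def token_prob(logits: dict[str, float], token: str) -> float:
    """Softmax probability of `token` among the candidate tokens' logits.

    Hypothetical helper: assumes the model exposes raw logits per candidate token.
    """
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    return exps[token] / sum(exps.values())


def dense_rewards(step_logits: list[dict[str, float]], token: str = "Yes") -> list[float]:
    """One graded reward per step: P(token) read from that step's logits.

    Assumption: the "Yes" token stands in for "the task is progressing".
    """
    return [token_prob(logits, token) for logits in step_logits]


def sparse_rewards(num_steps: int, success: bool) -> list[float]:
    """Only the final step carries signal: 1.0 on success, else 0.0."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    return [0.0] * (num_steps - 1) + [1.0 if success else 0.0]


if __name__ == "__main__":
    # Hypothetical logits for a three-step episode that steadily improves.
    episode = [
        {"Yes": 0.0, "No": 2.0},
        {"Yes": 1.0, "No": 1.0},
        {"Yes": 3.0, "No": 0.0},
    ]
    print("dense: ", [round(r, 3) for r in dense_rewards(episode)])
    print("sparse:", sparse_rewards(len(episode), success=True))
```

Because the dense variant yields a graded value at every step, a learner can distinguish a partially useful action from an unhelpful one long before the episode ends, whereas the sparse variant stays at 0.0 until the final step.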