SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning
Intermediate · Salman Rahman, Sruthi Gorantla et al. · Dec 2 · arXiv
SPARK teaches AI models to grade their own reasoning steps without needing reference answers written down anywhere.
#SPARK · #Process Reward Model · #PRM-CoT