MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models
Beginner | Zecheng Tang, Baibei Ji et al. | Jan 17 | arXiv
This paper builds MemoryRewardBench, a large benchmark that checks whether reward models (AI judges) can fairly grade how other AIs manage long-term memory, not just whether their final answers are right.
#reward models #long-term memory #long-context reasoning