ResearchGym: Evaluating Language Model Agents on Real-World AI Research
Intermediate | Aniketh Garikaparthi, Manasi Patwardhan et al. | Feb 16 | arXiv
ResearchGym is a new "gym" where AI agents are tested on real research projects end to end, not just on toy problems.
#ResearchGym #closed-loop research #objective evaluation