Language-based Trial and Error Falls Behind in the Era of Experience
Intermediate · Haoyu Wang, Guozheng Ma et al. · Jan 29 · arXiv
Big language models are great at words but waste lots of time and energy when they try random actions in non-language games like Sudoku, Sokoban, 2048, FrozenLake, and Rubik's Cube.
#SCOUT #Reinforcement Learning #Supervised Fine-Tuning