DeepPlanning is a new benchmark that tests whether AI can make long, realistic plans that fit time and money limits.
PACEvolve is a new recipe that helps AI agents improve their ideas step by step over long periods without getting stuck.
RISE is a new method for finding and controlling how AI models think without needing any human-made labels.