DeepPlanning is a new benchmark that tests whether AI can make long, realistic plans that stay within time and budget limits.
PACEvolve is a new recipe that helps AI agents improve their ideas step by step over long periods without getting stuck.