DeepPlanning is a new benchmark that tests whether AI can make long, realistic plans that fit time and money limits.
This paper builds a new test, LongShOTBench, to check if AI can truly understand long videos by using sight, speech, and sounds together.
INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (about 12B active per token) trained with large-scale reinforcement learning and it beats many bigger models on math, coding, science, and reasoning tests.