Molmo2 is a family of vision-language models, released with fully open weights, data, and code, that can watch and understand videos and point to or track things over time.
The paper introduces DASD-4B-Thinking, a small (4B-parameter) open-source reasoning model that scores comparably to much larger models on hard math, science, and coding benchmarks.