OpenAutoNLU is a simple, open-source tool that automatically builds text understanding models for you.
This paper finds a precise way to describe and fix the Modality Gap, which is when image and text features that mean the same thing still sit in different regions of the model’s shared embedding space.
OpenVision 3 is a single vision encoder that learns one set of image tokens that works well for both understanding images (like answering questions about them) and generating images (like making new pictures).
Humans keep a big-picture memory (a “mindscape”) when reading long texts; this paper teaches AI to do the same.