DREAM is a single model that both understands images (like CLIP) and generates images from text (like leading text-to-image models).
This paper builds a medical image segmentation system that uses both pictures (like X-rays) and words (short clinical text) at the same time.
Kimi K2.5 is a new open-source AI that can read both text and visuals (images and videos) and act like a team of helpers to finish big tasks faster.
Robots often learn a bad habit called the vision shortcut: they guess the task just by looking at the scene and ignore the instructions they are given.
CoLog is a new AI system that reads computer logs like a story and spots both single strange events (point anomalies) and strange patterns over time (collective anomalies).