JavisGPT is a single AI that can both understand sounding videos (audio + video together) and create new ones whose sound and picture stay in sync.
Large Multimodal Models (LMMs) are great at reading text and looking at pictures, but they usually do most of their thinking in words, which limits deep visual reasoning.
StoryMem is a new way to make minute‑long, multi‑shot videos that keep the same characters, places, and style across many clips.
ReCo is a new way to edit videos just by describing the change in words, with no extra masks needed.
InsertAnywhere is a two-stage system that lets you add a new object into any video so it looks like it was always there.
AniX is a system that lets you place any character into any 3D world and control them with plain language, like “run forward” or “play a guitar.”
EgoX turns a regular third-person video into a first-person video that looks like it was filmed from the actor’s eyes.
Most image-similarity tools only notice how things look (color, shape, class) and miss deeper, human-like connections.