This paper introduces the Confucius Code Agent (CCA), a coding helper built to handle huge real-world codebases, long-running tasks, and many tools.
This paper introduces MotionEdit, a high-quality dataset that teaches AI to change how people and objects move in a picture without altering how they look or the scene around them.
This paper shows how to make home-helper robots better at long, multi-step chores by training them on a diverse mix of tasks and then refining the model on its own best attempts.
Robots often act like goldfish with short memories; HiF-VLA fixes this by letting them use motion cues to remember the past and anticipate the future.
UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.
This paper introduces BiCo, a one-shot way to mix visual concepts from images and videos by tightly tying each concept to the exact words in a prompt.
Role-playing agents need to juggle several goals at once, like staying in character, following instructions, and using the right tone.
MentraSuite is a complete toolkit that teaches large language models (LLMs) to reason about mental health step by step, not just sound caring.
This paper shows that video AIs do not need long, human-like chains of thought to reason well.
Before this work, most big language models generated text one word at a time (autoregressively), which made them slow and hard to parallelize.
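To see why that is slow, here is a minimal sketch of greedy autoregressive decoding; it is purely illustrative, not the paper's method, and `toy_next_token` is a hypothetical stand-in for a real model's forward pass.

```python
# Minimal sketch: each new token depends on every token generated so far,
# so the steps form a chain and cannot run in parallel at inference time.

def toy_next_token(context: list[int]) -> int:
    # Hypothetical "model": derives the next token id from the full context.
    # A real LLM would run an expensive forward pass here instead.
    return (sum(context) * 31 + len(context)) % 50_000

def generate(prompt_ids: list[int], max_new_tokens: int) -> list[int]:
    tokens = list(prompt_ids)
    for _ in range(max_new_tokens):
        # The data dependency: step t needs the output of step t-1,
        # which is what serializes autoregressive generation.
        tokens.append(toy_next_token(tokens))
    return tokens

print(generate([101, 2023, 2003], max_new_tokens=5))
```

Approaches like the one in this paper relax that one-token-at-a-time chain so many positions can be produced in parallel.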
VABench is a new, all-in-one test that checks how well AI makes videos with matching sound and pictures.
OmniPSD is a new AI that can both generate layered Photoshop (PSD) files from a text prompt and take apart a flat image into clean, editable layers.