This paper builds a real-time talking-listening head avatar that reacts naturally to your words, tone, nods, and smiles in about half a second.
CPPO is a new way to fine‑tune vision‑language models so they see pictures more accurately before they start to reason.
This paper shows that when teaching image generators with reinforcement learning, only a few early, very noisy steps actually help the model learn what people like.
Deep Delta Learning (DDL) replaces the usual “add the shortcut” rule in deep networks with a smarter, learnable move that can gently erase old info and write new info along a chosen direction.
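To make the "erase old info, write new info along a direction" idea concrete, here is a minimal sketch of a generic delta-rule-style residual step. It is illustrative only, not the paper's actual DDL layer: `k` (the erase/write direction), `v` (the new content), and `beta` (the learnable gate) are assumed names, and a real layer would produce them from the input.

```python
import numpy as np

def delta_residual_update(h, k, v, beta):
    """Sketch of a delta-rule-style residual step (not the paper's exact layer):
    erase the component of h along direction k, then write v's content there.
    beta = 0 reduces to a plain identity shortcut; beta = 1 fully replaces
    the component of h along k."""
    k = k / np.linalg.norm(k)                # unit erase/write direction
    erased = h - beta * np.dot(k, h) * k     # gently forget along k
    return erased + beta * np.dot(k, v) * k  # write new info along k

h = np.array([1.0, 2.0, 3.0])
k = np.array([0.0, 0.0, 1.0])
v = np.array([0.0, 0.0, 5.0])
print(delta_residual_update(h, k, v, 1.0))   # component along k replaced: [1. 2. 5.]
```

Unlike the usual additive shortcut `h + f(h)`, this update can remove information: the old component of `h` along `k` is scaled down before anything new is written.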
MorphAny3D is a training-free way to smoothly change one 3D object into another, even if they are totally different (like a bee into a biplane).
GaMO is a new way to rebuild 3D scenes from just a few photos by expanding each photo’s edges (outpainting) instead of inventing whole new camera views.
The paper teaches small language models to predict open-ended future events by turning daily news into thousands of safe, graded practice questions.
AI agents that control computers can click like a woodpecker, but they struggle to drag smoothly like a human hand; this paper fixes that.
This paper presents BEDA, a simple way to make chatty AI act strategically by turning what it believes into gentle rules (probabilistic constraints) that guide what it can say.
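One way to picture "beliefs as gentle rules on what the AI can say" is constraint-filtered decoding: score each candidate utterance under the agent's belief state and only sample from those that clear a threshold. The sketch below is a toy illustration of that general idea, not BEDA's actual machinery; `belief_prob` and `threshold` are hypothetical names.

```python
import random

def constrained_choice(candidates, belief_prob, threshold=0.5):
    """Toy sketch of belief-constrained decoding (not BEDA's actual method):
    keep only candidate utterances whose probability under the agent's
    belief state clears a threshold, then sample among the survivors.
    Falls back to the single best-scoring candidate if none survive."""
    allowed = [c for c in candidates if belief_prob(c) >= threshold]
    return random.choice(allowed) if allowed else max(candidates, key=belief_prob)

# Hypothetical belief state: the agent thinks raising the price is strategic.
beliefs = {"raise the price": 0.9, "reveal our limit": 0.1}
print(constrained_choice(list(beliefs), beliefs.get))  # -> "raise the price"
```

The constraint is probabilistic rather than a hard grammar: tightening or loosening `threshold` trades off strategic discipline against conversational freedom.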
The paper fixes a stability problem in Hyper-Connections (HC) by gently steering the network’s mixing matrix onto a safe shape (a manifold) where signals don’t blow up or vanish.
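A simple way to see how steering a matrix onto a "safe shape" prevents blow-up: project the mixing matrix onto the orthogonal manifold, where every singular value equals 1, so repeated mixing preserves signal length exactly. The sketch below uses polar projection via the SVD as one standard such projection; it illustrates the manifold idea, not the paper's exact construction.

```python
import numpy as np

def project_to_orthogonal(M):
    """Illustrative manifold projection (not the paper's exact method):
    snap M onto the orthogonal manifold via its polar factor U @ Vt from
    the SVD. All singular values become 1, so repeatedly applying the
    projected matrix can neither blow up nor vanish a signal."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) * 3.0       # a badly scaled mixing matrix
Q = project_to_orthogonal(M)
x = np.ones(4)
print(np.linalg.norm(Q @ x))            # length preserved: equals ||x|| = 2
```

Composing many such projected matrices keeps signal norms flat, which is exactly the failure mode (exploding or vanishing signals) the line above describes.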
This paper builds an open, end-to-end ecosystem (ALE) that lets AI agents plan, act, and fix their own mistakes across many steps in real computer environments.
Dream2Flow lets a robot watch a short, AI-generated video of a task and then do that task in real life by following object motion in 3D.