GenEnv is a training system where a student AI and a teacher simulator grow together by exchanging tasks and feedback.
Autoregressive (AR) image models make pictures by choosing tokens one by one, but they are judged only on picking likely tokens, not on how good the final picture looks in pixels.
Over++ is a video AI that adds realistic effects like shadows, splashes, dust, and smoke between a foreground and a background without changing the original footage.
StoryMem is a new way to make minute-long, multi-shot videos that keep the same characters, places, and style across many clips.
CASA is a new way to mix images and text inside a language model that keeps compute time and memory use low while keeping accuracy high.
QuantiPhy is a new test that checks if AI models can measure real-world physics from videos using numbers, not guesses.
QuCo-RAG is a new way to decide when an AI should look things up while it writes, using facts from its training data instead of its own shaky confidence.
DramaBench is a new test that checks how well AI continues drama scripts across six separate skills instead of one big score.
This paper asks a simple question with big impact: Can AI tell which test questions are hard for humans?
This paper asks if large language models (LLMs) can act like "world models" that predict what happens next in text-based environments, not just the next word in a sentence.
This paper builds a tough new test called O3-BENCH to check if AI can truly think with images, not just spot objects.
Capitalization tie-out checks if a company’s ownership table truly matches what its legal documents say.