ActionMesh is a fast, feed-forward model that turns videos, images plus text, text alone, or a given 3D model into an animated 3D mesh.
This paper introduces EDIR, a new and much more detailed test for Composed Image Retrieval (CIR), where you search for a target image using a starting image plus a short text describing the change you want.
SAMTok turns any object’s mask in an image into just two special “words” so language models can handle pixels like they handle text.
The paper builds specialized Turkish legal AI models called Mecellem by training them from the ground up and then giving them further law-focused training.
Diffusion models make pictures from noise but often miss what people actually want in the prompt or what looks good to humans.
Stable-DiffCoder is a code-focused diffusion language model that learns to write and edit programs by filling in masked pieces, not just predicting the next token.
This survey explains how large language models (LLMs) can clean, connect, and enrich messy data so it’s ready for real apps like dashboards, fraud detection, and training AI.
Before this work, computer-using AIs mostly copied old examples and struggled with long step-by-step tasks on real computers.
This paper introduces CGPT, a way to help computers find the right tables by building smarter mini-versions of tables and training with tough practice questions.
DeepVerifier is a plug-in checker that helps Deep Research Agents catch and fix their own mistakes while they are working, without retraining.
LLMs can now help write the tiny, super-fast pieces of code (kernels) that make GPUs run AI models efficiently.
Long AI tasks can go wrong early and keep getting worse, a snowballing of mistakes called the Spiral of Hallucination.