This survey explains how large language models (LLMs) can clean, connect, and enrich messy data so it's ready for real applications like dashboards, fraud detection, and AI training.
Before this work, computer-using AIs mostly copied old examples and struggled with long step-by-step tasks on real computers.
This paper introduces CGPT, a way to help computers find the right tables by building compact "mini" summaries of each table and training with tough practice questions.
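The mini-summary idea above can be sketched very roughly: describe each table by its name, headers, and a couple of sample rows, then pick the table whose summary best matches the question. Everything here (the summary format, the word-overlap scoring, the `retrieve` helper) is an illustrative assumption, not CGPT's actual method.

```python
# Hedged sketch: retrieve a table by matching the question against compact
# per-table "mini" summaries. All names and scoring here are assumptions.
def table_summary(name, columns, sample_rows, k=2):
    """Flatten a table's name, headers, and up to k sample rows into text."""
    parts = [name] + list(columns)
    for row in sample_rows[:k]:
        parts += [str(v) for v in row]
    return " ".join(parts).lower()

def retrieve(question, tables):
    """Return the name of the table whose summary shares the most words."""
    words = set(question.lower().split())
    def score(table):
        return len(words & set(table_summary(*table).split()))
    return max(tables, key=score)[0]

# Toy corpus: two tables, each given as (name, columns, sample_rows).
tables = [
    ("orders", ["order_id", "customer", "total"], [[1, "alice", 9.5]]),
    ("flights", ["flight_no", "origin", "destination"], [["AA1", "JFK", "LAX"]]),
]
best = retrieve("which customer had the highest order total", tables)
```

A real system would replace the word-overlap score with learned embeddings, and the "tough practice questions" would supply hard negatives during training.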
DeepVerifier is a plug-in checker that helps Deep Research Agents catch and fix their own mistakes while they are working, without retraining.
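The bolt-on checker pattern described above can be sketched as a simple draft / verify / fix loop wrapped around an unmodified agent. The function names (`run_with_verifier`, `agent_step`, `verify`, `fix`) and the toy arithmetic demo are illustrative assumptions, not DeepVerifier's actual interface.

```python
# Hedged sketch of a plug-in "verify, then fix" loop: no retraining,
# just an external checker that inspects drafts and requests revisions.
def run_with_verifier(agent_step, verify, fix, task, max_rounds=3):
    """Draft an answer, check it, and revise until the checker is satisfied."""
    draft = agent_step(task)
    for _ in range(max_rounds):
        ok, feedback = verify(task, draft)
        if ok:
            return draft
        draft = fix(task, draft, feedback)
    return draft

# Toy demo: the "agent" answers arithmetic badly; the verifier recomputes
# the true answer and the fixer simply adopts the verifier's feedback.
def toy_agent(task):
    return "5"  # deliberately wrong first draft

def toy_verify(task, answer):
    correct = str(eval(task))
    return answer == correct, correct

def toy_fix(task, answer, feedback):
    return feedback

result = run_with_verifier(toy_agent, toy_verify, toy_fix, "2+2")
```

Because the loop only calls the agent as a black box, the same wrapper can, in principle, sit on top of any agent without touching its weights.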
AI programs called LLMs can now help write the tiny, super-fast pieces of code (kernels) that make GPUs run AI models efficiently.
Long AI tasks can go wrong early, and the errors snowball from there, an effect called the Spiral of Hallucination.
Qwen3-TTS is a family of text-to-speech models that can talk in 10+ languages, clone a new voice from just 3 seconds, and follow detailed style instructions in real time.
OpenVision 3 is a single vision encoder that learns one set of image tokens that work well for both understanding images (like answering questions) and generating images (like making new pictures).
This paper asks a new question for vision-language models: not just 'What do you see?' but 'How far along is the task right now?'
Benign fine-tuning meant to make language models more helpful can accidentally make them overshare private information.
Robots often learn a bad habit called the vision shortcut: they guess the task just by looking, and ignore the words you tell them.
Render-of-Thought (RoT) turns the model’s step-by-step thinking from long text into slim images so the model can think faster with fewer tokens.