How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (784)


Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs

Intermediate
Wei Zhou, Jun Zhou et al. · Jan 22 · arXiv

This survey explains how large language models (LLMs) can clean, connect, and enrich messy data so it’s ready for real apps like dashboards, fraud detection, and training AI.

#Data Preparation #Data Cleaning #Data Integration
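To give a flavor of the kind of data preparation the survey covers, here is a minimal sketch of an LLM-style cleaning step: messy records in, standardized records out. The field names and the `mock_llm_clean` function are illustrative only (a deterministic stand-in where a real LLM call would go), not anything from the survey itself.

```python
# Illustrative sketch of LLM-style data preparation: messy records in,
# standardized records out. `mock_llm_clean` stands in for a real LLM call
# prompted to normalize one record at a time.
import re

def mock_llm_clean(record: dict) -> dict:
    """Stand-in for an LLM asked to normalize a raw record."""
    cleaned = {}
    # Normalize whitespace and casing in names.
    cleaned["name"] = " ".join(record.get("name", "").split()).title()
    # Coerce M/D/YYYY dates to ISO 8601 (YYYY-MM-DD); leave others as-is.
    raw_date = record.get("date", "")
    m = re.match(r"(\d{1,2})/(\d{1,2})/(\d{4})", raw_date)
    if m:
        cleaned["date"] = f"{m.group(3)}-{int(m.group(1)):02d}-{int(m.group(2)):02d}"
    else:
        cleaned["date"] = raw_date
    # Strip everything but digits from phone numbers.
    cleaned["phone"] = re.sub(r"\D", "", record.get("phone", ""))
    return cleaned

messy = [
    {"name": "  ada   LOVELACE ", "date": "3/7/2024", "phone": "(555) 123-4567"},
    {"name": "alan turing", "date": "2024-01-15", "phone": "555.987.6543"},
]
prepared = [mock_llm_clean(r) for r in messy]
print(prepared)
```

In the survey's setting the deterministic rules above would be replaced by prompting an LLM, which is what lets the same pipeline handle formats no hand-written rule anticipated.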

EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience

Intermediate
Taofeng Xue, Chong Peng et al. · Jan 22 · arXiv

Before this work, computer-use agents mostly imitated fixed examples and struggled with long step-by-step tasks on real computers; EvoCUA instead evolves by learning from scalable synthetic experience.

#computer use agent #verifiable synthesis #validator

CGPT: Cluster-Guided Partial Tables with LLM-Generated Supervision for Table Retrieval

Intermediate
Tsung-Hsiang Chou, Chen-Jui Yu et al. · Jan 22 · arXiv

This paper introduces CGPT, a way to help computers find the right tables by building smarter mini-versions of tables and training with tough practice questions.

#table retrieval #synthetic query generation #cluster-guided partial tables

Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

Intermediate
Yuxuan Wan, Tianqing Fang et al. · Jan 22 · arXiv

DeepVerifier is a plug-in checker that helps Deep Research Agents catch and fix their own mistakes while they are working, without retraining.

#Deep Research Agents #verification asymmetry #rubrics-based feedback
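To convey the general shape of rubric-guided verification at test time, here is a toy loop: score a draft against rubric checks, and revise until every item passes. The rubric items, `revise` step, and all names are illustrative stand-ins, not DeepVerifier's actual rubrics or implementation.

```python
# Illustrative sketch of test-time rubric-guided verification: score a draft
# against rubric checks, revise while any item fails. All checks are toy
# stand-ins for LLM-based rubric scoring.
def rubric_checks(answer: str) -> dict:
    """Toy rubric: each item maps to a pass/fail check on the draft."""
    return {
        "cites a source": "[source]" in answer,
        "states a conclusion": "conclusion:" in answer.lower(),
    }

def revise(answer: str, failed: list) -> str:
    """Stand-in for an agent revising its draft to address failed items."""
    if "cites a source" in failed:
        answer += " [source]"
    if "states a conclusion" in failed:
        answer += " Conclusion: supported."
    return answer

def verify_loop(draft: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        results = rubric_checks(draft)
        failed = [item for item, ok in results.items() if not ok]
        if not failed:  # every rubric item satisfied: accept the draft
            return draft
        draft = revise(draft, failed)
    return draft

final = verify_loop("The evidence suggests X.")
print(final)
```

The point of the pattern, as in the paper's framing, is that checking a draft against explicit criteria is easier than producing a correct draft in one shot, so the checker can be bolted on without retraining the agent.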

Towards Automated Kernel Generation in the Era of LLMs

Intermediate
Yang Yu, Peiyu Zang et al. · Jan 22 · arXiv

AI programs called LLMs can now help write the tiny, super-fast pieces of code (kernels) that make GPUs run AI models efficiently.

#LLM-driven kernel generation #GPU kernels #CUDA

Agentic Uncertainty Quantification

Intermediate
Jiaxin Zhang, Prafulla Kumar Choubey et al. · Jan 22 · arXiv

Long AI tasks can go wrong early and keep getting worse, like a snowball of mistakes called the Spiral of Hallucination.

#Agentic Uncertainty Quantification #Spiral of Hallucination #Dual-Process Architecture

Qwen3-TTS Technical Report

Intermediate
Hangrui Hu, Xinfa Zhu et al. · Jan 22 · arXiv

Qwen3-TTS is a family of text-to-speech models that can talk in 10+ languages, clone a new voice from just 3 seconds, and follow detailed style instructions in real time.

#Qwen3-TTS #text-to-speech #voice cloning

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Intermediate
Letian Zhang, Sucheng Ren et al. · Jan 21 · arXiv

OpenVision 3 is a single vision encoder that learns one set of image tokens that work well for both understanding images (like answering questions) and generating images (like making new pictures).

#Unified Visual Encoder #VAE #Vision Transformer

PROGRESSLM: Towards Progress Reasoning in Vision-Language Models

Intermediate
Jianshu Zhang, Chengxuan Qian et al. · Jan 21 · arXiv

This paper asks a new question for vision-language models: not just 'What do you see?' but 'How far along is the task right now?'

#progress reasoning #vision-language models #episodic retrieval

Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models

Intermediate
Anmol Goel, Cornelius Emde et al. · Jan 21 · arXiv

Benign fine-tuning meant to make language models more helpful can accidentally make them overshare private information.

#contextual privacy #privacy collapse #fine-tuning

LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries

Intermediate
Shijie Lian, Bin Yu et al. · Jan 21 · arXiv

Robots often learn a bad habit called the vision shortcut: they guess the task just by looking, and ignore the words you tell them.

#Vision-Language-Action #Bayesian decomposition #Latent Action Queries

Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning

Intermediate
Yifan Wang, Shiyu Li et al. · Jan 21 · arXiv

Render-of-Thought (RoT) turns the model’s step-by-step thinking from long text into slim images so the model can think faster with fewer tokens.

#Render-of-Thought #Chain-of-Thought #Latent Reasoning