How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (807)


DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models

Intermediate
Lunbin Zeng, Jingfeng Yao et al. · Dec 17 · arXiv

This paper shows a simple way to turn any strong autoregressive (step-by-step) model into a diffusion vision-language model (parallel, block-by-block) without changing the architecture.

#DiffusionVL #diffusion vision-language model #block diffusion

End-to-End Training for Autoregressive Video Diffusion via Self-Resampling

Intermediate
Yuwei Guo, Ceyuan Yang et al. · Dec 17 · arXiv

This paper fixes a common problem in video-making AIs where tiny mistakes snowball over time and ruin long videos.

#autoregressive video diffusion #exposure bias #teacher forcing

Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning

Intermediate
Yifei Li, Wenzhao Zheng et al. · Dec 17 · arXiv

Skyra is a detective-style AI that spots tiny visual mistakes (artifacts) in videos to tell if they are real or AI-generated, and it explains its decision with times and places in the video.

#AI-generated video detection #artifact reasoning #multimodal large language model

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

Intermediate
Zhenwen Liang, Sidi Lu et al. · Dec 17 · arXiv

This paper teaches large language models (LLMs) to explore smarter by listening to their own gradients (the directions in which they would update) rather than chasing random variety.

#gradient-guided reinforcement learning #GRL #GRPO

VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?

Intermediate
Hongbo Zhao, Meng Wang et al. · Dec 17 · arXiv

Long texts are expensive for AI to read because each extra token costs a lot of compute and memory.

#vision-text compression #VTCBench #vision-language models

IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning

Intermediate
Yuanhang Li, Yiren Song et al. · Dec 17 · arXiv

IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.

#video editing #visual effects #diffusion transformer

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

Intermediate
Shengming Yin, Zekai Zhang et al. · Dec 17 · arXiv

The paper turns one flat picture into a neat stack of see-through layers, so you can edit one thing without messing up the rest.

#image decomposition #RGBA layers #alpha blending

GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

Intermediate
Bozhou Li, Sihan Yang et al. · Dec 17 · arXiv

This paper is about making the text prompts you type into a generator turn into the right pictures and videos more reliably.

#diffusion models #text encoder #multimodal large language model

Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision

Intermediate
Wei Du, Shubham Toshniwal et al. · Dec 17 · arXiv

Nemotron-Math is a giant math dataset with 7.5 million step-by-step solutions created in three thinking styles and with or without Python help.

#mathematical reasoning #long-context fine-tuning #multi-mode supervision

Step-GUI Technical Report

Intermediate
Haolong Yan, Jia Wang et al. · Dec 17 · arXiv

This paper builds Step-GUI, a pair of small-but-strong GUI agent models (4B/8B) that can use phones and computers by looking at screenshots and following instructions.

#GUI automation #multimodal large language models #trajectory-level calibration

Robust and Calibrated Detection of Authentic Multimedia Content

Intermediate
Sarim Hashmi, Abdelrahman Elsayed et al. · Dec 17 · arXiv

Deepfakes are getting so good that simple yes/no detectors are failing, especially when attackers add tiny, invisible changes.

#Authenticity Index #calibrated resynthesis #reconstruction-free inversion

DEER: Draft with Diffusion, Verify with Autoregressive Models

Intermediate
Zicong Cheng, Guo-Wei Yang et al. · Dec 17 · arXiv

DEER is a new way to speed up big language models by letting a diffusion model draft many tokens at once and an autoregressive model double-check them.

#DEER #speculative decoding #diffusion LLM