How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (16)

Tag: #Diffusion Transformer

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Intermediate
Bozhou Li, Yushuo Guan et al. · Feb 3 · arXiv

The paper shows that using information from many layers of a language model (not just one) helps text-to-image diffusion transformers follow prompts much better; a rough sketch of the idea follows the tags below.

#Diffusion Transformer · #Text Conditioning · #Multi-layer LLM Features
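
A minimal sketch of the multi-layer conditioning idea from this entry: pool hidden states from several LLM layers with learnable weights and project the result into the diffusion transformer's conditioning space. Module and dimension names (`MultiLayerTextConditioner`, `llm_dim`, `cond_dim`) are illustrative assumptions, not the paper's code.

```python
# Hedged sketch: learnable weighting over multiple LLM layers as DiT text conditioning.
# All module/variable names here are illustrative assumptions, not the paper's API.
import torch
import torch.nn as nn

class MultiLayerTextConditioner(nn.Module):
    def __init__(self, num_llm_layers: int, llm_dim: int, cond_dim: int):
        super().__init__()
        # One scalar weight per LLM layer, softmax-normalized at use time.
        self.layer_logits = nn.Parameter(torch.zeros(num_llm_layers))
        self.proj = nn.Linear(llm_dim, cond_dim)  # map pooled features to the DiT width

    def forward(self, hidden_states: list[torch.Tensor]) -> torch.Tensor:
        # hidden_states: one [batch, seq_len, llm_dim] tensor per LLM layer
        stacked = torch.stack(hidden_states, dim=0)             # [layers, B, T, D]
        weights = torch.softmax(self.layer_logits, dim=0)       # [layers]
        pooled = (weights.view(-1, 1, 1, 1) * stacked).sum(0)   # weighted sum over layers
        return self.proj(pooled)                                # [B, T, cond_dim]

# Usage sketch: feed every layer's hidden state from the text encoder, not just the last one,
# then pass the result to the DiT's text cross-attention.
# cond = MultiLayerTextConditioner(num_llm_layers=32, llm_dim=4096, cond_dim=1152)
# text_cond = cond(list_of_layer_outputs)
```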

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Intermediate
Shengbang Tong, Boyang Zheng et al. · Jan 22 · arXiv

Before this work, most text-to-image models used VAEs (small, squished image codes) and struggled with slow training and overfitting on high-quality fine-tuning sets.

#Representation Autoencoder · #RAE · #Variational Autoencoder

SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices

Intermediate
Dongting Hu, Aarush Gupta et al. · Jan 13 · arXiv

This paper shows how to make powerful image‑generating Transformers run fast on phones without needing the cloud.

#Diffusion Transformer · #Sparse Attention · #Adaptive Sparse Self-Attention

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Intermediate
Kewei Zhang, Ye Huang et al. · Jan 12 · arXiv

Transformers are powerful but slow because regular self-attention compares every token with every other token, so the cost grows quadratically with sequence length; the sketch after the tags below contrasts this with linear attention.

#Multi-Head Linear Attention · #Linear Attention · #Self-Attention
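
The sketch below contrasts standard softmax attention, whose T×T score matrix makes cost grow quadratically with sequence length, with the generic kernelized linear-attention trick that reorders the matrix products. It illustrates the background MHLA builds on, not MHLA's token-level multi-head formulation itself.

```python
# Hedged sketch: why softmax attention is quadratic in sequence length, and how
# kernelized linear attention reorders the computation to be linear. This is the
# generic linear-attention trick, not MHLA's exact formulation.
import torch

def softmax_attention(q, k, v):
    # q, k, v: [batch, heads, T, d]  ->  scores are [T, T], so cost grows as T^2
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Replace softmax with a positive feature map phi (here: elu + 1), then use
    # associativity: (phi(q) @ phi(k)^T) @ v == phi(q) @ (phi(k)^T @ v),
    # so a [d, d] summary is built once and cost grows linearly in T.
    phi_q = torch.nn.functional.elu(q) + 1
    phi_k = torch.nn.functional.elu(k) + 1
    kv = phi_k.transpose(-2, -1) @ v                                        # [B, H, d, d]
    normalizer = phi_q @ phi_k.sum(dim=-2, keepdim=True).transpose(-2, -1)  # [B, H, T, 1]
    return (phi_q @ kv) / (normalizer + eps)

# Both take q, k, v of shape [B, H, T, d]; only the second avoids the T x T matrix.
```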

TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts

Intermediate
Yu Xu, Hongbin Yan et al. · Jan 12 · arXiv

TAG-MoE is a new way to steer Mixture-of-Experts (MoE) models using clear task hints, so the right “mini-experts” handle the right parts of an image job (see the gating sketch after the tags below).

#Task-Aware Gating · #Mixture-of-Experts · #Unified Image Generation
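
A minimal sketch of task-aware gating: the router's logits are computed from the token together with a task embedding, so different task hints can route to different experts. All names (`TaskAwareRouter`, `num_tasks`, `top_k`) are assumptions for illustration, not TAG-MoE's actual design.

```python
# Hedged sketch of task-aware MoE gating: the router sees a task embedding in
# addition to the token, so expert selection can depend on the task hint.
# Illustrative only; names and gating details are assumptions, not TAG-MoE's code.
import torch
import torch.nn as nn

class TaskAwareRouter(nn.Module):
    def __init__(self, dim: int, num_experts: int, num_tasks: int, top_k: int = 2):
        super().__init__()
        self.task_embed = nn.Embedding(num_tasks, dim)   # one embedding per task hint
        self.gate = nn.Linear(2 * dim, num_experts)      # token + task -> expert logits
        self.top_k = top_k

    def forward(self, tokens: torch.Tensor, task_id: torch.Tensor):
        # tokens: [B, T, dim], task_id: [B]
        task = self.task_embed(task_id)[:, None, :].expand_as(tokens)        # [B, T, dim]
        logits = self.gate(torch.cat([tokens, task], dim=-1))                # [B, T, experts]
        weights, experts = torch.topk(torch.softmax(logits, dim=-1), self.top_k, dim=-1)
        return weights, experts   # which experts process each token, and with what weight

# Each token is then dispatched to its top-k experts and their outputs are
# combined with the returned weights, as in a standard MoE layer.
```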

Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models

Intermediate
Yuanyang Yin, Yufan Deng et al. · Jan 12 · arXiv

Image-to-Video models often keep the picture looking right but ignore parts of the text instructions.

#Image-to-Video generation · #Diffusion Transformer · #Controllability

FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing

Intermediate
Xijie Huang, Chengming Xu et al. · Jan 5 · arXiv

This paper makes video editing easier by teaching an AI to spread changes from the first frame across the whole video smoothly and accurately.

#First-Frame Propagation · #Video Editing · #FFP-300K

Guiding a Diffusion Transformer with the Internal Dynamics of Itself

Beginner
Xingyu Zhou, Qifan Li et al. · Dec 30 · arXiv

This paper shows a simple way to make image-generating AIs (diffusion Transformers) produce clearer, more accurate pictures by letting the model guide itself from the inside; a generic self-guidance sketch follows the tags below.

#Internal Guidance · #Diffusion Transformer · #Intermediate Supervision
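
The teaser does not spell out the paper's exact mechanism, so the sketch below shows only a generic self-guidance pattern: extrapolate from a weaker prediction produced inside the same network (for example, an early exit at an intermediate block) toward the full-depth prediction, in the spirit of classifier-free guidance. `predict_noise` and `exit_layer` are hypothetical names.

```python
# Hedged sketch of "guide yourself from the inside": extrapolate from a weaker
# prediction obtained inside the same network (e.g., exiting at an intermediate
# block) toward the full-depth prediction, similar in shape to classifier-free
# guidance. This is a generic self-guidance pattern, NOT the paper's exact method;
# model.predict_noise and exit_layer are hypothetical.
import torch

@torch.no_grad()
def internally_guided_step(model, x_t, t, cond, guidance_scale=1.5, exit_layer=14):
    eps_full = model.predict_noise(x_t, t, cond)                          # full network
    eps_inner = model.predict_noise(x_t, t, cond, exit_layer=exit_layer)  # early exit
    # Push the sample away from what the shallow pass already captures,
    # toward what only the full network captures.
    return eps_inner + guidance_scale * (eps_full - eps_inner)
```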

Act2Goal: From World Model To General Goal-conditioned Policy

Intermediate
Pengfei Zhou, Liliang Chen et al. · Dec 29 · arXiv

Robots often get confused on long, multi-step tasks when they only see the final goal image and try to guess the next move directly.

#goal-conditioned policy · #visual world model · #multi-scale temporal hashing

SpotEdit: Selective Region Editing in Diffusion Transformers

Intermediate
Zhibin Qin, Zhenxiong Tan et al. · Dec 26 · arXiv

SpotEdit is a training‑free way to edit only the parts of an image that actually change, instead of re-generating the whole picture (a masked-editing sketch follows the tags below).

#Diffusion Transformer · #Selective image editing · #Region-aware editing
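
A rough sketch of the generic training-free trick for selective editing: regenerate latents only inside an edit mask while repeatedly re-noising the original image outside it (RePaint-style blending). The scheduler API shown follows common diffusion libraries and is an assumption; this is not necessarily SpotEdit's exact procedure.

```python
# Hedged sketch of training-free selective editing via masked latent blending:
# only the masked region is re-generated, the rest is re-noised from the original
# image at each step. Illustrates the general idea, not SpotEdit's algorithm;
# the model/scheduler interfaces are assumptions.
import torch

@torch.no_grad()
def masked_edit(model, scheduler, x_orig_latent, mask, edit_cond, num_steps=50):
    # mask: 1 inside the region to edit, 0 where the original should be kept.
    x = torch.randn_like(x_orig_latent)
    for t in scheduler.timesteps[:num_steps]:
        noise_pred = model(x, t, edit_cond)                   # denoise toward the edit
        x = scheduler.step(noise_pred, t, x).prev_sample      # one sampler step
        x_keep = scheduler.add_noise(x_orig_latent, torch.randn_like(x), t)
        x = mask * x + (1 - mask) * x_keep                    # lock the unedited region
    return x
```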

DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation

Intermediate
Jiawei Liu, Junqiao Li et al. · Dec 24 · arXiv

DreaMontage is a new AI method that makes long, single-shot videos that feel smooth and connected, even when you give it scattered images or short clips in the middle.

#arbitrary frame conditioning · #one-shot video generation · #Diffusion Transformer

DeContext as Defense: Safe Image Editing in Diffusion Transformers

Intermediate
Linghui Shen, Mingyue Cui et al. · Dec 18 · arXiv

This paper protects your photos from being misused by new AI image editors that can copy your face or style from just one picture.

#Diffusion Transformer · #cross-attention · #in-context image editing