How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (6) · Source: arXiv · Active tag: #direct preference optimization

Unified Personalized Reward Model for Vision Generation

Intermediate
Yibin Wang, Yuhang Zang et al. · Feb 2 · arXiv

The paper introduces UnifiedReward-Flex, a reward model that judges images and videos the way a thoughtful human would—by flexibly changing what it checks based on the prompt and the visual evidence.

#personalized reward model · #multimodal reward · #context-adaptive reasoning

EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience

Intermediate
Taofeng Xue, Chong Peng et al. · Jan 22 · arXiv

Before this work, computer-using AIs mostly imitated recorded examples and struggled with long, step-by-step tasks on real computers.

#computer use agent · #verifiable synthesis · #validator

VIBE: Visual Instruction Based Editor

Intermediate
Grigorii Alekseenko, Aleksandr Gordeev et al. · Jan 5 · arXiv

VIBE is a tiny but mighty image editor that listens to your words and changes pictures while keeping the original photo intact unless you ask otherwise.

#instruction-based image editing · #vision-language model · #diffusion model

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

Intermediate
Taekyung Ki, Sangwon Jang et al. · Jan 2 · arXiv

This paper builds a real-time talking-listening head avatar that reacts naturally to your words, tone, nods, and smiles in about half a second.

#interactive avatar · #talking head generation · #causal diffusion forcing

PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation

Intermediate
Yuanhao Cai, Kunpeng Li et al. · Dec 31 · arXiv

This paper teaches text-to-video models to follow real-world physics, so people, balls, water, glass, and fire act the way they should.

#text-to-video generation · #physical consistency · #direct preference optimization
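The PhyGDPO entry above builds on direct preference optimization. As a reminder of the base technique only (not this paper's physics-aware groupwise variant), the standard DPO loss for a single (chosen, rejected) pair can be sketched in plain Python:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_w, logp_l         : policy log-probs of the chosen/rejected outputs
    ref_logp_w, ref_logp_l : reference-model log-probs of the same outputs
    beta                   : strength of the KL-style penalty
    """
    # How much more the policy prefers the winner, relative to the reference.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Loss is -log(sigmoid(margin)); it equals log(2) at zero margin
    # and shrinks as the policy favors the chosen output more.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

loss_neutral = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # zero margin
loss_better = dpo_loss(-9.0, -13.0, -10.0, -12.0)    # positive margin
```

Training pushes `margin` positive for every pair, so the loss rewards ranking the preferred sample above the rejected one while staying close to the reference model.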

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

Intermediate
Tiwei Bie, Maosong Cao et al. · Dec 10 · arXiv

Before this work, most big language models generated text one word at a time (autoregressive decoding), which made them slow and hard to parallelize.

#diffusion language model · #masked diffusion · #block diffusion
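The LLaDA2.0 blurb contrasts word-at-a-time decoding with masked diffusion, where many positions are filled in parallel. A toy sketch of that unmask-the-most-confident-positions loop (illustrative only; the `predict` callback stands in for the real denoiser network):

```python
MASK = "<mask>"

def toy_denoise(seq, predict, steps):
    """Toy masked-diffusion decoding loop.

    seq     : list of tokens, some equal to MASK
    predict : fn(seq, i) -> (token, confidence) for a masked position i
    steps   : number of parallel denoising rounds
    """
    seq = list(seq)
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        # Query the denoiser for every masked slot at once.
        guesses = {i: predict(seq, i) for i in masked}
        # Commit roughly half the positions per round, most confident first.
        k = max(1, len(masked) // 2)
        for i in sorted(masked, key=lambda i: -guesses[i][1])[:k]:
            seq[i] = guesses[i][0]
    return seq

# Dummy denoiser that always answers "t<position>" with full confidence.
out = toy_denoise([MASK] * 4, lambda s, i: (f"t{i}", 1.0), steps=3)
```

Because each round fills several positions at once, a sequence of length n needs only O(log n) rounds here instead of n sequential steps, which is the parallelism advantage the blurb alludes to.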