🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers807

AllBeginnerIntermediateAdvanced
All SourcesarXiv

Yume-1.5: A Text-Controlled Interactive World Generation Model

Intermediate
Xiaofeng Mao, Zhen Li et al.Dec 26arXiv

Yume1.5 is a model that turns text or a single image into a living, explorable video world you can move through with keyboard keys.

#interactive world generation#video diffusion#temporal-spatial-channel modeling

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

Intermediate
Yiheng Wang, Yixin Chen et al.Dec 26arXiv

SciEvalKit is a new open-source toolkit that tests AI on real scientific skills, not just trivia or simple Q&A.

#scientific intelligence evaluation#multimodal scientific reasoning#symbolic reasoning in AI

SpotEdit: Selective Region Editing in Diffusion Transformers

Intermediate
Zhibin Qin, Zhenxiong Tan et al.Dec 26arXiv

SpotEdit is a training‑free way to edit only the parts of an image that actually change, instead of re-generating the whole picture.

#Diffusion Transformer#Selective image editing#Region-aware editing

MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

Intermediate
Hanzhang Zhou, Xu Zhang et al.Dec 26arXiv

MAI-UI is a family of AI agents that can see, understand, and control phone and computer screens using plain language.

#GUI agent#GUI grounding#mobile navigation

SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

Intermediate
Shaofei Cai, Yulei Qin et al.Dec 26arXiv

SmartSnap teaches an agent not only to finish a phone task but also to prove it with a few perfect snapshots it picks itself.

#Self-verifying agents#Evidence curation#3C principles

SWE-RM: Execution-free Feedback For Software Engineering Agents

Intermediate
KaShun Shum, Binyuan Hui et al.Dec 26arXiv

Coding agents used to fix software rely on feedback; unit tests give only pass/fail signals that are often noisy or missing.

#execution-free feedback#reward model#software engineering agents

TimeBill: Time-Budgeted Inference for Large Language Models

Intermediate
Qi Fan, An Zou et al.Dec 26arXiv

TimeBill is a way to help big AI models finish their answers on time without ruining answer quality.

#time-budgeted inference#response length prediction#execution time estimation

Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

Intermediate
Mengqi He, Xinyu Tian et al.Dec 26arXiv

The paper shows that when vision-language models write captions, only a small set of uncertain words (about 20%) act like forks that steer the whole sentence.

#vision-language models#autoregressive generation#entropy

Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation

Intermediate
Steven Xiao, Xindi Zhang et al.Dec 25arXiv

This paper introduces Knot Forcing, a way to make talking-head videos that look great while being generated live, frame by frame.

#Knot Forcing#autoregressive video diffusion#temporal knot

An Information Theoretic Perspective on Agentic System Design

Intermediate
Shizhe He, Avanika Narayan et al.Dec 25arXiv

The paper shows that many AI systems work best when a small 'compressor' model first shrinks long text into a short, info-packed summary and a bigger 'predictor' model then reasons over that summary.

#agentic systems#compressor-predictor#mutual information

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

Intermediate
Shuo Cao, Jiayang Li et al.Dec 25arXiv

This paper teaches AI to notice not just what is in a picture, but how the picture looks and feels to people.

#perceptual image understanding#image aesthetics assessment (IAA)#image quality assessment (IQA)

HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

Intermediate
Haonan Qiu, Shikun Liu et al.Dec 24arXiv

HiStream makes 1080p video generation much faster by removing repeated work across space, time, and steps.

#high-resolution video generation#diffusion transformer (DiT)#dual-resolution caching
4546474849