How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1262)

VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning

Intermediate
Yuji Wang, Wenlong Liu et al. · Dec 6 · arXiv

VG-Refiner is a new way for AI to find the right object in a picture when given a description, even if helper tools make mistakes.

#visual grounding #referring expression comprehension #tool-integrated visual reasoning

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Intermediate
Hongyu Li, Manyuan Zhang et al. · Dec 5 · arXiv

EditThinker is a helper brain for any image editor that thinks, checks, and rewrites the instruction in multiple rounds until the picture looks right.

#instruction-based image editing #iterative reasoning #multimodal large language model

World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty

Intermediate
Zhiting Mei, Tenny Yin et al. · Dec 5 · arXiv

This paper teaches video-making AI models to say how sure they are about each tiny part of every frame they create.

#controllable video generation #uncertainty quantification #calibration

SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations

Intermediate
Wenhao Yan, Sheng Ye et al. · Dec 5 · arXiv

SCAIL is a new AI system that turns a single character image into a studio-quality animation by following the moves in a driving video.

#character animation #3D pose representation #occlusion-aware pose

Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding

Beginner
Ziyang Wang, Honglu Zhou et al. · Dec 5 · arXiv

Long Video Understanding (LVU) is hard because the important clues are tiny, far apart in time, and buried in hours of mostly unimportant footage.

#Active Video Perception #Long Video Understanding #Plan-Observe-Reflect

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Intermediate
Zhenpeng Su, Leiyu Pan et al. · Dec 5 · arXiv

Reinforcement learning (RL) can make big language models smarter, but off-policy training often pushes updates too far from the “safe zone,” causing unstable learning.

#reinforcement learning #PPO-clip #KL penalty

ProPhy: Progressive Physical Alignment for Dynamic World Simulation

Intermediate
Zijun Wang, Panwen Hu et al. · Dec 5 · arXiv

ProPhy is a new two-step method that helps video AIs follow real-world physics, not just make pretty pictures.

#physics-aware video generation #mixture-of-experts #token-level routing

BEAVER: An Efficient Deterministic LLM Verifier

Intermediate
Tarun Suresh, Nalin Wadhwa et al. · Dec 5 · arXiv

BEAVER is a new way to check, with guaranteed certainty, how likely a language model is to give answers that obey important rules.

#BEAVER #deterministic verification #large language models

AI & Human Co-Improvement for Safer Co-Superintelligence

Beginner
Jason Weston, Jakob Foerster · Dec 5 · arXiv

This paper argues that the fastest and safest path to super-smart AI is for humans and AIs to improve together, not for AI to improve alone.

#Co-improvement #Human-AI collaboration #Co-superintelligence

SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling

Intermediate
Elisabetta Fedele, Francis Engelmann et al. · Dec 5 · arXiv

SpaceControl lets you steer a powerful 3D generator with simple shapes you draw, without retraining the model.

#3D generative modeling #test-time guidance #latent space intervention

From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model

Intermediate
Kevin Cannons, Saeed Ranjbar Alvar et al. · Dec 4 · arXiv

This paper builds TAD, a brand-new test that checks if AI can understand what happens over time in real driving videos.

#Temporal understanding #Autonomous driving #Vision-language models

Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

Intermediate
Yanran Zhang, Ziyi Wang et al. · Dec 4 · arXiv

This paper teaches a computer to turn one single picture into a moving 3D scene that stays consistent from every camera angle.

#4D scene generation #single-image to 4D #joint geometry and motion
