How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (943)


Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding

Beginner
Ziyang Wang, Honglu Zhou et al. · Dec 5 · arXiv

Long Video Understanding (LVU) is hard because the important clues are tiny, far apart in time, and buried in hours of mostly unimportant footage.

#Active Video Perception · #Long Video Understanding · #Plan-Observe-Reflect

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Intermediate
Zhenpeng Su, Leiyu Pan et al. · Dec 5 · arXiv

Reinforcement learning (RL) can make large language models smarter, but off-policy training often pushes updates too far from the “safe zone,” causing unstable learning.

#reinforcement learning · #PPO-clip · #KL penalty

ProPhy: Progressive Physical Alignment for Dynamic World Simulation

Intermediate
Zijun Wang, Panwen Hu et al. · Dec 5 · arXiv

ProPhy is a new two-step method that helps video AIs follow real-world physics, not just make pretty pictures.

#physics-aware video generation · #mixture-of-experts · #token-level routing

BEAVER: An Efficient Deterministic LLM Verifier

Intermediate
Tarun Suresh, Nalin Wadhwa et al. · Dec 5 · arXiv

BEAVER is a new way to check, with guaranteed certainty, how likely a language model is to give answers that obey important rules.

#BEAVER · #deterministic verification · #large language models

AI & Human Co-Improvement for Safer Co-Superintelligence

Beginner
Jason Weston, Jakob Foerster · Dec 5 · arXiv

This paper argues that the fastest and safest path to super-smart AI is for humans and AIs to improve together, not for AI to improve alone.

#Co-improvement · #Human-AI collaboration · #Co-superintelligence

SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling

Intermediate
Elisabetta Fedele, Francis Engelmann et al. · Dec 5 · arXiv

SpaceControl lets you steer a powerful 3D generator with simple shapes you draw, without retraining the model.

#3D generative modeling · #test-time guidance · #latent space intervention

From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model

Intermediate
Kevin Cannons, Saeed Ranjbar Alvar et al. · Dec 4 · arXiv

This paper builds TAD, a brand-new test that checks if AI can understand what happens over time in real driving videos.

#Temporal understanding · #Autonomous driving · #Vision-language models

Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

Intermediate
Yanran Zhang, Ziyi Wang et al. · Dec 4 · arXiv

This paper teaches a computer to turn one single picture into a moving 3D scene that stays consistent from every camera angle.

#4D scene generation · #single-image to 4D · #joint geometry and motion

Arbitrage: Efficient Reasoning via Advantage-Aware Speculation

Intermediate
Monishwaran Maheswaran, Rishabh Tiwari et al. · Dec 4 · arXiv

ARBITRAGE makes AI solve step-by-step problems faster by only using the big, slow model when it is predicted to truly help.

#speculative decoding · #step-level speculative decoding · #advantage-aware routing

EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture

Intermediate
Xin He, Longhui Wei et al. · Dec 4 · arXiv

EMMA is a single AI model that can understand images, write about them, create new images from text, and edit images—all in one unified system.

#EMMA · #unified multimodal architecture · #32x autoencoder

EtCon: Edit-then-Consolidate for Reliable Knowledge Editing

Intermediate
Ruilin Li, Yibin Wang et al. · Dec 4 · arXiv

Large language models forget or misuse new facts if you only poke their weights once; EtCon fixes this with a two-step plan.

#knowledge editing · #EtCon · #TPSFT

COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence

Beginner
Zefeng Zhang, Xiangzhao Hao et al. · Dec 4 · arXiv

COOPER is a single AI model that both “looks better” (perceives depth and object boundaries) and “thinks smarter” (reasons step by step) to answer spatial questions about images.

#COOPER · #multimodal large language model · #unified model