How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1252)

ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors

Beginner
Zihao Huang, Tianqi Liu et al. · Mar 4 · arXiv

ArtHOI is a new zero-shot method that makes people and everyday articulated objects (like doors, drawers, and fridges) move together realistically using only a single generated video as guidance.

#articulated human-object interaction #4D reconstruction #optical flow segmentation

$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners

Intermediate
Harman Singh, Xiuyu Li et al. · Mar 4 · arXiv

The paper shows that when a model compares two of its own answers head-to-head, it picks the right one more often than when it judges each answer alone.

#pairwise self-verification #test-time scaling #parallel reasoning

CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video

Intermediate
Lingen Li, Guangzhi Wang et al. · Mar 4 · arXiv

CubeComposer is a new AI method that turns a normal forward-facing video into a full 360° VR video at true 4K quality without using super-resolution upscaling.

#360° video generation #cubemap #spatio-temporal autoregression

Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Beginner
Zhenting Wang, Huancheng Chen et al. · Mar 4 · arXiv

This paper teaches long-horizon AI agents to recall past experience exactly without loading their whole memory at once.

#indexed memory #LLM agents #long-horizon tasks

RIVER: A Real-Time Interaction Benchmark for Video LLMs

Intermediate
Yansong Shi, Qingsong Zhao et al. · Mar 4 · arXiv

RIVER Bench is a new test that checks how well AI can watch a video stream and talk with you in real time.

#RIVER Bench #online video understanding #multimodal large language models

Phi-4-reasoning-vision-15B Technical Report

Intermediate
Jyoti Aneja, Michael Harrison et al. · Mar 4 · arXiv

Phi-4-reasoning-vision-15B is a small, open-weight AI that understands pictures and text together and is especially good at math, science, and using computer screens.

#multimodal reasoning #vision-language model #mid-fusion

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

Intermediate
Jialong Chen, Xander Xu et al. · Mar 4 · arXiv

SWE-CI is a new benchmark that tests how well AI coding agents can keep a codebase healthy over many changes, not just fix one bug.

#SWE-CI #continuous integration #code maintainability

T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning

Intermediate
Qinsi Wang, Hancheng Ye et al. · Mar 4 · arXiv

This paper shows that prompting AI to first draw a simple map of a text (nodes and links) before answering questions makes its answers more accurate and more reliable.

#Structure of Thought #Text-to-Structure #Intermediate Representation

MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

Intermediate
Zonglin Yang, Lidong Bing · Mar 4 · arXiv

Scientists want AI to propose brand‑new hypotheses directly from a research background, but training a model to do this end‑to‑end is mathematically intractable because the search space explodes combinatorially.

#scientific discovery #hypothesis generation #P(h|b)

InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

Intermediate
Mohamed Elmoghany, Liangbing Zhao et al. · Mar 4 · arXiv

InfinityStory is a new system that can make very long videos (even hours long) in which the world stays consistent and characters transition smoothly between shots.

#long-form video generation #background consistency #multi-agent planning

Proact-VL: A Proactive VideoLLM for Real-Time AI Companions

Beginner
Weicai Yan, Yuhong Dai et al. · Mar 3 · arXiv

Proact-VL is a video-talking AI that knows not only what to say but also when to say it, like a great sports commentator.

#Proactive VideoLLM #real-time commentary #streaming video understanding

Utonia: Toward One Encoder for All Point Clouds

Intermediate
Yujia Zhang, Xiaoyang Wu et al. · Mar 3 · arXiv

Utonia is a single brain (encoder) that learns from many kinds of 3D point clouds, like indoor rooms, outdoor streets, tiny toys, and even city maps.

#Utonia #point cloud #self-supervised learning