🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers129

AllBeginnerIntermediateAdvanced
All SourcesarXiv

Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views

Beginner
Tingyang Chen, Cong Fu et al.Dec 15arXiv

The paper shows that judging vector search only by distance-based recall and speed can be very misleading for real tasks.

#vector similarity search#approximate nearest neighbor#maximum inner product search

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Beginner
Yuran Wang, Bohan Zeng et al.Dec 14arXiv

Scone is a new AI method that makes images from instructions while correctly picking the right subject even when many look similar.

#subject-driven image generation#multi-subject composition#subject distinction

The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality

Beginner
Aileen Cheng, Alon Jacovi et al.Dec 11arXiv

The FACTS Leaderboard is a four-part test that checks how truthful AI models are across images, memory, web search, and document grounding.

#LLM factuality#benchmarking#multimodal evaluation

Sharp Monocular View Synthesis in Less Than a Second

Beginner
Lars Mescheder, Wei Dong et al.Dec 11arXiv

SHARP turns a single photo into a 3D scene you can look around in, and it does this in under one second on a single GPU.

#monocular view synthesis#3D Gaussians#real-time neural rendering

Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases

Beginner
Sherman Wong, Zhenting Qi et al.Dec 11arXiv

This paper introduces the Confucius Code Agent (CCA), a coding helper built to handle huge real-world codebases with long tasks and many tools.

#coding agents#agent scaffolding#context management

MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

Beginner
Yixin Wan, Lei Ke et al.Dec 11arXiv

This paper creates MotionEdit, a high-quality dataset that teaches AI to change how people and objects move in a picture without breaking their looks or the scene.

#motion-centric image editing#optical flow#MotionEdit dataset

VABench: A Comprehensive Benchmark for Audio-Video Generation

Beginner
Daili Hua, Xizhi Wang et al.Dec 10arXiv

VABench is a new, all-in-one test that checks how well AI makes videos with matching sound and pictures.

#audio-video benchmark#synchronization#lip-sync

Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform

Beginner
Yuning Gong, Yifei Liu et al.Dec 9arXiv

Visionary is a web-based platform that lets you view and interact with advanced 3D scenes, right in your browser, with just a click.

#WebGPU#3D Gaussian Splatting#ONNX Runtime Web

Towards a Science of Scaling Agent Systems

Beginner
Yubin Kim, Ken Gu et al.Dec 9arXiv

Multi-agent AI teams are not automatically better; their success depends on matching the team’s coordination style to the job’s structure.

#multi-agent systems#agentic evaluation#scaling laws

UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

Beginner
Jiehui Huang, Yuechen Zhang et al.Dec 8arXiv

UnityVideo is a single, unified model that learns from many kinds of video information at once—like colors (RGB), depth, motion (optical flow), body pose, skeletons, and segmentation—to make smarter, more realistic videos.

#multimodal video generation#multi-task learning#dynamic noise scheduling

OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Beginner
Zhaochong An, Menglin Jia et al.Dec 8arXiv

OneStory is a new way to make long videos from many shots that stay consistent with the story, characters, and places across time.

#multi-shot video generation#adaptive memory#frame selection

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Beginner
Charlie Zhang, Graham Neubig et al.Dec 8arXiv

The paper asks when reinforcement learning (RL) really makes language models better at reasoning beyond what they learned in pre-training.

#edge of competence#process-verified evaluation#process-level rewards
7891011