How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (130)


Image Diffusion Preview with Consistency Solver

Beginner
Fu-Yun Wang, Hao Zhou et al. · Dec 15 · arXiv

Diffusion Preview is a two-step “preview-then-refine” workflow that shows you a fast draft image first and only spends full compute after you like the draft.

#diffusion preview #consistency solver #pf-ode

Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views

Beginner
Tingyang Chen, Cong Fu et al. · Dec 15 · arXiv

The paper shows that judging vector search only by distance-based recall and speed can be very misleading for real tasks.

#vector similarity search #approximate nearest neighbor #maximum inner product search

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Beginner
Yuran Wang, Bohan Zeng et al. · Dec 14 · arXiv

Scone is a new AI method that generates images from instructions while picking the right subject even when several candidates look similar.

#subject-driven image generation #multi-subject composition #subject distinction

The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality

Beginner
Aileen Cheng, Alon Jacovi et al. · Dec 11 · arXiv

The FACTS Leaderboard is a four-part test that checks how truthful AI models are across images, memory, web search, and document grounding.

#LLM factuality #benchmarking #multimodal evaluation

Sharp Monocular View Synthesis in Less Than a Second

Beginner
Lars Mescheder, Wei Dong et al. · Dec 11 · arXiv

SHARP turns a single photo into a 3D scene you can look around in, and it does this in under one second on a single GPU.

#monocular view synthesis #3D Gaussians #real-time neural rendering

Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases

Beginner
Sherman Wong, Zhenting Qi et al. · Dec 11 · arXiv

This paper introduces the Confucius Code Agent (CCA), a coding helper built to handle huge real-world codebases with long tasks and many tools.

#coding agents #agent scaffolding #context management

MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

Beginner
Yixin Wan, Lei Ke et al. · Dec 11 · arXiv

This paper creates MotionEdit, a high-quality dataset that teaches AI to change how people and objects move in a picture without altering their appearance or the surrounding scene.

#motion-centric image editing #optical flow #MotionEdit dataset

VABench: A Comprehensive Benchmark for Audio-Video Generation

Beginner
Daili Hua, Xizhi Wang et al. · Dec 10 · arXiv

VABench is a new, all-in-one test that checks how well AI makes videos with matching sound and pictures.

#audio-video benchmark #synchronization #lip-sync

Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform

Beginner
Yuning Gong, Yifei Liu et al. · Dec 9 · arXiv

Visionary is a web-based platform that lets you view and interact with advanced 3D scenes, right in your browser, with just a click.

#WebGPU #3D Gaussian Splatting #ONNX Runtime Web

Towards a Science of Scaling Agent Systems

Beginner
Yubin Kim, Ken Gu et al. · Dec 9 · arXiv

Multi-agent AI teams are not automatically better; their success depends on matching the team’s coordination style to the job’s structure.

#multi-agent systems #agentic evaluation #scaling laws

UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

Beginner
Jiehui Huang, Yuechen Zhang et al. · Dec 8 · arXiv

UnityVideo is a single, unified model that learns from many kinds of video information at once (RGB color, depth, optical flow, body pose, skeletons, and segmentation) to generate more realistic, world-aware videos.

#multimodal video generation #multi-task learning #dynamic noise scheduling

OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Beginner
Zhaochong An, Menglin Jia et al. · Dec 8 · arXiv

OneStory is a new way to make long, multi-shot videos whose story, characters, and places stay consistent from shot to shot.

#multi-shot video generation #adaptive memory #frame selection