How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1262)

Is There a Better Source Distribution than Gaussian? Exploring Source Distributions for Image Flow Matching

Intermediate
Junho Lee, Kwanseok Kim et al. · Dec 20 · arXiv

Flow Matching is like teaching arrows to push points from a simple cloud (source) to real pictures (target); most people start from a Gaussian cloud because it spreads equally in all directions.

#flow matching #conditional flow matching #source distribution

SAM Audio: Segment Anything in Audio

Intermediate
Bowen Shi, Andros Tjandra et al. · Dec 19 · arXiv

SAM Audio is a new AI that can pull out exactly the sound you want from a noisy mix using text, clicks on a video, and time ranges—together or separately.

#audio source separation #multimodal prompting #text-guided separation

Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing

Beginner
Shilong Zhang, He Zhang et al. · Dec 19 · arXiv

This paper shows that strong image-understanding features alone are not enough to generate great images; you also need accurate pixel-level detail.

#Pixel–Semantic VAE #Semantic Regularization #Off-Manifold Generation

When Reasoning Meets Its Laws

Intermediate
Junyu Zhang, Yifan Sun et al. · Dec 19 · arXiv

The paper proposes the Laws of Reasoning (LORE), simple rules that say how much a model should think and how accurate it can be as problems get harder.

#Large Reasoning Models #Laws of Reasoning #Compute Law

RadarGen: Automotive Radar Point Cloud Generation from Cameras

Intermediate
Tomer Borreda, Fangqiang Ding et al. · Dec 19 · arXiv

RadarGen is a tool that learns to generate realistic car radar point clouds just from multiple camera views.

#automotive radar #radar point cloud generation #latent diffusion

Region-Constraint In-Context Generation for Instructional Video Editing

Intermediate
Zhongwei Zhang, Fuchen Long et al. · Dec 19 · arXiv

ReCo is a new way to edit videos just by describing the change in words, with no extra masks needed.

#instruction-based video editing #in-context generation #region constraint

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

Beginner
Jiaqi Tang, Jianmin Chen et al. · Dec 19 · arXiv

Robust-R1 teaches vision-language models to notice how a picture is damaged, think through what that damage hides, and then answer as if the picture were clear.

#Robust-R1 #degradation-aware reasoning #multimodal large language models

InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

Intermediate
Hoiyeong Jin, Hyojin Jang et al. · Dec 19 · arXiv

InsertAnywhere is a two-stage system that lets you add a new object into any video so it looks like it was always there.

#video object insertion #4D scene geometry #diffusion video generation

GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

Intermediate
Rang Li, Lei Li et al. · Dec 19 · arXiv

Visual grounding is when an AI finds the exact thing in a picture that a sentence is talking about, and this paper shows today’s big vision-language AIs are not as good at it as we thought.

#visual grounding #multimodal large language models #benchmark

3D-RE-GEN: 3D Reconstruction of Indoor Scenes with a Generative Framework

Intermediate
Tobias Sautter, Jan-Niklas Dihlmann et al. · Dec 19 · arXiv

3D-RE-GEN turns a single photo of a room into a full 3D scene with separate, textured objects and a usable background.

#single-image 3D reconstruction #scene composition #context-aware inpainting

UCoder: Unsupervised Code Generation by Internal Probing of Large Language Models

Intermediate
Jiajun Wu, Jian Yang et al. · Dec 19 · arXiv

The paper introduces UCoder, a way to teach a code-generating AI to get better without using any outside datasets, not even unlabeled code.

#unsupervised code generation #self-training #internal probing

Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Intermediate
Zeyuan Allen-Zhu · Dec 19 · arXiv

The paper introduces Canon layers, tiny add-ons that let nearby words share information directly, like passing notes along a row of desks.

#Canon layers #horizontal information flow #transformer architecture
