🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers924

AllBeginnerIntermediateAdvanced
All SourcesarXiv

Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization

Beginner
Yuxiang Ji, Yong Wang et al.Jan 8arXiv

The paper teaches an AI to act like a careful traveler: it looks at a photo, forms guesses about where it might be, and uses real map tools to check each guess.

#image geolocalization#map-augmented agent#Thinking with Map

Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection

Beginner
Zhiwei Liu, Yupen Cao et al.Jan 8arXiv

This paper builds MFMD-Scen, a big test to see how AI changes its truth/false judgments about the same money-related claim when the situation around it changes.

#financial misinformation detection#scenario-induced bias#multilingual benchmark

RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

Beginner
Yuan-Kang Lee, Kuan-Lin Chen et al.Jan 8arXiv

This paper teaches a camera to fix nighttime colors by combining a smart rule-based color trick (SGP-LRD) with a learning-by-trying helper (reinforcement learning).

#auto white balance#color constancy#nighttime imaging

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Intermediate
Shih-Yang Liu, Xin Dong et al.Jan 8arXiv

When a model learns from many rewards at once, a popular method called GRPO can accidentally squash different reward mixes into the same learning signal, which confuses training.

#GDPO#GRPO#multi-reward reinforcement learning

RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

Intermediate
Boyang Wang, Haoran Zhang et al.Jan 8arXiv

RoboVIP is a plug-and-play tool that turns ordinary robot videos into many new, realistic, multi-view training videos without changing the original robot actions.

#robotic manipulation#video diffusion#multi-view generation

Plenoptic Video Generation

Intermediate
Xiao Fu, Shitao Tang et al.Jan 8arXiv

PlenopticDreamer is a new way to remake a video from different camera paths while keeping everything consistent across views and over time.

#plenoptic function#camera-controlled video generation#video re-rendering

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Intermediate
Shuming Liu, Mingchen Zhuge et al.Jan 8arXiv

The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?

#video reasoning#adaptive reasoning#early exit

CoV: Chain-of-View Prompting for Spatial Reasoning

Intermediate
Haoyu Zhao, Akide Liu et al.Jan 8arXiv

This paper teaches AI to look around a 3D place step by step, instead of staring at a fixed set of pictures, so it can answer tricky spatial questions better.

#Chain-of-View Prompting#Embodied Question Answering#Active Viewpoint Reasoning

RelayLLM: Efficient Reasoning via Collaborative Decoding

Intermediate
Chengsong Huang, Tong Zheng et al.Jan 8arXiv

RelayLLM lets a small model do the talking and only asks a big model for help on a few, truly hard tokens.

#token-level collaboration#<call>n</call> command#collaborative decoding

DocDancer: Towards Agentic Document-Grounded Information Seeking

Intermediate
Qintong Zhang, Xinjie Lv et al.Jan 8arXiv

DocDancer is a smart document helper that answers questions by exploring and reading long, mixed-media PDFs using just two tools: Search and Read.

#Document Question Answering#Agentic Information Seeking#ReAct

VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control

Intermediate
Sixiao Zheng, Minghao Yin et al.Jan 8arXiv

VerseCrafter is a video world model that lets you steer both the camera and multiple moving objects by editing a single 4D world state.

#Video world model#4D Geometric Control#3D Gaussian trajectories

Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing

Beginner
Runze He, Yiji Cheng et al.Jan 8arXiv

Re-Align is a new way for AI to make and edit pictures by thinking in clear steps before drawing.

#In-Context Image Generation#Reference-based Image Editing#Structured Reasoning
3940414243