How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (943)

Boosting Latent Diffusion Models via Disentangled Representation Alignment

Intermediate
John Page, Xuesong Niu et al. · Jan 9 · arXiv

This paper shows that the best VAEs for image generation are the ones whose latents neatly separate object attributes, a property called semantic disentanglement.

#Send-VAE#semantic disentanglement#latent diffusion

EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

Intermediate
Xiaoshuai Song, Haofei Chang et al. · Jan 9 · arXiv

EnvScaler is an automatic factory that builds many safe, rule-following practice worlds where AI agents can talk to users and call tools, just like real apps.

#EnvScaler#tool-interactive environments#programmatic synthesis

PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning

Intermediate
Jingcheng Hu, Yinmin Zhang et al. · Jan 9 · arXiv

PaCoRe is a way for AI to reason along many parallel paths and then coordinate them, so it can spend far more compute at test time without running out of context window space.

#Parallel Coordinated Reasoning#Test-time compute scaling#Message passing

Orient Anything V2: Unifying Orientation and Rotation Understanding

Intermediate
Zehan Wang, Ziang Zhang et al. · Jan 9 · arXiv

This paper teaches an AI model to understand both which way an object is facing (orientation) and how it turns between views (rotation), all in one system.

#object orientation#rotational symmetry#relative rotation

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

Intermediate
Zhi Yang, Runguo Li et al. · Jan 9 · arXiv

FinVault is a new benchmark that checks whether AI agents for finance stay safe while actually carrying out tasks, not just chatting.

#financial AI agents#execution-grounded benchmarking#sandboxed environments

Over-Searching in Search-Augmented Large Language Models

Intermediate
Roy Xie, Deepak Gopinath et al. · Jan 9 · arXiv

The paper shows that language models with a search tool often look up too much information, which wastes compute and can make answers worse on unanswerable questions.

#search-augmented LLMs#over-searching#abstention

Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization

Beginner
Yuxiang Ji, Yong Wang et al. · Jan 8 · arXiv

The paper teaches an AI to act like a careful traveler: it looks at a photo, forms guesses about where it might be, and uses real map tools to check each guess.

#image geolocalization#map-augmented agent#Thinking with Map

Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection

Beginner
Zhiwei Liu, Yupen Cao et al. · Jan 8 · arXiv

This paper builds MFMD-Scen, a large benchmark for measuring how an AI's true/false judgment of the same money-related claim changes when the scenario around it changes.

#financial misinformation detection#scenario-induced bias#multilingual benchmark

RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

Beginner
Yuan-Kang Lee, Kuan-Lin Chen et al. · Jan 8 · arXiv

This paper teaches a camera to fix nighttime colors by combining a smart rule-based color trick (SGP-LRD) with a learning-by-trying helper (reinforcement learning).

#auto white balance#color constancy#nighttime imaging

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Intermediate
Shih-Yang Liu, Xin Dong et al. · Jan 8 · arXiv

When a model learns from many rewards at once, a popular method called GRPO can accidentally squash different reward mixes into the same learning signal, which confuses training.

#GDPO#GRPO#multi-reward reinforcement learning

RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

Intermediate
Boyang Wang, Haoran Zhang et al. · Jan 8 · arXiv

RoboVIP is a plug-and-play tool that turns ordinary robot videos into many new, realistic, multi-view training videos without changing the original robot actions.

#robotic manipulation#video diffusion#multi-view generation

Plenoptic Video Generation

Intermediate
Xiao Fu, Shitao Tang et al. · Jan 8 · arXiv

PlenopticDreamer is a new way to re-render a video along different camera paths while keeping everything consistent across views and over time.

#plenoptic function#camera-controlled video generation#video re-rendering