How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (5)

#reasoning efficiency

From Scale to Speed: Adaptive Test-Time Scaling for Image Editing

Intermediate
Xiangyan Qu, Zhenlong Yuan et al. · Feb 24 · arXiv

This paper speeds up and improves AI image editing by spending more test-time compute on hard edits and less on easy ones, just like a smart coach.

#adaptive test-time scaling · #image chain-of-thought · #image editing
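The core idea, allocating candidate generations by estimated difficulty, can be sketched in a few lines. This is a minimal illustration, not the paper's actual policy: `edit_budget`, `adaptive_edit`, and the difficulty scores are hypothetical names and values.

```python
import random

def edit_budget(difficulty, min_samples=1, max_samples=8):
    """Map an estimated edit difficulty in [0, 1] to a sample budget.
    Harder edits get more candidate generations (illustrative rule)."""
    return min_samples + round(difficulty * (max_samples - min_samples))

def adaptive_edit(prompt, difficulty, generate, score):
    """Generate a difficulty-scaled number of candidate edits, keep the best."""
    candidates = [generate(prompt) for _ in range(edit_budget(difficulty))]
    return max(candidates, key=score)

# Toy demo: "generation" is a random stand-in and "score" prefers larger values.
random.seed(0)
gen = lambda p: random.random()
best_easy = adaptive_edit("recolor sky", difficulty=0.1, generate=gen, score=lambda c: c)
best_hard = adaptive_edit("replace object", difficulty=0.9, generate=gen, score=lambda c: c)
```

An easy edit here gets a single sample while a hard one gets several, which is the "smart coach" behavior the summary describes.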

Latent Thoughts Tuning: Bridging Context and Reasoning with Fused Information in Latent Tokens

Intermediate
Weihao Liu, Dehai Min et al. · Feb 10 · arXiv

The paper introduces LT-Tuning, a way for AI models to “think silently” using special hidden tokens instead of writing every step out loud.

#latent tokens · #chain-of-thought · #context-prediction fusion
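"Thinking silently" can be pictured as splicing extra, learned embedding slots into the input instead of emitting chain-of-thought text. The sketch below only shows that shape manipulation; the random vectors and the `with_latent_thoughts` name are stand-ins, not LT-Tuning's trained components.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_latent = 16, 4

# Stand-ins for learned latent "thought" embeddings (trained in LT-Tuning).
latent_thoughts = rng.normal(size=(n_latent, d_model))

def with_latent_thoughts(token_embeddings):
    """Append latent thought slots after the context, giving the model
    hidden positions to reason in before producing the answer."""
    return np.concatenate([token_embeddings, latent_thoughts], axis=0)

context = rng.normal(size=(10, d_model))   # embedded input tokens
augmented = with_latent_thoughts(context)  # 10 context + 4 latent slots
```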

MAXS: Meta-Adaptive Exploration with LLM Agents

Intermediate
Jian Zhang, Zhiyuan Wang et al. · Jan 14 · arXiv

MAXS is a new way for AI agents to think a few steps ahead while using tools like search and code, so they make smarter choices.

#LLM agents · #tool-augmented reasoning · #lookahead
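"Thinking a few steps ahead" is classic lookahead search over candidate actions. Here is a minimal, generic sketch of that idea; `simulate`, `value`, and the toy number-target task are invented for illustration and are not MAXS's actual agent loop.

```python
def lookahead_choose(state, actions, simulate, value, depth=2):
    """Pick the action whose best simulated rollout scores highest.
    `simulate(state, action)` returns the next state; `value(state)`
    is a heuristic score of how good a state is."""
    def rollout(s, d):
        if d == 0:
            return value(s)
        return max(rollout(simulate(s, a), d - 1) for a in actions)
    return max(actions, key=lambda a: rollout(simulate(state, a), depth - 1))

# Toy demo: each "tool" adds a number, and the agent wants to reach 10.
actions = [1, 3, 5]
simulate = lambda s, a: s + a
value = lambda s: -abs(10 - s)
best = lookahead_choose(0, actions, simulate, value, depth=2)  # picks 5: 5 + 5 = 10
```

A greedy agent scoring only one step ahead would see all three actions as merely "closer to 10"; the two-step rollout reveals that starting with 5 reaches the target exactly.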

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Intermediate
Jiangshan Duo, Hanyu Li et al. · Jan 13 · arXiv

JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, which trims bad ideas early.

#RLVR · #judge-then-generate · #discriminative supervision
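The judge-first ordering can be sketched as a filter: score draft ideas with a judge and only spend generation effort on the ones that pass. This is a toy illustration of the control flow, not the paper's RLVR training procedure; `judge`, `expand`, and the threshold are hypothetical.

```python
def judge_then_generate(prompt, drafts, judge, expand, threshold=0.5):
    """Score partial answers with a judge first and expand only those
    that pass, pruning weak ideas before full generation."""
    survivors = [d for d in drafts if judge(prompt, d) >= threshold]
    return [expand(prompt, d) for d in survivors]

# Toy demo: the judge approves drafts that mention the prompt's keyword.
judge = lambda p, d: 1.0 if p.split()[0] in d else 0.0
expand = lambda p, d: d + " ... full solution"
out = judge_then_generate(
    "integrate x^2",
    ["integrate by power rule", "guess 7"],
    judge, expand,
)
```

The weak draft is discarded before any expensive generation happens, which is the "trims bad ideas early" effect the summary describes.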

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Intermediate
Shih-Yang Liu, Xin Dong et al. · Jan 8 · arXiv

When a model learns from many rewards at once, the popular GRPO method can accidentally squash different reward mixes into the same learning signal, which confuses training; GDPO avoids this by normalizing each reward separately before combining them.

#GDPO · #GRPO · #multi-reward reinforcement learning
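The squashing effect is easy to demonstrate numerically. In the sketch below (a simplified illustration, not GDPO's exact objective), two responses have opposite reward mixes but the same total reward: summing rewards before group normalization yields an identical zero advantage for both, while normalizing each reward column on its own keeps them distinguishable.

```python
import numpy as np

def group_norm(r):
    """GRPO-style advantage: whiten rewards within a sampled group."""
    return (r - r.mean()) / (r.std() + 1e-8)

# Two responses with opposite reward mixes but the same total reward.
rewards = np.array([[1.0, 0.0],   # response A: helpful, unsafe
                    [0.0, 1.0]])  # response B: unhelpful, safe

# Coupled (sum-then-normalize): both totals are 1.0, so both responses
# get a zero advantage -- the different mixes collapse into one signal.
coupled = group_norm(rewards.sum(axis=1))

# Decoupled (normalize each reward column separately, in the spirit of
# GDPO): each response keeps a distinct per-reward advantage that a
# weighted objective can still tell apart.
decoupled = np.stack([group_norm(rewards[:, j]) for j in range(2)], axis=1)
```

The coupled advantages come out as `[0, 0]` even though the two responses behave very differently, which is exactly the confusion the summary points to.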