How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (791)


ExpSeek: Self-Triggered Experience Seeking for Web Agents

Intermediate
Wenyuan Zhang, Xinghua Zhang et al. | Jan 13 | arXiv

ExpSeek helps web-browsing AI agents ask for help exactly when they feel unsure, instead of stuffing them with tips at the very beginning.

#web agents #experience base #experience triplets

MoCha: End-to-End Video Character Replacement without Structural Guidance

Intermediate
Zhengbo Xu, Jie Ma et al. | Jan 13 | arXiv

MoCha is a new AI that swaps a person in a video with a new character using only one mask on one frame and a few reference photos.

#video diffusion #character replacement #in-context learning

Your Group-Relative Advantage Is Biased

Intermediate
Fengkai Yang, Zherui Chen et al. | Jan 13 | arXiv

Group-based reinforcement learning for reasoning (like GRPO) uses the group's average reward as a baseline, but that makes its 'advantage' estimates biased.

#Reinforcement Learning from Verifier Rewards #GRPO #GSPO
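
As a rough illustration of the group-average baseline mentioned in the summary above, the sketch below computes group-relative advantages the way GRPO-style methods are commonly described: each response's reward minus the group mean, optionally divided by the group standard deviation. The function name `group_relative_advantages`, the `normalize` flag, and the `eps` value are assumptions made for this example; nothing here is taken from the paper, and it does not reproduce the paper's bias analysis or its proposed fix.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, normalize=True, eps=1e-8):
    """Return one advantage per sampled response in a group (illustrative helper)."""
    # The baseline is the average reward of the group itself.
    baseline = mean(rewards)
    centered = [r - baseline for r in rewards]
    if not normalize:
        return centered
    # Optional scaling by the group's standard deviation; eps avoids division by zero.
    scale = pstdev(rewards) + eps
    return [c / scale for c in centered]

# Four sampled answers to one prompt, scored 1.0 (correct) or 0.0 (wrong).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Because the baseline is computed from the same small group it is applied to, every advantage depends on which other responses happened to be sampled. That dependence is the setting the summary refers to.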

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Intermediate
Jiangshan Duo, Hanyu Li et al. | Jan 13 | arXiv

JudgeRLVR teaches a model to be a strict judge of answers before it learns to generate them, which trims bad ideas early.

#RLVR #judge-then-generate #discriminative supervision

YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation

Intermediate
Abdelaziz Bounhar, Rania Hossam Elmohamady Elbadry et al. | Jan 13 | arXiv

This paper introduces YaPO, a way to gently nudge a language model’s hidden thoughts so it behaves better without retraining it.

#Activation Steering #Sparse Autoencoder #Preference Optimization

RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation

Intermediate
Sunzhu Li, Jiale Zhao et al. | Jan 13 | arXiv

RubricHub is a huge collection (about 110,000) of detailed grading guides (rubrics) for questions in areas such as health, science, writing, and chat.

#RubricHub #coarse-to-fine rubric generation #multi-model aggregation

UM-Text: A Unified Multimodal Model for Image Understanding and Visual Text Editing

Intermediate
Lichen Ma, Xiaolong Fu et al. | Jan 13 | arXiv

UM-Text is a single AI that understands both your words and your picture to add or change text in images so it looks like it truly belongs there.

#visual text editing #multimodal diffusion #Visual Language Model

SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices

Intermediate
Dongting Hu, Aarush Gupta et al. | Jan 13 | arXiv

This paper shows how to make powerful image‑generating Transformers run fast on phones without needing the cloud.

#Diffusion Transformer #Sparse Attention #Adaptive Sparse Self-Attention

User-Oriented Multi-Turn Dialogue Generation with Tool Use at Scale

Intermediate
Jungho Cho, Minbyul Jeong et al. | Jan 13 | arXiv

The paper builds a new way to create realistic, long conversations between people and AI that use tools like databases.

#multi-turn dialogue generation #tool use #user simulation

MemoBrain: Executive Memory as an Agentic Brain for Reasoning

Intermediate
Hongjin Qian, Zhao Cao et al. | Jan 12 | arXiv

MemoBrain is like a helpful co-pilot for AI that keeps important thoughts neat and ready so the main thinker (the agent) doesn’t get overwhelmed.

#Executive memory #Tool-augmented agents #Context budget

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Intermediate
Kewei Zhang, Ye Huang et al. | Jan 12 | arXiv

Transformers are powerful but slow because regular self-attention compares every token with every other token, so its cost grows quadratically with sequence length and becomes too expensive for long sequences.

#Multi-Head Linear Attention #Linear Attention #Self-Attention
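
To make the every-token-with-every-token cost in the summary above concrete, here is a toy sketch of the reordering trick that generic linear attention relies on. Without the softmax, (Q Kᵀ) V equals Q (Kᵀ V), so the n × n token-by-token matrix never has to be built. This is an unnormalized, non-causal simplification with made-up values; the helper names `matmul` and `transpose` are assumptions for the example, and it is not the MHLA method itself.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

# Toy sizes: n = 3 tokens, d = 2 features per token (values are made up).
Q = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
K = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Quadratic ordering: builds an n x n score matrix first.
scores = matmul(Q, transpose(K))
quadratic = matmul(scores, V)

# Linear ordering: builds a d x d summary of keys and values first.
kv_summary = matmul(transpose(K), V)
linear = matmul(Q, kv_summary)
```

The intermediate in the first ordering has n × n entries, while the second has d × d entries regardless of sequence length, which is why this family of methods scales linearly in the number of tokens.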

More Images, More Problems? A Controlled Analysis of VLM Failure Modes

Intermediate
Anurag Das, Adrian Bulat et al. | Jan 12 | arXiv

Large Vision-Language Models (LVLMs) look great on single images but often stumble when they must reason across multiple images.

#Large Vision-Language Models #Multi-image reasoning #Cross-image aggregation