🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers105

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#reinforcement learning

Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning

Intermediate
Yifei Li, Wenzhao Zheng et al.Dec 17arXiv

Skyra is a detective-style AI that spots tiny visual mistakes (artifacts) in videos to tell if they are real or AI-generated, and it explains its decision with times and places in the video.

#AI-generated video detection#artifact reasoning#multimodal large language model

Puzzle Curriculum GRPO for Vision-Centric Reasoning

Intermediate
Ahmadreza Jeddi, Hakki Can Karaimer et al.Dec 16arXiv

This paper teaches vision-language models to reason about pictures using puzzles instead of expensive human labels.

#vision-language models#reinforcement learning#group-relative policy optimization

CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives

Intermediate
Zihan Wang, Jiashun Wang et al.Dec 16arXiv

CRISP turns a normal phone video of a person into a clean 3D world and a virtual human that can move in it without breaking physics.

#real-to-sim#human-scene interaction#planar primitives

EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models

Intermediate
Zechen Bai, Chen Gao et al.Dec 16arXiv

Robots usually learn by copying many demonstrations, which is expensive and makes them brittle when things change.

#EVOLVE-VLA#test-time training#vision-language-action

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

Intermediate
Jitesh Jain, Jialuo Li et al.Dec 15arXiv

SAGE is a smart video-watching agent that decides when to answer quickly and when to take multiple steps, just like how people skim or rewind long videos.

#any-horizon reasoning#video agents#temporal grounding

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Intermediate
Zhihang Liu, Xiaoyi Bao et al.Dec 15arXiv

ShowTable is a new way for AI to turn a data table into a beautiful, accurate infographic using a think–make–check–fix loop.

#creative table visualization#multimodal large language model#diffusion model

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Intermediate
Weizhou Shen, Ziyi Yang et al.Dec 15arXiv

QwenLong-L1.5 is a training recipe that helps AI read and reason over very long documents by improving the data it learns from, the way it is trained, and how it remembers important stuff.

#long-context reasoning#reinforcement learning#GRPO

DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry

Intermediate
Zhenyang Cai, Jiaming Zhang et al.Dec 12arXiv

DentalGPT is a special AI that looks at dental images and text together and explains what it sees like a junior dentist.

#DentalGPT#multimodal large language model#dentistry AI

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

Intermediate
Songyang Gao, Yuzhe Gu et al.Dec 11arXiv

This paper builds a math problem–solving agent, Intern-S1-MO, that thinks in multiple rounds and remembers proven mini-results called lemmas so it can solve very long, Olympiad-level problems.

#long-horizon reasoning#lemma-based memory#multi-agent reasoning

Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

Intermediate
Haiteng Zhao, Junhao Shen et al.Dec 11arXiv

This paper builds InternGeometry, a large language model agent that solves Olympiad-level geometry by talking to a math engine, remembering what worked, and trying smart new ideas.

#InternGeometry#geometry theorem proving#auxiliary constructions

MOA: Multi-Objective Alignment for Role-Playing Agents

Intermediate
Chonghua Liao, Ke Wang et al.Dec 10arXiv

Role-playing agents need to juggle several goals at once, like staying in character, following instructions, and using the right tone.

#multi-objective alignment#role-playing agents#reinforcement learning

MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment

Intermediate
Mengxi Xiao, Kailai Yang et al.Dec 10arXiv

MentraSuite is a complete toolkit that teaches large language models (LLMs) to reason about mental health step by step, not just sound caring.

#mental health reasoning#LLM post-training#supervised fine-tuning
56789