How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (105)


WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks

Intermediate
Hao Bai, Alexey Taymanov et al. · Jan 5 · arXiv

WebGym is a giant practice world (almost 300,000 tasks) that lets AI web agents learn on real, ever-changing websites instead of tiny, fake ones.

#WebGym · #visual web agents · #vision-language models

CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving

Intermediate
Shuhang Chen, Yunqiu Xu et al. · Jan 5 · arXiv

This paper teaches AI to solve diagram-based math problems by copying how people think: first see (perception), then make sense of what you saw (internalization), and finally reason (solve the problem).

#visual mathematical reasoning · #multimodal large language models · #perception-reasoning alignment

DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

Intermediate
Xu Guo, Fulong Ye et al. · Jan 4 · arXiv

DreamID-V is a new AI method that swaps faces in videos while keeping the body movements, expressions, lighting, and background steady and natural.

#video face swapping · #image face swapping · #diffusion transformer

Scaling Open-Ended Reasoning to Predict the Future

Intermediate
Nikhil Chandak, Shashwat Goel et al. · Dec 31 · arXiv

The paper teaches small language models to predict open-ended future events by turning daily news into thousands of safe, graded practice questions.

#open-ended forecasting · #calibrated prediction · #Brier score
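The Brier score named in the tags is the standard way such forecasts are graded: the mean squared gap between the predicted probability and what actually happened. A minimal Python illustration (the sample forecasts are invented, not from the paper):

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.
    Lower is better: a perfect forecaster scores 0.0; always saying 50% scores 0.25."""
    assert len(probs) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Three toy forecasts and whether the events actually happened:
print(brier_score([0.9, 0.2, 0.6], [1, 0, 0]))  # (0.01 + 0.04 + 0.36) / 3 ≈ 0.137
```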

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Intermediate
Weixun Wang, XiaoXiao Xu et al. · Dec 31 · arXiv

This paper builds an open, end-to-end Agentic Learning Ecosystem (ALE) that lets AI agents plan, act, and fix their own mistakes across many steps in real computer environments.

#agentic LLMs · #reinforcement learning · #IPA

Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow

Intermediate
Karthik Dharmarajan, Wenlong Huang et al. · Dec 31 · arXiv

Dream2Flow lets a robot watch a short, AI-generated video of a task and then do that task in real life by following object motion in 3D.

#3D object flow · #video generation for robotics · #open-world manipulation

Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

Intermediate
Junru Lu, Jiarui Qin et al. · Dec 31 · arXiv

Youtu-LLM is a small (1.96B-parameter) language model trained from scratch to think, plan, and act like an agent instead of just copying bigger models.

#lightweight LLM · #agentic mid-training · #trajectory data

Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

Intermediate
Yuchen Shi, Yuzheng Cai et al. · Dec 31 · arXiv

Youtu-Agent is a build-and-grow factory for AI agents that cuts manual setup and keeps agents improving over time.

#LLM agents · #automated agent generation · #modular architecture

SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

Intermediate
Yong Xien Chng, Tao Hu et al. · Dec 30 · arXiv

SenseNova-MARS is a vision-language model that can think step-by-step and use three tools—text search, image search, and image cropping—during its reasoning.

#multimodal agent · #vision-language model · #reinforcement learning
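The summary above describes a common agentic pattern: the model alternates free-form reasoning with calls to a small set of tools. The sketch below shows that loop in Python under heavy assumptions; the tool stubs, the `ANSWER:` / `tool(arg)` action format, and `call_model` are invented for illustration and are not SenseNova-MARS's actual interface.

```python
from typing import Callable, Dict

# Placeholder tools standing in for the three capabilities the summary lists.
def text_search(query: str) -> str:  return f"[text results for {query!r}]"
def image_search(query: str) -> str: return f"[images matching {query!r}]"
def crop_image(region: str) -> str:  return f"[cropped view of {region!r}]"

TOOLS: Dict[str, Callable[[str], str]] = {
    "text_search": text_search,
    "image_search": image_search,
    "crop_image": crop_image,
}

def agent_loop(call_model: Callable[[str], str], question: str, max_steps: int = 8) -> str:
    """Alternate model reasoning with tool calls until the model emits an answer."""
    transcript = question
    for _ in range(max_steps):
        step = call_model(transcript)   # e.g. "crop_image(top-left quadrant)" or "ANSWER: 42"
        if step.startswith("ANSWER:"):
            return step[len("ANSWER:"):].strip()
        name, _, arg = step.partition("(")
        observation = TOOLS[name.strip()](arg.rstrip(")"))
        transcript += f"\n{step}\nOBSERVATION: {observation}"
    return "no answer within the step budget"
```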

Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation

Intermediate
Zhe Huang, Hao Wen et al. · Dec 30 · arXiv

Multimodal Large Language Models (MLLMs) often hallucinate on videos by trusting words and common sense more than what the frames really show; this paper uses counterfactual video generation to curb that habit.

#multimodal large language model · #video understanding · #visual hallucination

Training AI Co-Scientists Using Rubric Rewards

Intermediate
Shashwat Goel, Rishi Hazra et al. · Dec 29 · arXiv

The paper teaches AI to write strong research plans by letting it grade its own work using checklists (rubrics) pulled from real scientific papers.

#AI co-scientist · #research plan generation · #rubric rewards
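Rubric rewards of this kind boil down to a simple idea: turn a paper-derived checklist into a scalar score that can supervise the model. Below is a hedged Python sketch; the RubricItem fields, the weights, and the toy keyword judge are assumptions for illustration, not the paper's actual rubrics or grader.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RubricItem:
    criterion: str   # one checklist requirement distilled from a source paper
    weight: float    # how much the requirement counts toward the reward

def rubric_reward(plan: str,
                  rubric: List[RubricItem],
                  judge_satisfies: Callable[[str, str], bool]) -> float:
    """Score a generated research plan as the weighted fraction of rubric
    items a judge says it satisfies; the scalar can then drive RL fine-tuning."""
    total = sum(item.weight for item in rubric)
    earned = sum(item.weight for item in rubric
                 if judge_satisfies(plan, item.criterion))
    return earned / total if total else 0.0

# Hypothetical usage with a trivial keyword-matching "judge":
rubric = [RubricItem("states a testable hypothesis", 2.0),
          RubricItem("names a baseline to compare against", 1.0)]
plan = "We hypothesize that X improves Y and compare against the Z baseline."
keyword = {"states a testable hypothesis": "hypothes",
           "names a baseline to compare against": "baseline"}
print(rubric_reward(plan, rubric, lambda p, c: keyword[c] in p))  # -> 1.0
```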

ProGuard: Towards Proactive Multimodal Safeguard

Intermediate
Shaohan Yu, Lijun Li et al. · Dec 29 · arXiv

ProGuard is a safety guard for text and images that doesn’t just spot known problems—it can also recognize and name new, never-seen-before risks.

#proactive safety · #multimodal moderation · #out-of-distribution detection