🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers791

AllBeginnerIntermediateAdvanced
All SourcesarXiv

WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks

Intermediate
Hao Bai, Alexey Taymanov et al.Jan 5arXiv

WebGym is a giant practice world (almost 300,000 tasks) that lets AI web agents learn on real, ever-changing websites instead of tiny, fake ones.

#WebGym#visual web agents#vision-language models

CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving

Intermediate
Shuhang Chen, Yunqiu Xu et al.Jan 5arXiv

This paper teaches AI to solve diagram-based math problems by copying how people think: first see (perception), then make sense of what you saw (internalization), and finally reason (solve the problem).

#visual mathematical reasoning#multimodal large language models#perception-reasoning alignment

COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

Intermediate
Dasol Choi, DongGeon Lee et al.Jan 5arXiv

COMPASS is a new framework that turns a company’s rules into thousands of smart test questions to check if chatbots follow those rules.

#policy alignment#allowlist denylist#enterprise AI safety

K-EXAONE Technical Report

Intermediate
Eunbi Choi, Kibong Choi et al.Jan 5arXiv

K-EXAONE is a super-sized language model that speaks six languages and can read very long documents (up to 256,000 tokens) without forgetting important details.

#Mixture-of-Experts#Hybrid Attention#Sliding Window Attention

FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing

Intermediate
Xijie Huang, Chengming Xu et al.Jan 5arXiv

This paper makes video editing easier by teaching an AI to spread changes from the first frame across the whole video smoothly and accurately.

#First-Frame Propagation#Video Editing#FFP-300K

NitroGen: An Open Foundation Model for Generalist Gaming Agents

Intermediate
Loïc Magne, Anas Awadalla et al.Jan 4arXiv

NitroGen is a vision-to-action AI that learns to play many video games by watching 40,000 hours of gameplay videos from over 1,000 titles with on-screen controller overlays.

#NitroGen#generalist gaming agent#behavior cloning

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment

Intermediate
Ming Zhang, Kexin Tan et al.Jan 4arXiv

OpenNovelty is a four-phase, AI-powered helper that checks how new a research paper’s ideas are by comparing them to real, retrieved papers.

#novelty assessment#peer review#LLM agentic system

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

Intermediate
Yang Zhou, Hao Shao et al.Jan 4arXiv

DrivingGen is a new, all-in-one test that fairly checks how well AI can imagine future driving videos and motions.

#generative video#autonomous driving#world models

SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

Intermediate
Chaofan Tao, Jierun Chen et al.Jan 4arXiv

SWE-Lego shows that a simple training method called supervised fine-tuning (SFT), when done carefully, can teach AI to fix real software bugs very well.

#SWE-Lego#Supervised Fine-Tuning#Error Masking

DreamID-V:Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

Intermediate
Xu Guo, Fulong Ye et al.Jan 4arXiv

DreamID-V is a new AI method that swaps faces in videos while keeping the body movements, expressions, lighting, and background steady and natural.

#video face swapping#image face swapping#diffusion transformer

Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models

Intermediate
Rong Zhou, Dongping Chen et al.Jan 4arXiv

A digital twin is a living computer copy of a real thing (like a bridge, a heart, or a factory) that stays in sync with sensors and helps us predict, fix, and improve the real thing.

#digital twin#physics-informed AI#neural operators

Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments

Intermediate
Hansen Jin Lillemark, Benhao Huang et al.Jan 3arXiv

This paper shows how to give AI a steady “mental map” of the world that keeps updating even when the camera looks away.

#flow equivariance#world model#partially observed environments
3839404142