Papers (791)

WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks

Hao Bai, Alexey Taymanov et al. · Jan 5 · arXiv

WebGym is a giant practice world (almost 300,000 tasks) that lets AI web agents learn on real, ever-changing websites instead of tiny, fake ones.

#WebGym #visual web agents #vision-language models

CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving

Intermediate

Shuhang Chen, Yunqiu Xu et al. · Jan 5 · arXiv

This paper teaches AI to solve diagram-based math problems by copying how people think: first see (perception), then make sense of what you saw (internalization), and finally reason (solve the problem).

#visual mathematical reasoning #multimodal large language models #perception-reasoning alignment

COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

Intermediate

Dasol Choi, DongGeon Lee et al. · Jan 5 · arXiv

COMPASS is a new framework that turns a company’s rules into thousands of smart test questions to check if chatbots follow those rules.

#policy alignment #allowlist denylist #enterprise AI safety

K-EXAONE Technical Report

Intermediate

Eunbi Choi, Kibong Choi et al. · Jan 5 · arXiv

K-EXAONE is a super-sized language model that speaks six languages and can read very long documents (up to 256,000 tokens) without forgetting important details.

#Mixture-of-Experts #Hybrid Attention #Sliding Window Attention

FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing

Intermediate

Xijie Huang, Chengming Xu et al. · Jan 5 · arXiv

This paper makes video editing easier by teaching an AI to spread changes from the first frame across the whole video smoothly and accurately.

#First-Frame Propagation #Video Editing #FFP-300K

NitroGen: An Open Foundation Model for Generalist Gaming Agents

Intermediate

Loïc Magne, Anas Awadalla et al. · Jan 4 · arXiv

NitroGen is a vision-to-action AI that learns to play many video games by watching 40,000 hours of gameplay videos from over 1,000 titles with on-screen controller overlays.

#NitroGen #generalist gaming agent #behavior cloning

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment

Intermediate

Ming Zhang, Kexin Tan et al. · Jan 4 · arXiv

OpenNovelty is a four-phase, AI-powered helper that checks how new a research paper’s ideas are by comparing them to real, retrieved papers.

#novelty assessment #peer review #LLM agentic system

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

Intermediate

Yang Zhou, Hao Shao et al. · Jan 4 · arXiv

DrivingGen is a new, all-in-one test that fairly checks how well AI can imagine future driving videos and motions.

#generative video #autonomous driving #world models

SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

Intermediate

Chaofan Tao, Jierun Chen et al. · Jan 4 · arXiv

SWE-Lego shows that a simple training method called supervised fine-tuning (SFT), when done carefully, can teach AI to fix real software bugs very well.

#SWE-Lego #Supervised Fine-Tuning #Error Masking

DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

Intermediate

Xu Guo, Fulong Ye et al. · Jan 4 · arXiv

DreamID-V is a new AI method that swaps faces in videos while keeping the body movements, expressions, lighting, and background steady and natural.

#video face swapping #image face swapping #diffusion transformer

Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models

Intermediate

Rong Zhou, Dongping Chen et al. · Jan 4 · arXiv

A digital twin is a living computer copy of a real thing (like a bridge, a heart, or a factory) that stays in sync with sensors and helps us predict, fix, and improve the real thing.

#digital twin #physics-informed AI #neural operators

Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments

Intermediate

Hansen Jin Lillemark, Benhao Huang et al. · Jan 3 · arXiv

This paper shows how to give AI a steady “mental map” of the world that keeps updating even when the camera looks away.

#flow equivariance #world model #partially observed environments
