Papers1055

WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models

Runjie Zhou, Youbo Shao et al.Jan 28arXiv

WorldVQA is a new test that checks if multimodal AI models can correctly name what they see in pictures without doing extra reasoning.

#WorldVQA#atomic visual knowledge#multimodal large language models

Efficient Autoregressive Video Diffusion with Dummy Head

Intermediate

Hang Guo, Zhaoyang Jia et al.Jan 28arXiv

This paper finds that about 1 out of every 4 attention heads in autoregressive video diffusion models mostly looks only at the current frame and almost ignores the past, wasting memory and time.

#autoregressive video diffusion#multi-head self-attention#KV cache compression

OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution

Intermediate

Le Zhang, Yixiong Xiao et al.Jan 28arXiv

OmegaUse is a new AI that can use phones and computers by looking at screenshots and deciding where to click, type, or scroll—much like a careful human user.

#GUI agent#UI grounding#navigation policy

DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment

Intermediate

Haoyou Deng, Keyu Yan et al.Jan 28arXiv

DenseGRPO teaches image models using lots of small, timely rewards instead of one final score at the end.

#DenseGRPO#flow matching#GRPO

Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning

Intermediate

Jinyang Wu, Shuo Yang et al.Jan 28arXiv

SPARK is a new way to train AI agents that saves compute by exploring more only at the most important moments.

#SPARK#dynamic branching#strategic exploration

VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning

Intermediate

Vikash Singh, Darion Cassel et al.Jan 27arXiv

VERGE is a teamwork system where an AI writer (an LLM) works with a strict math checker (an SMT solver) to make answers both smart and logically sound.

#VERGE#neurosymbolic reasoning#SMT solver

Self-Distillation Enables Continual Learning

Intermediate

Idan Shenfeld, Mehul Damani et al.Jan 27arXiv

This paper shows a simple way for AI models to keep learning new things without forgetting what they already know.

#Self-Distillation Fine-Tuning#On-Policy Distillation#Continual Learning

Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Intermediate

Chen Chen, Lai WeiJan 27arXiv

Big AI models used to get better by getting wider or reading longer texts, but those tricks are slowing down.

#Keel#Post-LayerNorm#Pre-LayerNorm

Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models

Intermediate

Jialong Wu, Xiaoying Zhang et al.Jan 27arXiv

The paper argues that making and using pictures inside an AI’s thinking can help it reason more like humans, especially for real-world, physical and spatial problems.

#visual world modeling#multimodal chain-of-thought#unified multimodal models

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Intermediate

Zhixiang Wei, Yi Li et al.Jan 27arXiv

Youtu-VL is a new kind of vision-language model that learns to predict both words and tiny image pieces, not just words.

#Vision-Language Models#Unified Autoregressive Supervision#Visual Tokenization

AACR-Bench: Evaluating Automatic Code Review with Holistic Repository-Level Context

Intermediate

Lei Zhang, Yongda Yu et al.Jan 27arXiv

AACR-Bench is a new test set that checks how well AI can do code reviews using the whole project, not just one file.

#Automated Code Review#Benchmark#Repository-level Context

Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection

Intermediate

Quy-Anh Dang, Chris NgoJan 27arXiv

Selective Steering is a new way to gently nudge a language model’s inner thoughts without breaking its flow or skills.

#Selective Steering#Activation Steering#Angular Steering

37 38 39 40 41