Papers (1262)

When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning

Intermediate

Shoubin Yu, Yue Zhang et al. · Feb 9 · arXiv

Visual spatial reasoning often fails when a model sees only a single image and must imagine the scene from new viewpoints.

#Adaptive Test-Time Scaling #World Models #Visual Spatial Reasoning

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Intermediate

Zehao Chen, Gongxun Li et al. · Feb 9 · arXiv

Large language models can stall after fine-tuning because they become overconfident, so further standard training stops helping.

#weak-driven learning #logit mixing #weak agents

Dreaming in Code for Curriculum Learning in Open-Ended Worlds

Intermediate

Konstantinos Mitsides, Maxence Faldor et al. · Feb 9 · arXiv

Agents in vast, open-ended games often learn a little and then get stuck because suitable next practice tasks are missing.

#open-ended learning #unsupervised environment design #curriculum learning

VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval

Intermediate

Issar Tzachor, Dvir Samuel et al. · Feb 8 · arXiv

VidVec shows that video-capable multimodal language models already hide strong matching signals between videos and sentences inside their middle layers.

#video–text retrieval #multimodal large language models #intermediate layer embeddings

Free(): Learning to Forget in Malloc-Only Reasoning Models

Intermediate

Yilun Zheng, Dongyang Ma et al. · Feb 8 · arXiv

LLMs can reason over many steps, but when they keep every step in context indefinitely, the extra tokens become noise and make answers worse, not better.

#Free()LM #self-forgetting #context pruning

MIND: Benchmarking Memory Consistency and Action Control in World Models

Intermediate

Yixuan Ye, Xuanyu Lu et al. · Feb 8 · arXiv

MIND is a new benchmark that fairly tests two core skills of world models: remembering the world over time (memory consistency) and following controls exactly (action control).

#world models #memory consistency #action control

LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth

Intermediate

Weihao Zeng, Yuzhen Huang et al. · Feb 8 · arXiv

LOCA-bench is a benchmark that tests whether AI agents keep working correctly as their task lists and background information grow extremely long.

#LOCA-bench #long-context agents #context rot

Bielik Guard: Efficient Polish Language Safety Classifiers for LLM Content Moderation

Intermediate

Krzysztof Wróbel, Jan Maria Kowalski et al. · Feb 8 · arXiv

Bielik Guard is a pair of small but strong Polish-language safety models that screen text for five kinds of risky content: hate/aggression, vulgar language, sexual content, crime, and self-harm.

#Polish NLP #content moderation #safety classifier

Rethinking the Value of Agent-Generated Tests for LLM-Based Software Engineering Agents

Intermediate

Zhi Chen, Zhensu Sun et al. · Feb 8 · arXiv

This paper asks a simple question: do tests written by AI coding agents actually help them fix real software bugs, or do they just look helpful?

#LLM agents #agent-written tests #software engineering agents

Geometry-Aware Rotary Position Embedding for Consistent Video World Model

Intermediate

Chendong Xiang, Jiajun Liu et al. · Feb 8 · arXiv

The paper fixes a common problem in video world models: scenes gradually change, or “drift”, when the camera moves away and then returns.

#ViewRope #geometry-aware attention #rotary position embedding

Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning

Intermediate

Yalcin Tur, Jalal Naghiyev et al. · Feb 8 · arXiv

Robots often spend the same amount of computation on easy and hard moves, which wastes time on easy steps and is not enough for tricky ones.

#Recurrent depth #Latent iterative reasoning #Vision-Language-Action

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

Intermediate

Tianyi Wu, Mingzhe Du et al. · Feb 7 · arXiv

This paper introduces SecCoderX, a way to train code-writing AI models to produce secure code without breaking its intended functionality.

#secure code generation #reinforcement learning #vulnerability reward model