Papers (1055)

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Intermediate

Zixuan Huang, Xin Xia et al. · Feb 9 · arXiv

Big AI reasoning models often keep thinking long after they have already found the right answer, wasting time and tokens.

#SAGE, #efficient reasoning, #chain of thought

G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design

Intermediate

Baoyun Zhao, He Wang et al. · Feb 9 · arXiv

This paper teaches an AI to invent its own 'break-and-fix' strategies (called LNS operators) for tough puzzles like delivery routes and city tours.

#Generative LNS, #Automated Heuristic Design, #Large Language Models

When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning

Intermediate

Shoubin Yu, Yue Zhang et al. · Feb 9 · arXiv

Visual spatial reasoning often fails when a model only looks at one picture and must imagine new viewpoints.

#Adaptive Test-Time Scaling, #World Models, #Visual Spatial Reasoning

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Intermediate

Zehao Chen, Gongxun Li et al. · Feb 9 · arXiv

Big language models can get stuck after fine-tuning because they become too sure of themselves, so normal training stops helping.

#weak-driven learning, #logit mixing, #weak agents

Dreaming in Code for Curriculum Learning in Open-Ended Worlds

Intermediate

Konstantinos Mitsides, Maxence Faldor et al. · Feb 9 · arXiv

Agents in vast, open-ended games often learn a little and then get stuck because the next good practice steps are missing.

#open-ended learning, #unsupervised environment design, #curriculum learning

VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval

Intermediate

Issar Tzachor, Dvir Samuel et al. · Feb 8 · arXiv

VidVec shows that video-capable multimodal language models already carry strong video-to-sentence matching signals in their middle layers.

#video–text retrieval, #multimodal large language models, #intermediate layer embeddings

Free(): Learning to Forget in Malloc-Only Reasoning Models

Intermediate

Yilun Zheng, Dongyang Ma et al. · Feb 8 · arXiv

LLMs can think for many steps, but when they keep every step forever, the extra tokens turn into noise and make answers worse, not better.

#Free()LM, #self-forgetting, #context pruning

MIND: Benchmarking Memory Consistency and Action Control in World Models

Intermediate

Yixuan Ye, Xuanyu Lu et al. · Feb 8 · arXiv

MIND is a new benchmark that fairly tests two core skills of world models: remembering the world over time (memory consistency) and following controls exactly (action control).

#world models, #memory consistency, #action control

LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth

Intermediate

Weihao Zeng, Yuzhen Huang et al. · Feb 8 · arXiv

LOCA-bench is a benchmark that challenges AI agents to keep working correctly as their to-do list and background information grow extremely long.

#LOCA-bench, #long-context agents, #context rot

Bielik Guard: Efficient Polish Language Safety Classifiers for LLM Content Moderation

Intermediate

Krzysztof Wróbel, Jan Maria Kowalski et al. · Feb 8 · arXiv

Bielik Guard is a pair of small but strong Polish-language safety models that check text for five kinds of risky content: hate/aggression, vulgar language, sexual content, crime, and self-harm.

#Polish NLP, #content moderation, #safety classifier

Rethinking the Value of Agent-Generated Tests for LLM-Based Software Engineering Agents

Intermediate

Zhi Chen, Zhensu Sun et al. · Feb 8 · arXiv

This paper asks a simple question: do tests written by AI coding agents actually help them fix real software bugs, or do they just look helpful?

#LLM agents, #agent-written tests, #software engineering agents

Geometry-Aware Rotary Position Embedding for Consistent Video World Model

Intermediate

Chendong Xiang, Jiajun Liu et al. · Feb 8 · arXiv

The paper addresses a common problem in video world models: scenes slowly change, or "drift," when the camera moves away and then comes back.

#ViewRope, #geometry-aware attention, #rotary position embedding
