Papers (1262)

Controlled Self-Evolution for Algorithmic Code Optimization

The paper introduces Controlled Self-Evolution (CSE), a method that lets AI write and improve code efficiently under a tight budget of attempts.

#Controlled Self-Evolution #Code optimization #Self-evolving agents

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding

Intermediate

Jiapeng Shi, Junke Wang et al. · Jan 12 · arXiv

VideoLoom is a single AI model that can tell both when something happens in a video and where it happens, at the pixel level.

#Video Large Language Model #Temporal Grounding #Referring Video Object Segmentation

Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models

Intermediate

Yuanyang Yin, Yufan Deng et al. · Jan 12 · arXiv

Image-to-Video models often keep the picture looking right but ignore parts of the text instructions.

#Image-to-Video generation #Diffusion Transformer #Controllability

The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents

Beginner

Weihao Xuan, Qingcheng Zeng et al. · Jan 12 · arXiv

This paper studies how AI agents that use tools talk about how sure they are and finds a split: some tools make them too sure, others help them be honest.

#LLM agents #calibration #overconfidence

MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences

Intermediate

Zizhen Li, Chuanhao Li et al. · Jan 12 · arXiv

MeepleLM is an AI model that reads a board game’s rulebook and role-plays different kinds of players to give helpful, honest feedback.

#virtual playtesting #persona-aligned critique #MDA reasoning

Lost in the Noise: How Reasoning Models Fail with Contextual Distractors

Intermediate

Seongyun Lee, Yongrae Jo et al. · Jan 12 · arXiv

The paper shows that when we give AI lots of extra text, even harmless extra text, it can get badly confused—sometimes losing up to 80% of its accuracy.

#NoisyBench #Rationale-Aware Reward #RARE

Dr. Zero: Self-Evolving Search Agents without Training Data

Intermediate

Zhenrui Yue, Kartikeya Upasani et al. · Jan 11 · arXiv

Dr. Zero is a pair of AI agents (a Proposer and a Solver) that teach each other to do web-search-based reasoning without any human-written training data.

#Dr. Zero #self-evolution #proposer-solver

Solar Open Technical Report

Intermediate

Sungrae Park, Sanghoon Kim et al. · Jan 11 · arXiv

Solar Open is a giant bilingual AI (102 billion parameters) that focuses on helping underserved languages like Korean catch up with English-level AI quality.

#Solar Open #Mixture-of-Experts #bilingual LLM

RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction

Beginner

Haonan Bian, Zhiyuan Yao et al. · Jan 11 · arXiv

RealMem is a new benchmark that tests how well AI assistants remember and manage long, ongoing projects across many conversations.

#RealMem #long-term memory #project-oriented interactions

X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests

Intermediate

Jie Wu, Haoling Li et al. · Jan 11 · arXiv

X-Coder shows that models can learn expert-level competitive programming using data that is 100% synthetic—no real contest problems needed.

#competitive programming #synthetic data generation #feature-based synthesis

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Intermediate

Chengwen Liu, Xiaomin Yu et al. · Jan 11 · arXiv

VideoDR is a new benchmark that tests if AI can watch a video, pull out key visual clues, search the open web, and chain the clues together to find one verifiable answer.

#video deep research #multimodal reasoning #open-domain question answering

ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration

Intermediate

Yifei Chen, Guanting Dong et al. · Jan 11 · arXiv

ET-Agent is a training framework that teaches AI agents to use tools (like search and code) more wisely, not just to get the right answer.

#Tool-Integrated Reasoning #Behavior Calibration #Self-evolving Data Flywheel
