SCOPE lets AI agents rewrite their own instructions while they are working, so they can fix mistakes and get smarter on the next step, not just the next task.
Simply repeating the entire prompt once (QUERY→QUERY+QUERY) helps many large language models answer more accurately when they are not asked to show their reasoning.
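The trick can be sketched in a few lines; this is a minimal illustration of the QUERY→QUERY+QUERY idea, and the helper name and separator are assumptions, not the paper's API:

```python
def duplicate_query(query: str) -> str:
    # Repeat the whole question once (QUERY -> QUERY + QUERY).
    # The reported benefit applies when the model is answering
    # directly, without being asked to show its reasoning.
    return f"{query}\n\n{query}"

prompt = duplicate_query("Which planet is closest to the Sun?")
# The model receives the same question twice in a single prompt.
```

The duplicated prompt is then sent to the model as usual; nothing else about the request changes.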
HERBench is a new test that checks whether video AI models can combine several clues spread across time, not just guess from a single frame or from language priors.
Sparse-LaViDa makes diffusion-style AI models much faster by skipping unhelpful masked tokens during generation while keeping quality the same.
Olmo 3 is a family of fully-open AI language models (7B and 32B) where every step—from raw data to training code and checkpoints—is released.
Diffusion Preview is a two-step “preview-then-refine” workflow that shows you a fast draft image first and only spends full compute after you like the draft.
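The two-step workflow above can be sketched as a simple control loop; this is a hypothetical outline under the assumption that the generator takes a step count controlling compute, and the function and parameter names are illustrative, not the paper's:

```python
def preview_then_refine(prompt, generate, approve,
                        preview_steps=8, full_steps=50):
    # Stage 1: cheap low-step draft so the user can react quickly.
    draft = generate(prompt, steps=preview_steps)
    if not approve(draft):
        return draft  # user rejected the draft: skip the expensive run
    # Stage 2: spend full compute only on an approved draft.
    return generate(prompt, steps=full_steps)

# Toy stand-ins so the sketch runs end to end.
toy_generate = lambda prompt, steps: f"image({prompt}, steps={steps})"
result = preview_then_refine("a red fox", toy_generate,
                             approve=lambda draft: True)
```

The point of the design is that rejected drafts cost only the cheap preview pass, so total compute scales with how often the user likes what they see.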
The paper shows that judging vector search only by distance-based recall and speed can be very misleading for real tasks.
Scone is a new AI method that makes images from instructions while correctly picking the right subject even when many look similar.
The FACTS Leaderboard is a four-part test that checks how truthful AI models are across images, memory, web search, and document grounding.
SHARP turns a single photo into a 3D scene you can look around in, and it does this in under one second on a single GPU.
This paper introduces the Confucius Code Agent (CCA), a coding helper built to handle huge real-world codebases with long tasks and many tools.
This paper creates MotionEdit, a high-quality dataset that teaches AI to change how people and objects move in a picture without breaking their looks or the scene.