Papers (200)

Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers

This paper introduces Log-linear Sparse Attention (LLSA), a new way for Diffusion Transformers to focus only on the most useful information using a smart, layered search.

#Log-linear Sparse Attention #Hierarchical Top-K #Hierarchical KV Enrichment

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

Beginner

Sara Papi, Javier Garcia Gilabert et al. · Dec 18 · arXiv

This paper builds a big, fair test called Hearing to Translate to check how well different speech translation systems work in the real world.

#speech translation #Speech-LLM #cascaded ASR-MT

LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding

Beginner

Chenkai Xu, Yijie Jin et al. · Dec 18 · arXiv

This paper speeds up diffusion language models (dLLMs) by changing the order in which they fill in missing words.

#Diffusion LLM #Parallel decoding #Token Filling Order

SCOPE: Prompt Evolution for Enhancing Agent Effectiveness

Beginner

Zehua Pei, Hui-Ling Zhen et al. · Dec 17 · arXiv

SCOPE lets AI agents rewrite their own instructions while they are working, so they can fix mistakes and get smarter on the next step, not just the next task.

#prompt evolution #LLM agents #context management

Prompt Repetition Improves Non-Reasoning LLMs

Beginner

Yaniv Leviathan, Matan Kalman et al. · Dec 17 · arXiv

Repeating the entire prompt once (QUERY→QUERY+QUERY) helps many large language models answer better when you are not asking them to show their reasoning.

#prompt repetition #non-reasoning LLMs #causal attention

HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering

Beginner

Dan Ben-Ami, Gabriele Serussi et al. · Dec 16 · arXiv

HERBench is a new test that checks if video AI models can combine several clues spread across time, not just guess from one frame or language priors.

#Video Question Answering #Video-LLM #Multi-Evidence Integration

Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models

Beginner

Shufan Li, Jiuxiang Gu et al. · Dec 16 · arXiv

Sparse-LaViDa makes diffusion-style AI models much faster by skipping unhelpful masked tokens during generation while keeping quality the same.

#Masked Discrete Diffusion #Sparse Parameterization #Register Tokens

Olmo 3

Beginner

Team Olmo et al. · Dec 15 · arXiv

Olmo 3 is a family of fully-open AI language models (7B and 32B) where every step, from raw data to training code and checkpoints, is released.

#fully-open language models #model flow #long-context reasoning

Image Diffusion Preview with Consistency Solver

Beginner

Fu-Yun Wang, Hao Zhou et al. · Dec 15 · arXiv

Diffusion Preview is a two-step “preview-then-refine” workflow that shows you a fast draft image first and only spends full compute after you like the draft.

#diffusion preview #consistency solver #pf-ode

Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views

Beginner

Tingyang Chen, Cong Fu et al. · Dec 15 · arXiv

The paper shows that judging vector search only by distance-based recall and speed can be very misleading for real tasks.

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Beginner

Yuran Wang, Bohan Zeng et al. · Dec 14 · arXiv

Scone is a new AI method that makes images from instructions while correctly picking the right subject even when many look similar.

#subject-driven image generation #multi-subject composition #subject distinction

The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality

Beginner

Aileen Cheng, Alon Jacovi et al. · Dec 11 · arXiv

The FACTS Leaderboard is a four-part test that checks how truthful AI models are across images, memory, web search, and document grounding.

#LLM factuality #benchmarking #multimodal evaluation
