Papers (1262)

FABLE: Forest-Based Adaptive Bi-Path LLM-Enhanced Retrieval for Multi-Document Reasoning

Lin Sun, Linglin Zhang et al. · Jan 26 · arXiv

FABLE is a new retrieval system that helps AI find and combine facts from many documents by letting the AI both organize the library and choose the right shelves to read.

#FABLE#Structured RAG#Hierarchical retrieval

DRPG (Decompose, Retrieve, Plan, Generate): An Agentic Framework for Academic Rebuttal

Intermediate

Peixuan Han, Yingjie Yu et al. · Jan 26 · arXiv

DRPG is a four-step AI helper that writes strong academic rebuttals by first breaking a review into parts, then fetching evidence, planning a strategy, and finally writing the response.

#academic rebuttal#agentic framework#planning with LLMs

Masked Depth Modeling for Spatial Perception

Intermediate

Bin Tan, Changjiang Sun et al. · Jan 25 · arXiv

The paper turns the 'holes' (missing spots) in depth camera images into helpful training hints instead of treating them as garbage.

#Masked Depth Modeling#RGB-D cameras#Depth completion

EEG Foundation Models: Progresses, Benchmarking, and Open Problems

Intermediate

Dingkun Liu, Yuheng Chen et al. · Jan 25 · arXiv

This paper builds a fair, big playground (a benchmark) to test many EEG foundation models side-by-side on the same rules.

#EEG foundation models#brain-computer interface#self-supervised learning

AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation

Intermediate

Dongjie Cheng, Ruifeng Yuan et al. · Jan 25 · arXiv

AR-Omni is a single autoregressive model that can take in and produce text, images, and speech without extra expert decoders.

#autoregressive modeling#multimodal large language model#any-to-any generation

The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation

Intermediate

Chenyu Mu, Xin He et al. · Jan 25 · arXiv

This paper teaches AI to turn simple dialogue into full movie scenes by first writing a detailed script and then filming it step by step.

#dialogue-to-video#cinematic script generation#ScripterAgent

Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction

Intermediate

Jang-Hyun Kim, Dongyoon Han et al. · Jan 25 · arXiv

Fast KVzip is a new way to shrink an LLM’s memory (the KV cache) while keeping answers just as accurate.

#KV cache compression#gated KV eviction#sink attention

AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking

Intermediate

Xilin Jiang, Qiaolin Wang et al. · Jan 25 · arXiv

AVMeme Exam is a new test made by humans that checks if AI can understand famous internet audio and video clips the way people do.

#AVMeme Exam#multimodal large language models#audio-visual memes

End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions

Intermediate

Anfeng Xu, Tiantian Feng et al. · Jan 25 · arXiv

This paper builds one smart system that listens to child–adult conversations and writes what was said, who said it, and exactly when each person spoke.

#end-to-end ASR#speaker diarization#child speech

Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers

Beginner

Zecheng Tang, Quantong Qiu et al. · Jan 24 · arXiv

Transformers slow down on very long inputs because standard attention looks at every token pair, which is expensive.

#elastic attention#sparse attention#full attention

SkyReels-V3 Technique Report

Intermediate

Debang Li, Zhengcong Fei et al. · Jan 24 · arXiv

SkyReels-V3 is a single AI model that can make videos in three ways: from reference images, by extending an existing video, and by creating talking avatars from audio.

#video generation#diffusion transformer#multimodal in-context learning

PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues

Intermediate

Mohammad Rifqi Farhansyah, Hanif Muhammad Zhafran et al. · Jan 24 · arXiv

Most people on Earth speak more than one language and often switch languages in the same chat, but AI tools aren’t tested well on this real behavior.

#code-switching#multilingual NLP#trilingual dialogue
