Papers (1055)

Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling

Chulun Zhou, Chunkang Zhang et al. · Dec 30 · arXiv

Multi-step RAG systems often struggle with long documents because their memory is just a pile of isolated facts, not a connected understanding.

#multi-step RAG #hypergraph memory #hyperedge merging

Pretraining Frame Preservation in Autoregressive Video Memory Compression

Intermediate

Lvmin Zhang, Shengqu Cai et al. · Dec 29 · arXiv

The paper teaches a video model to squeeze long video history into a tiny memory while still keeping sharp details in single frames.

#autoregressive video generation #video memory compression #frame retrieval pretraining

Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

Intermediate

Hau-Shiang Shiu, Chin-Yang Lin et al. · Dec 29 · arXiv

This paper makes diffusion-based video super-resolution (VSR) practical for live, low-latency use by removing the need for future frames and cutting denoising from ~50 steps down to just 4.

#video super-resolution #diffusion model #latent diffusion

Training AI Co-Scientists Using Rubric Rewards

Intermediate

Shashwat Goel, Rishi Hazra et al. · Dec 29 · arXiv

The paper teaches AI to write strong research plans by letting it grade its own work using checklists (rubrics) pulled from real scientific papers.

#AI co-scientist #research plan generation #rubric rewards

Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

Intermediate

Shaocong Xu, Songlin Wei et al. · Dec 29 · arXiv

Transparent and shiny objects confuse conventional depth cameras, but video diffusion models have already learned how light bends through and reflects off them.

#video diffusion model #transparent object depth #normal estimation

Web World Models

Intermediate

Jichen Feng, Yifan Zhang et al. · Dec 29 · arXiv

This paper introduces Web World Models (WWMs), a way to build huge, explorable worlds by putting strict rules in code and letting AI write the fun details.

#Web World Model #typed interfaces #deterministic hashing

End-to-End Test-Time Training for Long Context

Intermediate

Arnuv Tandon, Karan Dalal et al. · Dec 29 · arXiv

This paper shows how a language model can keep learning while you use it, so it handles very long inputs without slowing down.

#Test-Time Training #Meta-learning #Long-context language modeling

Active Perception Agent for Omnimodal Audio-Video Understanding

Intermediate

Keda Tao, Wenjie Du et al. · Dec 29 · arXiv

This paper introduces OmniAgent, a smart video-and-audio detective that actively decides when to listen and when to look.

#active perception #omnimodal understanding #audio-guided event localization

ProGuard: Towards Proactive Multimodal Safeguard

Intermediate

Shaohan Yu, Lijun Li et al. · Dec 29 · arXiv

ProGuard is a safety guard for text and images that doesn’t just spot known problems; it can also recognize and name new, never-seen-before risks.

#proactive safety #multimodal moderation #out-of-distribution detection

Act2Goal: From World Model To General Goal-conditioned Policy

Intermediate

Pengfei Zhou, Liliang Chen et al. · Dec 29 · arXiv

Robots often get confused on long, multi-step tasks when they only see the final goal image and try to guess the next move directly.

#goal-conditioned policy #visual world model #multi-scale temporal hashing

Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Intermediate

Ang Lv, Jin Ma et al. · Dec 29 · arXiv

Mixture-of-Experts (MoE) models use many small specialist networks (experts) and a router to pick which experts handle each token, but the router isn’t explicitly taught what each expert is good at.

#Mixture-of-Experts #expert-router coupling #auxiliary loss

MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning

Intermediate

Jiawei Chen, Xintian Shen et al. · Dec 29 · arXiv

MindWatcher is a smart AI agent that can think step by step and decide when to use tools like web search, image zooming, and a code calculator to solve tough, multi-step problems.

#Tool-Integrated Reasoning #Interleaved Thinking #Multimodal Chain-of-Thought
