Papers (1262)

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Ethan Chern, Zhulin Hu et al. · Dec 29 · arXiv

LiveTalk turns slow, many-step video diffusion into a fast, 4-step, real-time system for talking avatars that listen, think, and respond with synchronized video.

#real-time video diffusion #on-policy distillation #multimodal conditioning

ProGuard: Towards Proactive Multimodal Safeguard

Intermediate

Shaohan Yu, Lijun Li et al. · Dec 29 · arXiv

ProGuard is a safety guard for text and images that doesn’t just spot known problems—it can also recognize and name new, never-seen-before risks.

#proactive safety #multimodal moderation #out-of-distribution detection

Act2Goal: From World Model To General Goal-conditioned Policy

Intermediate

Pengfei Zhou, Liliang Chen et al. · Dec 29 · arXiv

Robots often get confused on long, multi-step tasks when they only see the final goal image and try to guess the next move directly.

#goal-conditioned policy #visual world model #multi-scale temporal hashing

Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Intermediate

Ang Lv, Jin Ma et al. · Dec 29 · arXiv

Mixture-of-Experts (MoE) models use many small specialist networks (experts) and a router to pick which experts handle each token, but the router isn’t explicitly taught what each expert is good at.

#Mixture-of-Experts #expert-router coupling #auxiliary loss

MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning

Intermediate

Jiawei Chen, Xintian Shen et al. · Dec 29 · arXiv

MindWatcher is a smart AI agent that can think step by step and decide when to use tools like web search, image zooming, and a code calculator to solve tough, multi-step problems.

#Tool-Integrated Reasoning #Interleaved Thinking #Multimodal Chain-of-Thought

A unified framework for detecting point and collective anomalies in operating system logs via collaborative transformers

Intermediate

Mohammad Nasirzadeh, Jafar Tahmoresnezhad et al. · Dec 29 · arXiv

CoLog is a new AI system that reads computer logs like a story and spots both single strange events (point anomalies) and strange patterns over time (collective anomalies).

#log anomaly detection #multimodal learning #collaborative transformer

AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents

Intermediate

Jiafeng Liang, Hao Li et al. · Dec 29 · arXiv

This survey links how human brains remember things to how AI agents should remember things so they can act smarter over time.

#agent memory #episodic memory #semantic memory

YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection

Intermediate

Xu Lin, Jinlong Peng et al. · Dec 29 · arXiv

YOLO-Master is a new real-time object detector that uses a Mixture-of-Experts (MoE) design to spend more compute on hard scenes and less on easy ones.

#YOLO-Master #Mixture of Experts #ES-MoE

KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta

Intermediate

Gang Liao, Hongsen Qin et al. · Dec 29 · arXiv

KernelEvolve is a smart, self-improving system that writes and tunes tiny but crucial programs (kernels) so AI runs fast on many kinds of chips.

#KernelEvolve #agentic kernel coding #graph-based search

Bridging Your Imagination with Audio-Video Generation via a Unified Director

Intermediate

Jiaxu Zhang, Tianshu Hu et al. · Dec 29 · arXiv

UniMAGE is a single “director” AI that writes a film-like script and draws the key pictures for each shot, so stories stay clear and characters look the same from scene to scene.

#Unified Director Model #Mixture-of-Transformers #Interleaved Concept Learning

Evaluating Parameter Efficient Methods for RLVR

Intermediate

Qingyu Yin, Yulun Wu et al. · Dec 29 · arXiv

The paper asks which small, add-on training tricks (PEFT) work best when we teach language models with yes/no rewards we can check (RLVR).

#RLVR #parameter-efficient fine-tuning #LoRA

SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling

Intermediate

Yufan He, Pengfei Guo et al. · Dec 29 · arXiv

SurgWorld teaches surgical robots using videos plus text, then guesses the missing robot moves so we can train good policies without collecting tons of real robot-action data.

#surgical world model #SATA dataset #inverse dynamics model
