Papers (943)

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

ReFusion is a new way for AI to write text faster by planning in chunks (called slots) and then filling each chunk carefully.

#ReFusion #masked diffusion model #parallel decoding

Memory in the Age of AI Agents

Intermediate

Yuyang Hu, Shichun Liu et al. · Dec 15 · arXiv

This survey explains how AI agents remember things and organizes the whole topic into three clear parts: forms, functions, and dynamics.

#Agent memory #LLM memory #Retrieval-augmented generation

Janus: Disaggregating Attention and Experts for Scalable MoE Inference

Intermediate

Zhexiang Zhang, Ye Wang et al. · Dec 15 · arXiv

Janus splits a Mixture-of-Experts (MoE) model into two parts, attention and experts, so each can use just the right number of GPUs.

#Mixture-of-Experts inference #disaggregated serving #activation load balancing

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Intermediate

Team Seedance, Heyi Chen et al. · Dec 15 · arXiv

Seedance 1.5 pro is a single model that makes video and sound together at the same time, so lips, music, and actions match naturally.

#audio-visual generation #diffusion transformer #cross-modal synchronization

Scaling Laws for Code: Every Programming Language Matters

Intermediate

Jian Yang, Shawn Guo et al. · Dec 15 · arXiv

Different programming languages scale differently when training code AI models, so treating them all the same wastes compute and lowers performance.

#multilingual code pre-training #scaling laws #language-specific scaling

RecTok: Reconstruction Distillation along Rectified Flow

Intermediate

Qingyu Shi, Size Wu et al. · Dec 15 · arXiv

RecTok is a new visual tokenizer that teaches the whole training path of a diffusion model (the forward flow) to be smart about image meaning, not just the starting latent features.

#Rectified Flow #Flow Matching #Visual Tokenizer

Differentiable Evolutionary Reinforcement Learning

Intermediate

Sitao Cheng, Tianle Li et al. · Dec 15 · arXiv

This paper introduces DERL, a two-level learning system that automatically builds better reward functions for reinforcement learning agents.

#Differentiable Evolutionary Reinforcement Learning #Meta-Optimizer #Meta-Reward

FIN-bench-v2: A Unified and Robust Benchmark Suite for Evaluating Finnish Large Language Models

Intermediate

Joona Kytöniemi, Jousia Piha et al. · Dec 15 · arXiv

FIN-bench-v2 is a big, tidy set of Finnish tests that checks how good large language models are at many things like reading, logic, and world knowledge.

#Finnish language models #benchmark suite #HuggingFace Datasets

KlingAvatar 2.0 Technical Report

Intermediate

Kling Team, Jialu Chen et al. · Dec 15 · arXiv

KlingAvatar 2.0 is a system that makes long, sharp, lifelike talking-person videos that follow audio, images, and text instructions all at once.

#audio-driven avatar #video diffusion #diffusion transformer

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Intermediate

Zhihang Liu, Xiaoyi Bao et al. · Dec 15 · arXiv

ShowTable is a new way for AI to turn a data table into a beautiful, accurate infographic using a think–make–check–fix loop.

#creative table visualization #multimodal large language model #diffusion model

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

Intermediate

Jiaqi Wang, Weijia Wu et al. · Dec 15 · arXiv

This paper builds a new test called Video Reality Test to see if AI-made ASMR videos can fool both people and vision-language models (VLMs).

#ASMR #audio-visual coupling #AI-generated video detection

Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection

Intermediate

Juil Koo, Daehyeon Choi et al. · Dec 15 · arXiv

This paper teaches robots to move their camera to a better spot before answering a question about what they see.

#Active Perception #Embodied AI #Vision-Language Models
