Papers (1262)

RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Cheng Chi et al. · Dec 15 · arXiv

RoboTracer is a vision-language model that turns tricky, word-only instructions into safe, step-by-step 3D paths (spatial traces) robots can follow.

#RoboTracer #spatial trace #3D spatial referring

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Intermediate

Boxin Wang, Chankyu Lee et al. · Dec 15 · arXiv

The paper introduces Nemotron-Cascade, a cascaded reinforcement learning recipe that trains a model on one domain at a time, in sequence: alignment, instruction following, math, coding, and software engineering.

#Cascaded Reinforcement Learning #RLHF #Instruction-Following RL

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

Intermediate

Jianxiong Gao, Zhaoxi Chen et al. · Dec 15 · arXiv

LongVie 2 is a video world model that can generate controllable videos 3–5 minutes long while keeping appearance and motion consistent over time.

#long video generation #world model #multimodal control

Image Diffusion Preview with Consistency Solver

Beginner

Fu-Yun Wang, Hao Zhou et al. · Dec 15 · arXiv

Diffusion Preview is a two-step “preview-then-refine” workflow that shows you a fast draft image first and only spends full compute after you like the draft.

#diffusion preview #consistency solver #pf-ode

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Intermediate

Jia-Nan Li, Jian Guan et al. · Dec 15 · arXiv

ReFusion is a method for faster text generation: the model plans in chunks (called slots) and then fills in each chunk carefully.

#ReFusion #masked diffusion model #parallel decoding

Memory in the Age of AI Agents

Intermediate

Yuyang Hu, Shichun Liu et al. · Dec 15 · arXiv

This survey explains how AI agents store and use memory, and organizes the topic into three parts: forms, functions, and dynamics.

#Agent memory #LLM memory #Retrieval-augmented generation

Janus: Disaggregating Attention and Experts for Scalable MoE Inference

Intermediate

Zhexiang Zhang, Ye Wang et al. · Dec 15 · arXiv

Janus splits a Mixture-of-Experts (MoE) model into two parts, attention and experts, so that each part can be given just the right number of GPUs.

#Mixture-of-Experts inference #disaggregated serving #activation load balancing

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Intermediate

Team Seedance, Heyi Chen et al. · Dec 15 · arXiv

Seedance 1.5 pro is a single model that generates video and sound jointly, so lip movements, music, and actions stay naturally in sync.

#audio-visual generation #diffusion transformer #cross-modal synchronization

Scaling Laws for Code: Every Programming Language Matters

Intermediate

Jian Yang, Shawn Guo et al. · Dec 15 · arXiv

Different programming languages scale differently when training code models, so treating them all the same wastes compute and lowers performance.

#multilingual code pre-training #scaling laws #language-specific scaling

RecTok: Reconstruction Distillation along Rectified Flow

Intermediate

Qingyu Shi, Size Wu et al. · Dec 15 · arXiv

RecTok is a new visual tokenizer that teaches the whole training path of a diffusion model (the forward flow) to be smart about image meaning, not just the starting latent features.

#Rectified Flow #Flow Matching #Visual Tokenizer

Differentiable Evolutionary Reinforcement Learning

Intermediate

Sitao Cheng, Tianle Li et al. · Dec 15 · arXiv

This paper introduces DERL, a two-level learning system that automatically builds better reward functions for reinforcement learning agents.

#Differentiable Evolutionary Reinforcement Learning #Meta-Optimizer #Meta-Reward

FIN-bench-v2: A Unified and Robust Benchmark Suite for Evaluating Finnish Large Language Models

Intermediate

Joona Kytöniemi, Jousia Piha et al. · Dec 15 · arXiv

FIN-bench-v2 is a large, well-organized suite of Finnish benchmarks that tests large language models on skills such as reading, logic, and world knowledge.

#Finnish language models #benchmark suite #HuggingFace Datasets
