Papers (200)

Sharp Monocular View Synthesis in Less Than a Second

Lars Mescheder, Wei Dong et al. · Dec 11 · arXiv

SHARP turns a single photo into a 3D scene you can look around in, and it does this in under one second on a single GPU.

#monocular view synthesis #3D Gaussians #real-time neural rendering


Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases

Beginner

Sherman Wong, Zhenting Qi et al. · Dec 11 · arXiv

This paper introduces the Confucius Code Agent (CCA), a coding agent built to handle large real-world codebases, long-running tasks, and many tools.

#coding agents #agent scaffolding #context management


MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

Beginner

Yixin Wan, Lei Ke et al. · Dec 11 · arXiv

This paper creates MotionEdit, a high-quality dataset that teaches AI to change how people and objects move in a picture without distorting their appearance or the scene.

#motion-centric image editing #optical flow #MotionEdit dataset


VABench: A Comprehensive Benchmark for Audio-Video Generation

Beginner

Daili Hua, Xizhi Wang et al. · Dec 10 · arXiv

VABench is a new, comprehensive benchmark that evaluates how well AI generates videos whose sound and pictures match.

#audio-video benchmark #synchronization #lip-sync


Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform

Beginner

Yuning Gong, Yifei Liu et al. · Dec 9 · arXiv

Visionary is a web-based platform that lets you view and interact with advanced 3D scenes, right in your browser, with just a click.

#WebGPU #3D Gaussian Splatting #ONNX Runtime Web


Towards a Science of Scaling Agent Systems

Beginner

Yubin Kim, Ken Gu et al. · Dec 9 · arXiv

Multi-agent AI teams are not automatically better; their success depends on matching the team’s coordination style to the job’s structure.

#multi-agent systems #agentic evaluation #scaling laws


UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

Beginner

Jiehui Huang, Yuechen Zhang et al. · Dec 8 · arXiv

UnityVideo is a single, unified model that learns from many kinds of video information at once, such as color (RGB), depth, motion (optical flow), body pose, skeletons, and segmentation, to generate more realistic, world-aware videos.

#multimodal video generation #multi-task learning #dynamic noise scheduling


OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Beginner

Zhaochong An, Menglin Jia et al. · Dec 8 · arXiv

OneStory is a new way to generate long, multi-shot videos that keep the story, characters, and places consistent across shots.

#multi-shot video generation #adaptive memory #frame selection


On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Beginner

Charlie Zhang, Graham Neubig et al. · Dec 8 · arXiv

The paper asks when reinforcement learning (RL) really makes language models better at reasoning beyond what they learned in pre-training.

#edge of competence #process-verified evaluation #process-level rewards


Distribution Matching Variational AutoEncoder

Beginner

Sen Ye, Jianning Pei et al. · Dec 8 · arXiv

This paper shows a new way to teach an autoencoder to shape its hidden space (the 'latent space') to look like any distribution we want, not just a simple bell curve.

#Distribution Matching VAE #Latent Space #Self-Supervised Learning


DeepCode: Open Agentic Coding

Beginner

Zongwei Li, Zhonghang Li et al. · Dec 8 · arXiv

DeepCode is an AI coding system that turns long, complicated papers into full, working code repositories.

#agentic coding #document-to-code #information-flow management


Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Beginner

Tong Wu, Yang Liu et al. · Dec 8 · arXiv

This paper teaches a language model to think along several paths at the same time instead of one step after another.

#parallel reasoning #reinforcement learning for LLMs #self-distillation

