Papers (1262)

CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction

Intermediate

Yinghao Ma, Haiwen Xia et al. · Feb 28 · arXiv

Modern music-generation models can follow text, lyrics, and even reference audio, but the reward models that judge their output have not kept up.

#music reward model · #compositional multimodal instruction · #text-to-music evaluation

Spectral Condition for μP under Width-Depth Scaling

Intermediate

Chenyu Zheng, Rongzhen Wang et al. · Feb 28 · arXiv

Big AI models keep getting wider (more neurons per layer) and deeper (more layers), which often destabilizes training and makes hyperparameters tuned at one size hard to reuse at another.

#maximal update parametrization · #μP · #spectral condition

RAISE: Requirement-Adaptive Evolutionary Refinement for Training-Free Text-to-Image Alignment

Intermediate

Liyao Jiang, Ruichen Chen et al. · Feb 28 · arXiv

Text-to-image models can make pretty pictures but still miss details in complex prompts, like counts, positions, or exact text.

#text-to-image alignment · #adaptive inference · #evolutionary refinement

DreamWorld: Unified World Modeling in Video Generation

Intermediate

Boming Tan, Xiangdong Zhang et al. · Feb 28 · arXiv

DreamWorld is a video generation approach whose outputs not only look realistic but also follow common-sense rules about motion, space, and meaning.

#video diffusion transformer · #world model · #optical flow

Mode Seeking meets Mean Seeking for Fast Long Video Generation

Intermediate

Shengqu Cai, Weili Nie et al. · Feb 27 · arXiv

Short videos are easy for AI to make sharp and lively, but long videos need a coherent story and long-range memory, and there is little training data for that.

#long video generation · #flow matching · #distribution matching

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Intermediate

Weinan Dai, Hanlin Wu et al. · Feb 27 · arXiv

CUDA Agent is a training system that teaches an AI to write super-fast GPU code (CUDA kernels) by practicing, testing, and getting rewards for correct and speedy results.

#CUDA kernel generation · #agentic reinforcement learning · #PPO actor-critic

Memory Caching: RNNs with Growing Memory

Beginner

Ali Behrouz, Zeman Li et al. · Feb 27 · arXiv

Recurrent neural networks (RNNs) are fast but forgetful because they squeeze everything they’ve seen into a tiny, fixed memory.

#Memory Caching · #Recurrent Neural Networks · #Attention

Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models

Intermediate

Arnas Uselis, Andrea Dittadi et al. · Feb 27 · arXiv

The paper asks a simple question: what must a vision model's internal representations (embeddings) look like if it can recognize new combinations of things it already knows?

#compositional generalization · #linear representation hypothesis · #orthogonal representations

Enhancing Spatial Understanding in Image Generation via Reward Modeling

Intermediate

Zhenyu Tang, Chaoran Feng et al. · Feb 27 · arXiv

This paper teaches image generators to place objects in the right spots by building a special teacher called a reward model focused on spatial relationships.

#spatial reasoning · #reward modeling · #preference learning

SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

Intermediate

Yasaman Haghighi, Alexandre Alahi · Feb 27 · arXiv

SenCache speeds up video diffusion models by reusing cached outputs from earlier denoising steps, but only when the model's output is predicted to change very little.

#diffusion models · #video generation · #caching

InfoNCE Induces Gaussian Distribution

Intermediate

Roy Betser, Eyal Gofer et al. · Feb 27 · arXiv

The paper shows that when a model is trained with the popular InfoNCE contrastive loss, the learned features tend to behave as if drawn from a Gaussian (bell-shaped) distribution.

#InfoNCE · #contrastive learning · #Gaussian embeddings

Accelerating Masked Image Generation by Learning Latent Controlled Dynamics

Intermediate

Kaiwen Zhu, Quansheng Zeng et al. · Feb 27 · arXiv

Masked Image Generation Models (MIGMs) generate images by filling in masked positions over many steps, but each step is slow and much of its computation is repeated.

#masked image generation · #MIGM-Shortcut · #latent controlled dynamics