Papers1055

RAISE: Requirement-Adaptive Evolutionary Refinement for Training-Free Text-to-Image Alignment

Liyao Jiang, Ruichen Chen et al.Feb 28arXiv

Text-to-image models can make pretty pictures but still miss details in complex prompts, like counts, positions, or exact text.

#text-to-image alignment#adaptive inference#evolutionary refinement

DreamWorld: Unified World Modeling in Video Generation

Intermediate

Boming Tan, Xiangdong Zhang et al.Feb 28arXiv

DreamWorld is a new way to make videos that not only look real but also follow common-sense rules about motion, space, and meaning.

#video diffusion transformer#world model#optical flow

Mode Seeking meets Mean Seeking for Fast Long Video Generation

Intermediate

Shengqu Cai, Weili Nie et al.Feb 27arXiv

Short videos are easy for AI to make sharp and lively, but long videos need stories and memory, and there isn’t much training data for that.

#long video generation#flow matching#distribution matching

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Intermediate

Weinan Dai, Hanlin Wu et al.Feb 27arXiv

CUDA Agent is a training system that teaches an AI to write super-fast GPU code (CUDA kernels) by practicing, testing, and getting rewards for correct and speedy results.

#CUDA kernel generation#agentic reinforcement learning#PPO actor-critic

Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models

Intermediate

Arnas Uselis, Andrea Dittadi et al.Feb 27arXiv

The paper asks a simple question: what must a vision model’s internal pictures (embeddings) look like if it can recognize new mixes of things it already knows?

#compositional generalization#linear representation hypothesis#orthogonal representations

Enhancing Spatial Understanding in Image Generation via Reward Modeling

Intermediate

Zhenyu Tang, Chaoran Feng et al.Feb 27arXiv

This paper teaches image generators to place objects in the right spots by building a special teacher called a reward model focused on spatial relationships.

#spatial reasoning#reward modeling#preference learning

SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

Intermediate

Yasaman Haghighi, Alexandre AlahiFeb 27arXiv

SenCache speeds up video diffusion models by reusing past answers only when the model is predicted to change very little.

#diffusion models#video generation#caching

InfoNCE Induces Gaussian Distribution

Intermediate

Roy Betser, Eyal Gofer et al.Feb 27arXiv

The paper shows that when we train with the popular InfoNCE contrastive loss, the learned features start to behave like they come from a Gaussian (bell-shaped) distribution.

#InfoNCE#contrastive learning#Gaussian embeddings

Accelerating Masked Image Generation by Learning Latent Controlled Dynamics

Intermediate

Kaiwen Zhu, Quansheng Zeng et al.Feb 27arXiv

Masked Image Generation Models (MIGMs) make pictures by filling in many blank spots step by step, but each step is slow and repeats a lot of work.

#masked image generation#MIGM-Shortcut#latent controlled dynamics

Half-Truths Break Similarity-Based Retrieval

Intermediate

Bora Kargi, Arnas Uselis et al.Feb 27arXiv

Similarity-based image–text models like CLIP can be fooled by “half-truths,” where adding one plausible but wrong detail makes a caption look more similar to an image instead of less similar.

#half-truth vulnerability#similarity-based retrieval#CLIP

Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks

Intermediate

Qihua Dong, Kuo Yang et al.Feb 27arXiv

This paper builds a new test called Ref-Adv to check if AI can truly match tricky sentences to the right thing in a picture.

#Referring Expression Comprehension#Visual Grounding#Multimodal Large Language Models

LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding

Intermediate

Alexander Samarin, Sergei Krutikov et al.Feb 27arXiv

Speculative decoding speeds up big language models by letting a small helper model guess several next words and having the big model check them all at once.

#speculative decoding#acceptance rate#LK losses

4 5 6 7 8