Papers (2)

#Cross-Attention

Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

The paper shows that drawing on features from many layers of a language model, rather than a single layer, helps text-to-image diffusion transformers follow prompts more faithfully.

#Diffusion Transformer #Text Conditioning #Multi-layer LLM Features

ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models

Intermediate

Linqing Zhong, Yi Liu et al. · Jan 16 · arXiv

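Robot models usually reason in words and images, but a robot's hands need precise motions, which leaves a gap between understanding a task and carrying it out. ACoT-VLA targets this gap by reasoning explicitly about actions through an action chain-of-thought.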

#Vision-Language-Action #Action Chain-of-Thought #Explicit Action Reasoner