Papers (21)

#LoRA

HyperAlign: Hypernetwork for Efficient Test-Time Alignment of Diffusion Models

Diffusion models generate images from noise but often miss what the prompt actually asks for or what humans find visually appealing. HyperAlign uses a hypernetwork to align them efficiently at test time.

#diffusion models #rectified flow #hypernetwork


Evaluating Parameter Efficient Methods for RLVR

Intermediate

Qingyu Yin, Yulun Wu et al. · Dec 29 · arXiv

The paper asks which parameter-efficient fine-tuning (PEFT) methods work best when language models are trained with reinforcement learning with verifiable rewards (RLVR), i.e., yes/no rewards that a program can check automatically.

#RLVR #parameter-efficient fine-tuning #LoRA

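In RLVR the reward comes from a programmatic check rather than a learned reward model. A minimal sketch of such a check for math-style answers, assuming the reward is 1 when the last number in the model's output equals the reference answer; the function names and the extraction rule are hypothetical and not taken from the paper.

```python
import re
from fractions import Fraction


def extract_final_answer(completion: str):
    """Return the last number (integer, decimal, or a/b fraction) in the text, or None.

    Hypothetical extraction rule chosen for illustration only.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?(?:/\d+)?", completion)
    return matches[-1] if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer equals the reference numerically, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    try:
        # Fraction compares exactly, so "0.50" and "1/2" count as the same answer.
        return 1.0 if Fraction(answer) == Fraction(reference) else 0.0
    except (ValueError, ZeroDivisionError):
        # Malformed numbers such as "1.5/2" or "3/0" earn no reward.
        return 0.0


print(verifiable_reward("So the answer is 42.", "42"))
print(verifiable_reward("x = 0.50", "1/2"))
```

Because the reward is a deterministic function of the output, it cannot be "gamed" the way a learned reward model can, though a real setup would need a stricter answer format than "last number in the text".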

DreamOmni3: Scribble-based Editing and Generation

Intermediate

Bin Xia, Bohao Peng et al. · Dec 27 · arXiv

DreamOmni3 lets people edit and create images by combining text, example images, and quick hand-drawn scribbles.

#scribble-based editing #scribble-based generation #joint input scheme


IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning

Intermediate

Yuanhang Li, Yiren Song et al. · Dec 17 · arXiv

IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.

#video editing #visual effects #diffusion transformer


Improving Recursive Transformers with Mixture of LoRAs

Intermediate

Mohammadmahdi Nouriborji, Morteza Rohanian et al. · Dec 14 · arXiv

Recursive transformers save memory by reusing the same layer over and over, but that makes them less expressive and hurts accuracy. The paper improves them by adding a mixture of LoRAs.

#Mixture of LoRAs #recursive transformers #parameter sharing

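A rough sketch of how layer reuse and LoRA experts can fit together, assuming a single d×d shared matrix with a tanh nonlinearity stands in for a full transformer block, and that a softmax router mixes rank-r LoRA updates at every recursion step. All names, shapes, and the routing rule are hypothetical illustrations, not the paper's architecture.

```python
import math


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def softmax(z):
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def recursive_mol_forward(x, W_shared, experts, router, steps):
    """Apply one shared layer `steps` times; each pass adds a router-weighted mix of LoRA updates.

    W_shared: d x d matrix reused at every step (the parameter sharing).
    experts:  list of (A, B) pairs; A is r x d and B is d x r, so B @ A is a rank-r update.
    router:   n_experts x d matrix producing gate logits from the current hidden state.
    """
    for _ in range(steps):
        base = matvec(W_shared, x)
        gates = softmax(matvec(router, x))
        delta = [0.0] * len(base)
        for gate, (A, B) in zip(gates, experts):
            update = matvec(B, matvec(A, x))  # B @ (A @ x), never forming the d x d product
            delta = [d + gate * u for d, u in zip(delta, update)]
        x = [math.tanh(b + d) for b, d in zip(base, delta)]
    return x


def param_count(d, r, n_experts, steps, shared=True):
    """Rough weight count: one shared layer plus router and LoRA experts vs. `steps` untied layers."""
    if not shared:
        return steps * d * d
    return d * d + n_experts * (2 * r * d) + n_experts * d


# With d=512, r=8, 4 experts and 6 steps, sharing needs far fewer weights than 6 separate layers.
print(param_count(512, 8, 4, 6), param_count(512, 8, 4, 6, shared=False))
```

The LoRA experts are the only per-step flexibility here: if every B matrix is zero, the model collapses back to a plain recursive layer, which is one way to see how the experts add expressiveness for a small parameter cost.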

Exploring MLLM-Diffusion Information Transfer with MetaCanvas

Intermediate

Han Lin, Xichen Pan et al. · Dec 12 · arXiv

MetaCanvas lets a multimodal large language model (MLLM) sketch a plan inside the generator’s latent canvas so that a diffusion model can follow it patch by patch.

#MetaCanvas #MLLM #Diffusion Transformer


Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization

Intermediate

Tsai-Shien Chen, Aliaksandr Siarohin et al. · Dec 11 · arXiv

Omni-Attribute is a new image encoder that learns just the parts of a picture you ask for (like hairstyle or lighting) and ignores the rest.

#open-vocabulary attribute encoder #attribute disentanglement #visual concept personalization


InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

Intermediate

Hongyuan Tao, Bencheng Liao et al. · Dec 9 · arXiv

InfiniteVL is a vision-language model that combines two ideas: local focus through sliding window attention, and long-term memory through Gated DeltaNet, a linear-attention module.

#InfiniteVL #linear attention #Gated DeltaNet

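The two ingredients named in the InfiniteVL summary can each be sketched in a few lines. A minimal illustration, assuming a causal sliding-window mask and the standard gated delta-rule state update used to describe Gated DeltaNet, S_t = α_t S_{t-1}(I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ. The function names are hypothetical, vectors are plain Python lists, and how InfiniteVL interleaves or combines the two layer types is not modeled.

```python
def sliding_window_mask(n, window):
    """mask[i][j] is True when token i may attend to token j:
    causal (j <= i) and at most `window` tokens back, counting token i itself."""
    return [[0 <= i - j < window for j in range(n)] for i in range(n)]


def gated_delta_step(S, q, k, v, alpha, beta):
    """One gated delta-rule update of the fixed-size memory S (d_v x d_k), then a read with q.

    S_new = alpha * (S - beta * (S k) k^T) + beta * v k^T,  output = S_new q
    alpha in (0, 1] decays all old memory; beta in [0, 1] sets how strongly
    v overwrites whatever is currently stored under key k.
    """
    Sk = [sum(s * kj for s, kj in zip(row, k)) for row in S]  # value currently stored under k
    S_new = [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j] for j in range(len(k))]
        for i in range(len(v))
    ]
    out = [sum(s * qj for s, qj in zip(row, q)) for row in S_new]
    return S_new, out


# Write a value under a unit key, overwrite it, then read it back with the same key.
S = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]
S, _ = gated_delta_step(S, key, key, [3.0, -1.0], alpha=1.0, beta=1.0)
S, out = gated_delta_step(S, key, key, [5.0, 2.0], alpha=1.0, beta=1.0)
print(out)  # the newer value replaces the older one
```

Because S keeps a fixed d_v × d_k size no matter how many tokens have been processed, the linear branch's memory cost stays constant, while the sliding window keeps exact attention only over the most recent tokens.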

Position: Universal Aesthetic Alignment Narrows Artistic Expression

Intermediate

Wenqi Marshall Guo, Qingyun Qian et al. · Dec 9 · arXiv

The paper shows that many AI image generators are trained to prefer one popular idea of beauty, even when a user clearly asks for something messy, dark, blurry, or emotionally heavy.

#universal aesthetic alignment #aesthetic pluralism #reward models

