dLLM is a single, open-source toolbox that standardizes how diffusion language models are trained, run, and tested.
Hepato-LLaVA is an AI model that reads giant microscope images of the liver and answers medical questions about cancer.
This paper speeds up image and video generators called diffusion transformers by changing how big their puzzle pieces (patches) are at each step.
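A quick sketch of why patch size matters (toy numbers, not the paper's settings): the cost of a diffusion transformer step grows with the number of patches, so coarser patches at some steps mean far fewer tokens to process.

```python
# Hypothetical 256x256 image split into square patches of side p.
# Token count per step is (H/p) * (W/p); bigger patches -> fewer tokens.
H = W = 256
token_counts = {p: (H // p) * (W // p) for p in (8, 16, 32)}

for p, n in token_counts.items():
    print(f"patch size {p:2d} -> {n:4d} tokens")
```

Doubling the patch side cuts the token count by 4x, which is the lever this kind of method pulls at different denoising steps.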
This paper teaches image models to copy a change shown in one image pair and apply it to a new image, like saying 'hat added here, add a similar hat there.'
Decoder-only language models can be great at making user profiles (embeddings), but the attention mask (the rule for which tokens are allowed to look at which others) changes how good those profiles are.
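A minimal sketch of the two masking choices being compared (toy boolean masks, not the paper's code): a causal mask lets each position see only the past, while a bidirectional mask lets every position see the whole sequence.

```python
import numpy as np

T = 4  # toy sequence length

# Causal mask: position i may attend only to positions j <= i.
causal = np.tril(np.ones((T, T), dtype=bool))

# Bidirectional mask: every position sees the full sequence,
# often used when pooling hidden states into one embedding.
bidirectional = np.ones((T, T), dtype=bool)

# Under a causal mask, only the last token has seen everything,
# which is why last-token pooling is a common embedding choice.
print(causal.astype(int))
print(bidirectional.astype(int))
```

Changing this one mask changes what information each token's hidden state can carry, and hence what the pooled embedding encodes.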
The paper tries several different ways to translate five low-resource Turkic languages, instead of forcing one method to fit all.
LatentMem is a new memory system that helps teams of AI agents remember the right things for their specific jobs without overloading them with text.
This paper teaches AI how to fix broken Lean math proofs by learning from the compiler’s feedback, not just from finished, perfect proofs.
SLIME is a new way to train chatbots so they follow human preferences without forgetting how to write well.
The paper shows that three popular ways to control language models—fine-tuning a few weights, LoRA, and activation steering—are actually the same kind of action: a dynamic weight update driven by a control knob.
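The equivalence claimed above can be sketched numerically: a LoRA update applied as a weight change, y = (W + BA)x, gives exactly the same output as the frozen base layer plus an input-dependent additive term, which is the shape of activation steering. (Shapes and names here are illustrative, not the paper's.)

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                       # hypothetical model dim and LoRA rank
W = rng.normal(size=(d, d))       # frozen base weight
A = rng.normal(size=(r, d))       # LoRA down-projection
B = rng.normal(size=(d, r))       # LoRA up-projection
x = rng.normal(size=d)            # activation entering the layer

# View 1: LoRA as a weight update, y = (W + B A) x
y_weight_update = (W + B @ A) @ x

# View 2: the same computation as activation steering,
# y = W x + v, where the steering vector v = B (A x)
# depends on the input through the low-rank "control knob" A x.
steering_vector = B @ (A @ x)
y_steering = W @ x + steering_vector

print(np.allclose(y_weight_update, y_steering))
```

The two views differ only in where the parentheses go, which is the paper's point: these controls are one family of dynamic weight updates.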
Multi-agent LLM systems often use LoRA adapters so each agent has a special role, but they all rebuild almost the same KV cache, wasting memory and time.
This paper shows a simple, one-model way to dub videos that makes the new voice and the lips move together naturally.