This paper shows, step by step, how to train a 1.36-billion-parameter science-focused language model directly from raw arXiv LaTeX files using only two A100 GPUs.
LTX-2 is an open-source model that generates video and sound together from a text prompt, so the picture and audio match in both timing and meaning.