LTX-2: Efficient Joint Audio-Visual Foundation Model
Intermediate | Yoav HaCohen, Benny Brazowski et al. | Jan 6 | arXiv
LTX-2 is an open-source model that generates video and audio jointly from a text prompt, so that picture and sound stay synchronized in time and consistent in meaning.
#text-to-video #text-to-audio #audiovisual generation