AnyDepth is a new, simple way for a computer to estimate how far away things are in a picture using just one image (monocular depth estimation).
InfiniteVL is a vision-language model that combines two ideas: local focus through Sliding Window Attention, and long-term memory through a linear-complexity module called Gated DeltaNet.
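
To make the "local focus" idea concrete, here is a minimal sketch of sliding window attention in plain Python. It assumes a causal window, where each token looks only at itself and the few tokens just before it. The function names, the `window` parameter, and the causal choice are illustrative assumptions, not InfiniteVL's actual code, and the Gated DeltaNet memory is not shown.

```python
import math


def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query i may attend to key j.

    Assumption: causal window, so token i sees tokens i-window+1 .. i.
    """
    return [[0 <= i - j < window for j in range(seq_len)]
            for i in range(seq_len)]


def sliding_window_attention(q, k, v, window):
    """Scaled dot-product attention restricted to the causal window.

    q, k, v are lists of equal-length float vectors (one per token).
    """
    dim = len(q[0])
    out = []
    for i in range(len(q)):
        # Only the last `window` positions up to and including i are visible.
        visible = range(max(0, i - window + 1), i + 1)
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(dim)
                  for j in visible]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j][c] for w, j in zip(weights, visible)) / total
                    for c in range(len(v[0]))])
    return out
```

Because each token scores at most `window` keys, the work grows in proportion to sequence length rather than to its square, which is why a separate module is needed to carry information from further back.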
Fairy2i turns any pre-trained real-valued Transformer layer into an exactly equivalent complex-valued form, so the model's outputs are unchanged before quantization is applied.
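
One standard way to rewrite a real linear map in complex form without changing its output is the widely-linear decomposition y = U z + V conj(z). The sketch below illustrates that idea in plain Python. It assumes the real input vector is split into two halves that become the real and imaginary parts of z; whether this matches Fairy2i's exact construction is an assumption, and all function names are hypothetical.

```python
def real_to_widely_linear(W):
    """Split a real (2m x 2n) matrix into complex (m x n) matrices U and V
    such that U z + V conj(z) reproduces W applied to [Re z; Im z]."""
    m, n = len(W) // 2, len(W[0]) // 2
    U = [[0j] * n for _ in range(m)]
    V = [[0j] * n for _ in range(m)]
    for r in range(m):
        for c in range(n):
            # The four real blocks of W: [[W11, W12], [W21, W22]].
            w11, w12 = W[r][c], W[r][c + n]
            w21, w22 = W[r + m][c], W[r + m][c + n]
            U[r][c] = complex(w11 + w22, w21 - w12) / 2
            V[r][c] = complex(w11 - w22, w21 + w12) / 2
    return U, V


def apply_real(W, x):
    """Ordinary real matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def apply_widely_linear(U, V, z):
    """Complex form: each output is sum_j (U_j * z_j + V_j * conj(z_j))."""
    return [sum(u * zj + v * zj.conjugate() for u, v, zj in zip(u_row, v_row, z))
            for u_row, v_row in zip(U, V)]


# Small check on arbitrary example values (m = 1, n = 2).
W = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
x = [1.0, -2.0, 0.5, 3.0]                    # [Re z; Im z]
z = [complex(1.0, 0.5), complex(-2.0, 3.0)]

U, V = real_to_widely_linear(W)
y_real = apply_real(W, x)                    # [Re y; Im y]
y_complex = apply_widely_linear(U, V, z)     # y as one complex number
```

Here `y_complex[0]` has real part `y_real[0]` and imaginary part `y_real[1]`, so the complex form computes the same result as the original real layer.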