Youtu-LLM is a small language model (1.96B parameters) trained from scratch to reason, plan, and act as an agent, rather than simply learning to imitate larger models.
Large vision-language models are highly capable, but they are too big to run on phones and other resource-constrained devices.