LaSER teaches a fast search model to “think” quietly inside its hidden space, so it gets the benefits of step-by-step reasoning without writing those steps out as text.
This paper teaches a language-model agent to explore more effectively by combining two ways of learning (on-policy and off-policy) with a simple memory that the agent writes for itself.
The paper teaches small AI models to produce high-quality text embeddings by first learning to imitate a large expert model (distillation) and then practicing four tasks with small task-specific add-on modules (LoRA adapters): retrieval, similarity, clustering, and classification.
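A minimal sketch of how LoRA adapters let one shared model serve several tasks, assuming the standard LoRA form: a frozen base weight W plus a per-task low-rank update B·A, scaled by alpha / r. The class name `LoRALinear`, the task names, and the tiny matrices are hypothetical illustrations, not the paper's code.

```python
# Hypothetical sketch: one frozen base layer shared by all tasks, plus a
# separate low-rank LoRA adapter (A, B) per task. Names are illustrative.

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    def __init__(self, weight, rank, alpha=1.0):
        self.weight = weight        # frozen base weight, shape d_out x d_in
        self.rank = rank
        self.scale = alpha / rank   # standard LoRA scaling factor alpha / r
        self.adapters = {}          # task name -> (A, B)

    def add_adapter(self, task, a, b):
        # a has shape rank x d_in, b has shape d_out x rank
        self.adapters[task] = (a, b)

    def forward(self, x, task=None):
        base = matvec(self.weight, x)
        if task is None:
            return base             # plain base model, no adapter
        a, b = self.adapters[task]
        delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
        return [y + self.scale * d for y, d in zip(base, delta)]


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], rank=1)
# In standard LoRA, B starts at zero, so a fresh adapter leaves outputs unchanged.
layer.add_adapter("retrieval", a=[[0.5, 0.5]], b=[[0.0], [0.0]])
layer.add_adapter("classification", a=[[1.0, 0.0]], b=[[1.0], [0.0]])
print(layer.forward([2.0, 3.0], task="retrieval"))       # [2.0, 3.0]
print(layer.forward([2.0, 3.0], task="classification"))  # [4.0, 3.0]
```

Only the small A and B matrices differ per task, which is why switching tasks is cheap: the base weights are stored once.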
Big language models can get stuck after fine-tuning because they become overconfident (too sure of their own answers), so further training stops helping.
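One generic way to see why being too sure of themselves can stall training: for softmax plus cross-entropy, the gradient with respect to the logits is softmax(z) minus the one-hot target, so a model that already puts nearly all probability on its answer gets almost no learning signal. This is a textbook illustration under that assumption, not the paper's specific analysis; the logit values are made up.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; low entropy means a very confident model."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def ce_grad(logits, target):
    """Gradient of cross-entropy w.r.t. logits: softmax(z) - one_hot(target)."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def grad_norm(g):
    return math.sqrt(sum(x * x for x in g))


calm = [1.0, 0.5, 0.0]             # moderately confident in class 0
overconfident = [12.0, 0.0, 0.0]   # almost all probability on class 0

for name, z in [("calm", calm), ("overconfident", overconfident)]:
    p = softmax(z)
    print(name, "entropy:", round(entropy(p), 4),
          "grad norm:", round(grad_norm(ce_grad(z, 0)), 6))
```

The overconfident logits give near-zero entropy and a gradient norm several orders of magnitude smaller, which matches the intuition that ordinary training has little left to push on.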
Big idea: use a small, already-trained model to guide a bigger model toward good habits early in training, so the big one trains faster and ends up performing better.
This paper teaches AI models to pay attention better by training where their attention goes, not just which words they output.
The paper studies how to teach a smaller language model using a bigger one by focusing only on the most useful parts rather than on everything.
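A minimal sketch of selective distillation: compute the usual per-token KL(teacher || student) loss, but average it over only a chosen subset of token positions. The selection rule here (keep the positions where the teacher is most uncertain, i.e. has the highest entropy) and the `keep_ratio` parameter are hypothetical assumptions; the paper may pick its "most useful bits" differently.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def selective_kd_loss(teacher_logits, student_logits, keep_ratio=0.5):
    """Average KL(teacher || student) over the highest-entropy teacher positions.

    Hypothetical rule: positions where the teacher is least certain are
    treated as the most informative ones for the student to imitate.
    """
    teacher = [softmax(z) for z in teacher_logits]
    student = [softmax(z) for z in student_logits]
    n_keep = max(1, int(len(teacher) * keep_ratio))
    ranked = sorted(range(len(teacher)),
                    key=lambda i: entropy(teacher[i]), reverse=True)
    kept = ranked[:n_keep]
    loss = sum(kl(teacher[i], student[i]) for i in kept) / n_keep
    return loss, sorted(kept)


# Four token positions, three-word vocabulary (made-up numbers).
teacher_logits = [[5.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.0], [4.0, 0.0, 1.0]]
student_logits = [[0.0, 0.0, 0.0] for _ in range(4)]
loss, kept = selective_kd_loss(teacher_logits, student_logits)
print(kept)  # [1, 2]
print(loss)
```

Positions 0 and 3, where the teacher is almost certain, are skipped; the student is trained only on the two positions where the teacher's distribution carries more information.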
This paper shows how to turn a big Transformer model into a faster hybrid model that mixes attention and RNN layers using far less training data (about 2.3B tokens).
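A minimal sketch of why RNN-style layers can stand in for some attention layers: unnormalized causal linear attention (no softmax) can be computed either in parallel over the whole sequence or as a recurrence with a fixed-size state, and both give the same outputs. This assumes plain linear attention as the RNN layer; the paper's actual RNN variant, and which layers it keeps as full attention, are not specified here, so `hybrid_layer_plan` is a purely hypothetical layout.

```python
def causal_linear_attention(qs, ks, vs):
    """Parallel form: y_t = sum_{s<=t} (q_t . k_s) v_s, with no softmax."""
    out = []
    for t, q in enumerate(qs):
        y = [0.0] * len(vs[0])
        for s in range(t + 1):
            w = sum(a * b for a, b in zip(q, ks[s]))
            y = [yi + w * vi for yi, vi in zip(y, vs[s])]
        out.append(y)
    return out


def linear_attention_rnn(qs, ks, vs):
    """Recurrent form: S_t = S_{t-1} + k_t v_t^T, y_t = q_t^T S_t."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fixed size, unlike a growing KV cache
    out = []
    for q, k, v in zip(qs, ks, vs):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        out.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


def hybrid_layer_plan(n_layers, attention_every):
    """Hypothetical layout: keep full attention in every `attention_every`-th layer."""
    return ["attention" if (i + 1) % attention_every == 0 else "rnn"
            for i in range(n_layers)]


qs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
ks = [[1.0, 2.0], [0.0, 1.0], [1.0, 1.0]]
vs = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [3.0, 1.0, 0.0]]
print(causal_linear_attention(qs, ks, vs))
print(linear_attention_rnn(qs, ks, vs))
print(hybrid_layer_plan(8, 4))
```

The recurrent form processes each new token in constant time and memory, which is where a hybrid model's speedup comes from; the few remaining full-attention layers keep precise long-range lookups.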
This paper tackles dataset distillation with a clear, mathematically grounded way to keep only the most useful information in a dataset, so models can learn well from far fewer images.
The paper asks a simple question: Which step-by-step explanations from a teacher model actually help a student model learn to reason better?
This paper builds an AI agent, ML-Master 2.0, that can work on machine learning projects for a very long time without forgetting what matters.
LaViT is a new way to teach smaller vision-language models to look at the right parts of an image before they speak.