How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (9)

#DPO

SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference Optimization

Intermediate
Maksim Afanasyev, Illarion Iov · Feb 2 · arXiv

SLIME is a new way to train chatbots so they follow human preferences without forgetting how to write well.

#SLIME #preference optimization #DPO

Latent Adversarial Regularization for Offline Preference Optimization

Intermediate
Enyi Jiang, Yibo Jacky Zhang et al. · Jan 29 · arXiv

This paper introduces GANPO, a new way to train language models from human preferences by guiding the model using its hidden thoughts (latent space) instead of just its visible words (token space).

#GANPO #latent space regularization #offline preference optimization

Qwen3-TTS Technical Report

Intermediate
Hangrui Hu, Xinfa Zhu et al. · Jan 22 · arXiv

Qwen3-TTS is a family of text-to-speech models that can talk in 10+ languages, clone a new voice from just 3 seconds, and follow detailed style instructions in real time.

#Qwen3-TTS #text-to-speech #voice cloning

An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift

Intermediate
Constantinos Karouzos, Xingwei Tan et al. · Jan 9 · arXiv

Preference tuning teaches language models to act the way people like, but those habits can fall apart when the topic or style changes (domain shift).

#preference tuning #domain shift #supervised fine-tuning

Token-Level LLM Collaboration via FusionRoute

Intermediate
Nuoya Xiong, Yuhang Zhou et al. · Jan 8 · arXiv

Big all-in-one language models are powerful but too expensive to run everywhere, while small specialists are cheaper but narrow.

#FusionRoute #token-level collaboration #expert routing

DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs

Intermediate
Shidong Cao, Hongzhan Lin et al. · Jan 7 · arXiv

DiffCoT treats a model’s step-by-step thinking (Chain-of-Thought) like a messy draft that can be cleaned up over time, not something fixed forever.

#Chain-of-Thought #Diffusion models #Autoregressive decoding

The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

Intermediate
Max Ruiz Luyten, Mihaela van der Schaar · Jan 2 · arXiv

Modern AI models can get very good at being correct, but in the process they often lose their ability to think in many different ways.

#Distributional Creative Reasoning #diversity energy #creativity kernel

Factorized Learning for Temporally Grounded Video-Language Models

Intermediate
Wenzheng Zeng, Difei Gao et al. · Dec 30 · arXiv

This paper teaches video-language models to first find when the supporting evidence appears in a video and then answer using that evidence, instead of mixing both steps together.

#temporal grounding #video-language models #evidence tokens

Kling-Omni Technical Report

Intermediate
Kling Team, Jialu Chen et al. · Dec 18 · arXiv

Kling-Omni is a single, unified model that can understand text, images, and videos together and then make or edit high-quality videos from those mixed instructions.

#multimodal visual language #MVL #prompt enhancer