How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers: 4

#Direct Preference Optimization (DPO)

Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training

Intermediate
Ran Xu, Tianci Liu et al. · Feb 2 · arXiv

The paper introduces Rubric-ARM, a system that trains two AI helpers, a rubric maker and a judge, together using reinforcement learning so they can better decide which answers people would prefer.

#Rubric-based reward modeling #LLM-as-a-judge #Alternating reinforcement learning

GameTalk: Training LLMs for Strategic Conversation

Intermediate
Victor Conchello Vendrell, Max Ruiz Luyten et al. · Jan 22 · arXiv

Large language models usually get judged one message at a time, but many real tasks need smart planning across a whole conversation.

#strategic conversation #reinforcement learning for LLMs #multi-turn dialogue

DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation

Intermediate
Jiawei Liu, Junqiao Li et al. · Dec 24 · arXiv

DreaMontage is a new AI method that makes long, single-shot videos that feel smooth and connected, even when you give it scattered images or short clips in the middle.

#arbitrary frame conditioning #one-shot video generation #Diffusion Transformer

T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

Intermediate
Dmitrii Stoianov, Danil Taranets et al. · Dec 11 · arXiv

T-pro 2.0 is an open Russian language model that can answer quickly or think step by step, so you can pick speed or accuracy when you need it.

#T-pro 2.0 #Russian LLM #Hybrid reasoning