How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (10)


Reinforced Attention Learning

Intermediate
Bangzheng Li, Jianmo Ni et al. · Feb 4 · arXiv

This paper teaches AI to pay attention better by training its focus, not just its words (toy sketch below).

#Reinforced Attention Learning · #attention policy · #multimodal LLM
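A minimal sketch of the core idea as I read it from the summary: treat "where the model looks" as an action and reward it directly, REINFORCE-style, so the gradient flows into the attention distribution rather than the output text. The five regions, the reward function, and all numbers are hypothetical, not the paper's setup.

```python
import torch

torch.manual_seed(0)
scores = torch.randn(5, requires_grad=True)  # raw attention logits over 5 regions (toy)

def reward_fn(region: int) -> float:
    # Hypothetical reward: region 2 is the one that actually answers the question.
    return 1.0 if region == 2 else 0.0

opt = torch.optim.SGD([scores], lr=0.5)
for _ in range(200):
    probs = torch.softmax(scores, dim=0)             # the attention "policy"
    dist = torch.distributions.Categorical(probs)
    region = dist.sample()                           # where to look = the action
    loss = -dist.log_prob(region) * reward_fn(region.item())  # REINFORCE update
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.softmax(scores, dim=0))  # probability mass should pile up on region 2
```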

Rethinking the Trust Region in LLM Reinforcement Learning

Intermediate
Penghui Qi, Xiangxin Zhou et al. · Feb 4 · arXiv

The paper shows that the popular PPO method for training language models is unfair to rare words and too gentle with very common words, which makes learning slow and unstable (toy sketch below).

#Reinforcement Learning · #Proximal Policy Optimization · #Trust Region
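To see the asymmetry the summary describes, here is a tiny numeric illustration (my toy numbers, not the paper's): PPO clips the probability ratio new_p / old_p to [1 - ε, 1 + ε], so the absolute room a token gets to move scales with its old probability.

```python
eps = 0.2  # standard PPO clip range
for old_p in (0.001, 0.5):                 # a rare token vs. a very common one
    lo, hi = old_p * (1 - eps), old_p * (1 + eps)
    print(f"old_p={old_p}: new_p confined to [{lo:.4f}, {hi:.4f}] "
          f"(absolute room: {hi - lo:.4f})")
# rare token: 0.0004 of room; common token: 0.2000 -- a 500x difference,
# so rare words barely move while common words can swing almost freely.
```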

Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity

Intermediate
Hongjun An, Yiliang Song et al. · Jan 10 · arXiv

The paper shows that friendly, people-pleasing language can trick even advanced language models into agreeing with wrong answers (toy probe sketched below).

#Preference-Undermining Attacks · #PUA · #sycophancy
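A hedged sketch of what a factorial probe might look like: cross a flattery factor with a claim-correctness factor and compare agreement rates across the four cells. The prompt wording and the ask_model stub are mine, purely illustrative of the methodology, not taken from the paper.

```python
from itertools import product

flattery = {True: "You're clearly brilliant, so surely you agree: ",
            False: "Please evaluate this claim: "}
claims = {"correct": "2 + 2 = 4.", "incorrect": "2 + 2 = 5."}

def ask_model(prompt: str) -> bool:
    """Stub: swap in a real LLM call; return True if the model agrees."""
    raise NotImplementedError

for use_flattery, claim_kind in product(flattery, claims):
    prompt = flattery[use_flattery] + claims[claim_kind]
    print(f"flattery={use_flattery}, claim={claim_kind}: {prompt!r}")
# Comparing agreement rates between the four cells isolates how much the
# people-pleasing framing alone flips the model onto wrong answers.
```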

An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift

Intermediate
Constantinos Karouzos, Xingwei Tan et al. · Jan 9 · arXiv

Preference tuning teaches language models to act the way people like, but those habits can fall apart when the topic or style changes (domain shift); a sketch of the usual tuning objective follows below.

#preference tuning · #domain shift · #supervised fine-tuning
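For concreteness, this is the standard DPO objective that "preference tuning" usually refers to (Rafailov et al.); whether margins learned this way survive a change of topic or style is what the paper tests. The toy log-likelihoods below are made up.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Inputs are sequence log-likelihoods under the policy / frozen reference."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -F.logsigmoid(margin).mean()

# Toy batch of one preference pair:
loss = dpo_loss(torch.tensor([-4.0]), torch.tensor([-6.0]),
                torch.tensor([-5.0]), torch.tensor([-5.5]))
print(loss)  # ~0.62, below ln(2): the chosen response is already favored
```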

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Intermediate
Shih-Yang Liu, Xin Dong et al. · Jan 8 · arXiv

When a model learns from many rewards at once, a popular method called GRPO can accidentally squash different reward mixes into the same learning signal, which confuses training (toy illustration below).

#GDPO · #GRPO · #multi-reward reinforcement learning
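A toy illustration of that failure mode (numbers are mine, not the paper's): GRPO sums the rewards per sample and then normalizes within the group, so very different reward mixes with the same total collapse to the same advantage. Normalizing each reward stream separately, the direction GDPO's name points to, keeps them distinguishable.

```python
import numpy as np

# Two hypothetical reward streams per sample: [helpfulness, safety].
group = np.array([[0.9, 0.1],   # helpful but unsafe
                  [0.1, 0.9],   # safe but unhelpful
                  [0.5, 0.5]])  # balanced

totals = group.sum(axis=1)      # GRPO-style: scalarize first...
adv = (totals - totals.mean()) / (totals.std() + 1e-8)  # ...then group-normalize
print(adv)                      # [0. 0. 0.]: three different mixes, one flat signal

# Decoupled: normalize each reward stream on its own before combining.
per_stream = (group - group.mean(axis=0)) / (group.std(axis=0) + 1e-8)
print(per_stream)               # each stream now carries its own learning signal
```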

The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

Intermediate
Max Ruiz Luyten, Mihaela van der Schaar · Jan 2 · arXiv

Modern AI models can get very good at being correct, but in the process they often lose their ability to think in many different ways (toy measurement below).

#Distributional Creative Reasoning · #diversity energy · #creativity kernel
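The paper builds its own machinery for this ("diversity energy", a creativity kernel); the sketch below uses only a crude everyday proxy, the fraction of distinct answers in a batch of samples, to show what "losing ways of thinking" looks like as a number. The data is invented.

```python
def diversity(samples: list[str]) -> float:
    """Fraction of unique answers in a batch of sampled solutions."""
    return len(set(samples)) / len(samples)

before_tuning = ["proof A", "proof B", "proof C", "proof A"]  # varied attempts
after_tuning  = ["proof A", "proof A", "proof A", "proof A"]  # correct but uniform

print(diversity(before_tuning), diversity(after_tuning))  # 0.75 -> 0.25
```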

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Intermediate
NVIDIA et al. · Dec 23 · arXiv

Nemotron 3 Nano is a new open-source language model that mixes two brain styles (Mamba and Transformer) and adds a team of special experts (MoE) so it thinks better while running much faster (toy layer plan below).

#Mixture-of-Experts · #Mamba-2 · #Transformer
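A toy layer plan showing how a hybrid Mamba-Transformer MoE stack can be arranged; the real Nemotron 3 Nano depths, ratios, and expert counts are in the paper, and this 12-layer split is purely illustrative.

```python
n_layers = 12
plan = []
for i in range(n_layers):
    mixer = "attention" if i % 4 == 3 else "mamba2"  # mostly Mamba, occasional attention
    plan.append((i, mixer, "moe-ffn(8 experts, top-2)"))  # sparse expert FFN (toy config)

for layer in plan:
    print(layer)
# Mamba layers keep per-token cost roughly linear in sequence length;
# the MoE adds capacity while activating only a few experts per token.
```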

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Intermediate
Boxin Wang, Chankyu Lee et al. · Dec 15 · arXiv

The paper introduces Nemotron-Cascade, a step-by-step (cascaded) reinforcement learning recipe that trains an AI across domains like alignment, instructions, math, coding, and software engineering, one domain at a time (toy loop below).

#Cascaded Reinforcement Learning · #RLHF · #Instruction-Following RL
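The cascade, as the summary describes it, is just sequential RL stages where each stage starts from the previous stage's checkpoint. The stage list follows the summary; train_rl_stage is a hypothetical stub.

```python
STAGES = ["alignment", "instruction-following", "math", "coding", "software-engineering"]

def train_rl_stage(checkpoint: str, domain: str) -> str:
    """Stub for one RL stage; returns the path of the updated checkpoint."""
    return f"{checkpoint}+rl[{domain}]"

ckpt = "base-sft"
for domain in STAGES:
    ckpt = train_rl_stage(ckpt, domain)  # one domain at a time, not mixed in one batch
print(ckpt)  # base-sft+rl[alignment]+rl[instruction-following]+...
```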

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Intermediate
Team Seedance, Heyi Chen et al. · Dec 15 · arXiv

Seedance 1.5 pro is a single model that makes video and sound together at the same time, so lips, music, and actions match naturally (toy sketch below).

#audio-visual generation · #diffusion transformer · #cross-modal synchronization
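A toy sketch of what "native joint generation" means structurally: a single denoiser sees audio and video latents together at every step, which is what lets it keep them synchronized. The shapes are invented and a small MLP stands in for the paper's diffusion transformer.

```python
import torch
import torch.nn as nn

video = torch.randn(1, 16, 64)  # (batch, frames, video-latent dim) -- toy shape
audio = torch.randn(1, 16, 32)  # (batch, frames, audio-latent dim) -- toy shape

denoiser = nn.Sequential(nn.Linear(64 + 32, 128), nn.GELU(), nn.Linear(128, 64 + 32))

x = torch.cat([video, audio], dim=-1)  # one joint sequence, not two separate models
eps_hat = denoiser(x)                  # noise predicted for both modalities at once
video_eps, audio_eps = eps_hat.split([64, 32], dim=-1)
print(video_eps.shape, audio_eps.shape)  # torch.Size([1, 16, 64]) torch.Size([1, 16, 32])
```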

LongCat-Image Technical Report

Intermediate
Meituan LongCat Team, Hanghang Ma et al. · Dec 8 · arXiv

LongCat-Image is a small (6B) but mighty bilingual image generator that turns text into high-quality, realistic pictures and can also edit images very well.

#LongCat-Image · #diffusion model · #text-to-image