🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers8

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#reward modeling

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

Beginner
Yinjie Wang, Tianbao Xie et al.Feb 2arXiv

RLAnything is a new reinforcement learning (RL) framework that trains three things together at once: the policy (the agent), the reward model (the judge), and the environment (the tasks).

#reinforcement learning#closed-loop optimization#reward modeling

PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

Intermediate
Minh-Quan Le, Gaurav Mittal et al.Feb 2arXiv

This paper shows how to make text-to-video models create clearer, steadier, and more on-topic videos without using any human-labeled ratings.

#text-to-video#optimal transport#annotation-free

One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment

Intermediate
Hongru Cai, Yongqi Li et al.Jan 26arXiv

Large language models often learn one-size-fits-all preferences, but people are different, so we need personalization.

#personalized alignment#reward modeling#meta-learning

RoboBrain 2.5: Depth in Sight, Time in Mind

Intermediate
Huajie Tan, Enshen Zhou et al.Jan 20arXiv

RoboBrain 2.5 teaches robots to see depth precisely and to keep track of time-aware progress, so plans turn into safe, accurate actions.

#Embodied AI#3D spatial reasoning#metric grounding

RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation

Intermediate
Sunzhu Li, Jiale Zhao et al.Jan 13arXiv

RubricHub is a huge (about 110,000) collection of detailed grading guides (rubrics) for many kinds of questions like health, science, writing, and chat.

#RubricHub#coarse-to-fine rubric generation#multi-model aggregation

LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

Intermediate
Qingyu Ren, Qianyu He et al.Jan 10arXiv

Real instructions often have logic like and first-then and if-else and this paper teaches models to notice and obey that logic.

#instruction following#logical structures#parallel constraints

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Intermediate
Zhihang Liu, Xiaoyi Bao et al.Dec 15arXiv

ShowTable is a new way for AI to turn a data table into a beautiful, accurate infographic using a think–make–check–fix loop.

#creative table visualization#multimodal large language model#diffusion model

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Intermediate
Hongyu Li, Manyuan Zhang et al.Dec 5arXiv

EditThinker is a helper brain for any image editor that thinks, checks, and rewrites the instruction in multiple rounds until the picture looks right.

#instruction-based image editing#iterative reasoning#multimodal large language model