🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
📝Daily Log🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers10

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#reward modeling

CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

Intermediate
Yixin Nie, Lin Guan et al.Mar 2arXiv

CharacterFlywheel is a step‑by‑step loop that steadily improves chatty AI characters by learning from real conversations on Instagram, WhatsApp, and Messenger.

#CharacterFlywheel#large language models#conversational AI

Agentic Code Reasoning

Intermediate
Shubham Ugare, Satish ChandraMar 2arXiv

The paper teaches AI agents to understand big codebases without running the code by following a strict, step-by-step thinking template called semi-formal reasoning.

#agentic code reasoning#semi-formal reasoning#patch equivalence

Enhancing Spatial Understanding in Image Generation via Reward Modeling

Intermediate
Zhenyu Tang, Chaoran Feng et al.Feb 27arXiv

This paper teaches image generators to place objects in the right spots by building a special teacher called a reward model focused on spatial relationships.

#spatial reasoning#reward modeling#preference learning

PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

Intermediate
Minh-Quan Le, Gaurav Mittal et al.Feb 2arXiv

This paper shows how to make text-to-video models create clearer, steadier, and more on-topic videos without using any human-labeled ratings.

#text-to-video#optimal transport#annotation-free

One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment

Intermediate
Hongru Cai, Yongqi Li et al.Jan 26arXiv

Large language models often learn one-size-fits-all preferences, but people are different, so we need personalization.

#personalized alignment#reward modeling#meta-learning

RoboBrain 2.5: Depth in Sight, Time in Mind

Intermediate
Huajie Tan, Enshen Zhou et al.Jan 20arXiv

RoboBrain 2.5 teaches robots to see depth precisely and to keep track of time-aware progress, so plans turn into safe, accurate actions.

#Embodied AI#3D spatial reasoning#metric grounding

RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation

Intermediate
Sunzhu Li, Jiale Zhao et al.Jan 13arXiv

RubricHub is a huge (about 110,000) collection of detailed grading guides (rubrics) for many kinds of questions like health, science, writing, and chat.

#RubricHub#coarse-to-fine rubric generation#multi-model aggregation

LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

Intermediate
Qingyu Ren, Qianyu He et al.Jan 10arXiv

Real instructions often have logic like and first-then and if-else and this paper teaches models to notice and obey that logic.

#instruction following#logical structures#parallel constraints

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Intermediate
Zhihang Liu, Xiaoyi Bao et al.Dec 15arXiv

ShowTable is a new way for AI to turn a data table into a beautiful, accurate infographic using a think–make–check–fix loop.

#creative table visualization#multimodal large language model#diffusion model

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Intermediate
Hongyu Li, Manyuan Zhang et al.Dec 5arXiv

EditThinker is a helper brain for any image editor that thinks, checks, and rewrites the instruction in multiple rounds until the picture looks right.

#instruction-based image editing#iterative reasoning#multimodal large language model