Papers (11)

#reward modeling

CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

CharacterFlywheel is a step‑by‑step loop that steadily improves chatty AI characters by learning from real conversations on Instagram, WhatsApp, and Messenger.

#CharacterFlywheel #large language models #conversational AI

Agentic Code Reasoning

Intermediate

Shubham Ugare, Satish Chandra · Mar 2 · arXiv

The paper teaches AI agents to understand big codebases without running the code by following a strict, step-by-step thinking template called semi-formal reasoning.

#agentic code reasoning #semi-formal reasoning #patch equivalence

Enhancing Spatial Understanding in Image Generation via Reward Modeling

Intermediate

Zhenyu Tang, Chaoran Feng et al. · Feb 27 · arXiv

This paper teaches image generators to place objects in the right spots by building a special teacher called a reward model focused on spatial relationships.

#spatial reasoning #reward modeling #preference learning

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

Beginner

Yinjie Wang, Tianbao Xie et al. · Feb 2 · arXiv

RLAnything is a new reinforcement learning (RL) framework that trains three things together: the policy (the agent), the reward model (the judge), and the environment (the tasks).

#reinforcement learning #closed-loop optimization #reward modeling

PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

Intermediate

Minh-Quan Le, Gaurav Mittal et al. · Feb 2 · arXiv

This paper shows how to make text-to-video models create clearer, steadier, and more on-topic videos without using any human-labeled ratings.

#text-to-video #optimal transport #annotation-free

One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment

Intermediate

Hongru Cai, Yongqi Li et al. · Jan 26 · arXiv

Large language models often learn one-size-fits-all preferences, but people are different, so we need personalization.

#personalized alignment #reward modeling #meta-learning

RoboBrain 2.5: Depth in Sight, Time in Mind

Intermediate

Huajie Tan, Enshen Zhou et al. · Jan 20 · arXiv

RoboBrain 2.5 teaches robots to see depth precisely and to keep track of time-aware progress, so plans turn into safe, accurate actions.

#Embodied AI #3D spatial reasoning #metric grounding

RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation

Intermediate

Sunzhu Li, Jiale Zhao et al. · Jan 13 · arXiv

RubricHub is a large collection of about 110,000 detailed grading guides (rubrics) covering many kinds of questions, such as health, science, writing, and chat.

#RubricHub #coarse-to-fine rubric generation #multi-model aggregation

LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

Intermediate

Qingyu Ren, Qianyu He et al. · Jan 10 · arXiv

Real instructions often contain logical structure such as "and", "first-then", and "if-else", and this paper teaches models to notice and obey that logic.

#instruction following #logical structures #parallel constraints

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Intermediate

Zhihang Liu, Xiaoyi Bao et al. · Dec 15 · arXiv

ShowTable is a new way for AI to turn a data table into a beautiful, accurate infographic using a think–make–check–fix loop.

#creative table visualization #multimodal large language model #diffusion model

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Intermediate

Hongyu Li, Manyuan Zhang et al. · Dec 5 · arXiv

EditThinker is a helper brain for any image editor that thinks, checks, and rewrites the instruction in multiple rounds until the picture looks right.

#instruction-based image editing #iterative reasoning #multimodal large language model
