How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (3)

#online RL

DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning

Intermediate
Yicheng Chen, Zerun Ma et al. | Feb 11 | arXiv

DataChef teaches a large language model to be a smart data chef: it plans and codes full data pipelines that turn messy datasets into great training meals for other models.

#data recipe #data processing pipeline #reinforcement learning

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

Intermediate
Tianyi Wu, Mingzhe Du et al. | Feb 7 | arXiv

This paper introduces SecCoderX, a way to teach code-writing AIs to be secure without breaking what the code is supposed to do.

#secure code generation #reinforcement learning #vulnerability reward model

TRIP-Bench: A Benchmark for Long-Horizon Interactive Agents in Real-World Scenarios

Intermediate
Yuanzhe Shen, Zisu Huang et al. | Feb 2 | arXiv

TRIP-Bench is a new benchmark that checks whether AI travel agents can plan real trips over many chat turns while following strict rules and adapting to changing user requests.

#TRIP-Bench #long-horizon agents #multi-turn interaction