How I Study AI - Learn AI Papers & Lectures the Easy Way

DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning

Intermediate

Yicheng Chen, Zerun Ma et al., Feb 11, arXiv

DataChef teaches a large language model to be a smart data chef: it plans and codes full data pipelines that turn messy datasets into great training meals for other models.

#data recipe #data processing pipeline #reinforcement learning

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

Intermediate

Tianyi Wu, Mingzhe Du et al., Feb 7, arXiv

This paper introduces SecCoderX, a way to teach code-writing AIs to be secure without breaking what the code is supposed to do.

#secure code generation #reinforcement learning #vulnerability reward model

TRIP-Bench: A Benchmark for Long-Horizon Interactive Agents in Real-World Scenarios

Intermediate

Yuanzhe Shen, Zisu Huang et al., Feb 2, arXiv

TRIP-Bench is a new test that checks if AI travel agents can plan real trips over many chat turns while following strict rules and changing user requests.

#TRIP-Bench #long-horizon agents #multi-turn interaction
