The paper fixes a big flaw in test-time reinforcement learning (TTRL): when many sampled answers agree on the same wrong result, the majority-vote reward reinforces the mistake and the model gets stuck.
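To see the failure mode concretely, here is a minimal sketch of majority-vote pseudo-rewards in the spirit of TTRL; the function name `majority_vote_rewards` and the toy answers are illustrative, not from the paper:

```python
from collections import Counter

def majority_vote_rewards(answers):
    """Each sampled answer gets reward 1 if it matches the majority
    answer, else 0. If the majority itself is wrong, the wrong
    answer is the one being rewarded."""
    majority, _ = Counter(answers).most_common(1)[0]
    return majority, [1 if a == majority else 0 for a in answers]

# Failure mode: 3 of 5 samples agree on a wrong answer ("12"),
# so the wrong samples each receive reward 1.
maj, rewards = majority_vote_rewards(["12", "12", "12", "7", "7"])
```

With no outside signal to break the tie, training on these rewards pushes the model further toward the consensus answer, right or wrong.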
TAROT teaches code-writing AI the way good teachers teach kids: start at the right level and raise the bar at the right time.
Gaia2 is a new test that measures how well AI agents handle real-life messiness like changing events, deadlines, and team coordination.
GameDevBench is a new test that checks if AI agents can actually make parts of video games, not just write code in one file.
Big AI reasoning models often keep thinking long after they already found the right answer, wasting time and tokens.
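A toy way to picture the waste (not the paper's actual metric): count how many tokens a reasoning trace generates after the correct answer first shows up. The function `overthinking_overhead` and the example trace are made up for illustration:

```python
def overthinking_overhead(trace_tokens, correct_answer):
    """Count tokens generated after the correct answer first appears
    in a reasoning trace -- tokens the model arguably did not need."""
    for i, tok in enumerate(trace_tokens):
        if tok == correct_answer:
            return len(trace_tokens) - (i + 1)
    return 0  # answer never appeared

# The model lands on "42" early, then keeps re-checking anyway.
trace = ["...", "so", "42", "wait", "let", "me", "re-check", "42"]
overhead = overthinking_overhead(trace, "42")
```

Here 5 of the 8 tokens come after the first correct "42" -- the kind of redundant thinking the paper aims to cut.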
This paper teaches AI how to fix broken Lean math proofs by learning from the compiler’s feedback, not just from finished, perfect proofs.
Nemotron-Math is a giant math dataset with 7.5 million step-by-step solutions created in three thinking styles and with or without Python help.
ReFusion is a new way for AI to write text faster by planning in chunks (called slots) and then filling each chunk carefully.