๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐ŸงฉProblems๐ŸŽฏPrompts๐Ÿง Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers17

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#curriculum learning

Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

Intermediate
Junru Lu, Jiarui Qin et al.Dec 31arXiv

Youtu-LLM is a small (1.96B) language model that was trained from scratch to think, plan, and act like an agent instead of just copying bigger models.

#lightweight LLM#agentic mid-training#trajectory data

GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

Intermediate
Jiacheng Guo, Ling Yang et al.Dec 22arXiv

GenEnv is a training system where a student AI and a teacher simulator grow together by exchanging tasks and feedback.

#GenEnv#co-evolutionary learning#difficulty-aligned curriculum

Puzzle Curriculum GRPO for Vision-Centric Reasoning

Intermediate
Ahmadreza Jeddi, Hakki Can Karaimer et al.Dec 16arXiv

This paper teaches vision-language models to reason about pictures using puzzles instead of expensive human labels.

#vision-language models#reinforcement learning#group-relative policy optimization

Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

Intermediate
Haiteng Zhao, Junhao Shen et al.Dec 11arXiv

This paper builds InternGeometry, a large language model agent that solves Olympiad-level geometry by talking to a math engine, remembering what worked, and trying smart new ideas.

#InternGeometry#geometry theorem proving#auxiliary constructions

From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks

Intermediate
Changpeng Yang, Jinyang Wu et al.Dec 2arXiv

This paper teaches AI models to reason better by first copying only good examples and later learning from mistakes too.

#Curriculum Advantage Policy Optimization#advantage-based RL#imitation learning
12