How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (34)

#supervised fine-tuning

MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods

Intermediate
Honglin Lin, Zheng Liu et al. (arXiv, Jan 29)

MMFineReason is a large open dataset (1.8 million examples, 5.1 billion solution tokens) that teaches AI models to reason step by step over images and text together.

#multimodal reasoning #vision-language models #chain-of-thought

ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

Intermediate
Xiaoyu Tian, Haotian Wang et al. (arXiv, Jan 29)

ASTRA is a fully automated way to train tool-using AI agents: it generates both their practice stories (trajectories) and their practice worlds (environments) without humans in the loop.

#tool-augmented agents #multi-turn decision making #verifiable environments

SERA: Soft-Verified Efficient Repository Agents

Intermediate
Ethan Shen, Danny Tormoen et al. (arXiv, Jan 28)

SERA is a new, low-cost way to train coding assistants (agents) that learn the style and conventions of your own codebase.

#SERA #Soft-Verified Generation #soft verification

OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution

Intermediate
Le Zhang, Yixiong Xiao et al. (arXiv, Jan 28)

OmegaUse is a new AI agent that operates phones and computers by looking at screenshots and deciding where to click, type, or scroll, much like a careful human user.

#GUI agent #UI grounding #navigation policy

Towards Pixel-Level VLM Perception via Simple Points Prediction

Intermediate
Tianhui Song, Haoyu Lu et al. (arXiv, Jan 27)

SimpleSeg teaches a multimodal language model to outline objects by writing down a list of points, like connecting the dots, instead of using a special segmentation decoder.

#SimpleSeg #multimodal large language model #decoder-free segmentation

daVinci-Dev: Agent-native Mid-training for Software Engineering

Intermediate
Ji Zeng, Dayuan Fu et al. (arXiv, Jan 26)

This paper teaches code models to work more like real software engineers by adding a mid-training stage, after pretraining and before final fine-tuning, that uses data drawn from real development workflows.

#agentic mid-training #agent-native data #contextually-native trajectories

Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

Intermediate
Yuxuan Wan, Tianqing Fang et al. (arXiv, Jan 22)

DeepVerifier is a plug-in checker that helps Deep Research Agents catch and fix their own mistakes while they are working, without retraining.

#Deep Research Agents #verification asymmetry #rubrics-based feedback

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

Intermediate
Yuming Yang, Mingyoung Lai et al. (arXiv, Jan 20)

The paper asks a simple question: Which step-by-step explanations from a teacher model actually help a student model learn to reason better?

#Rank-Surprisal Ratio #data-student suitability #chain-of-thought distillation

TranslateGemma Technical Report

Intermediate
Mara Finkelstein, Isaac Caswell et al. (arXiv, Jan 13)

TranslateGemma is a family of open machine translation models fine-tuned from Gemma 3 to translate between many languages more accurately.

#machine translation #TranslateGemma #Gemma 3

X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests

Intermediate
Jie Wu, Haoling Li et al. (arXiv, Jan 11)

X-Coder shows that models can learn expert-level competitive programming from data that is 100% synthetic, with no real contest problems needed.

#competitive programming #synthetic data generation #feature-based synthesis

An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift

Intermediate
Constantinos Karouzos, Xingwei Tan et al. (arXiv, Jan 9)

Preference tuning teaches language models to act the way people like, but those habits can fall apart when the topic or style changes (domain shift).

#preference tuning #domain shift #supervised fine-tuning

EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

Intermediate
Xiaoshuai Song, Haofei Chang et al. (arXiv, Jan 9)

EnvScaler is an automatic factory that builds many safe, rule-following practice worlds where AI agents can talk to users and call tools, just like real apps.

#EnvScaler #tool-interactive environments #programmatic synthesis