🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers3

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#data-centric AI

MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods

Intermediate
Honglin Lin, Zheng Liu et al.Jan 29arXiv

MMFineReason is a huge, open dataset (1.8 million examples, 5.1 billion solution tokens) that teaches AIs to think step by step about pictures and text together.

#multimodal reasoning#vision-language models#chain-of-thought

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Intermediate
Hao Liang, Xiaochen Ma et al.Dec 18arXiv

DataFlow is a building-block system that helps large language models get better data by unifying how we create, clean, check, and organize that data.

#DataFlow#LLM data preparation#operator pipeline

OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value

Intermediate
Mengzhang Cai, Xin Gao et al.Dec 16arXiv

OpenDataArena (ODA) is a fair, open platform that measures how valuable different post‑training datasets are for large language models by holding everything else constant.

#OpenDataArena#post-training datasets#data-centric AI