How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1055)


OmniGAIA: Towards Native Omni-Modal AI Agents

Intermediate
Xiaoxi Li, Wenxiang Jiao et al. · Feb 26 · arXiv

OmniGAIA is a new benchmark that checks whether an AI agent can watch videos, look at images, listen to audio, and use web and code tools over several steps to reach a verified answer.

#OmniGAIA · #OmniAtlas · #Tool-Integrated Reasoning

From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

Intermediate
Hongrui Jia, Chaoya Jiang et al. · Feb 26 · arXiv

Large multimodal models (LMMs) can look at pictures and read text, but they still miss tricky cases, like tiny chart labels or multi-step math.

#Large Multimodal Models · #Diagnostic-driven Progressive Evolution · #Reinforcement Learning

Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction

Intermediate
Nils Schwager, Simon Münker et al. · Feb 26 · arXiv

This paper tests whether AI can realistically guess what a specific social media user would comment when they see a new post.

#Conditioned Comment Prediction · #LLM user simulation · #implicit conditioning

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Intermediate
Qianben Chen, Tianrui Qin et al. · Feb 26 · arXiv

This paper shows that letting an AI search many places at the same time (in parallel) can beat making it think in long, slow chains.

#agentic search · #parallel evidence acquisition · #plan refinement
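The efficiency gain from parallel evidence acquisition can be illustrated with a generic sketch (this is not the paper's system): query several sources concurrently instead of in one long sequential chain. The `mock_search` function and source names here are hypothetical stand-ins for real search calls.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def mock_search(source: str) -> str:
    # Hypothetical stand-in for a real web/API search call.
    time.sleep(0.1)  # simulate network latency
    return f"evidence from {source}"

sources = ["web", "news", "wiki", "code"]

# Sequential chain: total latency is the sum of all calls.
start = time.perf_counter()
sequential = [mock_search(s) for s in sources]
seq_time = time.perf_counter() - start

# Parallel acquisition: total latency is roughly one call.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(sources)) as pool:
    parallel = list(pool.map(mock_search, sources))
par_time = time.perf_counter() - start

assert parallel == sequential  # same evidence, gathered faster
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

With four 0.1 s calls, the sequential path takes roughly 0.4 s while the parallel path finishes in about the time of a single call; the paper's argument is that this kind of breadth can substitute for long serial reasoning chains.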

dLLM: Simple Diffusion Language Modeling

Intermediate
Zhanhui Zhou, Lingjie Chen et al. · Feb 26 · arXiv

dLLM is a single, open-source toolbox that standardizes how diffusion language models are trained, run, and tested.

#diffusion language models · #masked diffusion · #block diffusion
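The masked-diffusion sampling that such toolkits standardize can be sketched in miniature (a generic illustration, not dLLM's API): generation starts from an all-masked sequence and a model iteratively fills in positions over several steps. The `toy_denoiser` here is a hypothetical stand-in that always predicts the target token.

```python
import random

MASK = "[MASK]"

def toy_denoiser(tokens, target):
    # Hypothetical stand-in for a trained model: it "predicts" the
    # correct token at every masked position with full confidence.
    return [target[i] if t == MASK else t for i, t in enumerate(tokens)]

def masked_diffusion_sample(target, steps=4, seed=0):
    """Start fully masked; reveal a few positions per denoising step."""
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    per_step = max(1, len(target) // steps)
    while MASK in tokens:
        prediction = toy_denoiser(tokens, target)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Reveal a random subset; a real sampler would instead keep
        # the positions the model is most confident about.
        for i in rng.sample(masked, min(per_step, len(masked))):
            tokens[i] = prediction[i]
    return tokens

out = masked_diffusion_sample(["all", "tokens", "start", "masked", "here"])
```

Unlike left-to-right autoregressive decoding, each step can commit tokens anywhere in the sequence, which is what makes parallel generation possible.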

Transformers converge to invariant algorithmic cores

Intermediate
Joshua S. Schiffman · Feb 26 · arXiv

Different transformers may have very different weights, but they often hide the same tiny "engine" inside that actually does the task.

#algorithmic cores · #mechanistic interpretability · #transformers

Causal Motion Diffusion Models for Autoregressive Motion Generation

Intermediate
Qing Yu, Akihisa Watanabe et al. · Feb 26 · arXiv

The paper introduces CMDM, a new way to make computer-generated human motions that feel smooth over time and match the meaning of a text prompt.

#causal diffusion · #autoregressive motion generation · #text-to-motion

veScale-FSDP: Flexible and High-Performance FSDP at Scale

Intermediate
Zezhou Wang, Youjie Li et al. · Feb 25 · arXiv

This paper makes training giant AI models faster and lighter on memory by inventing a new way to split tensors called RaggedShard.

#FSDP · #ZeRO · #RaggedShard
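The ZeRO-style sharding that FSDP builds on can be shown with a toy flat-parameter split (this sketches generic even sharding, not the paper's RaggedShard scheme): parameters are flattened, padded to divide evenly, and each rank stores only its slice, with an all-gather reconstructing the full tensor when needed.

```python
def shard_flat_params(flat, world_size):
    """Pad a flat parameter list so it splits evenly, then shard it.
    Generic ZeRO-style even sharding; each 'rank' keeps one slice."""
    pad = (-len(flat)) % world_size
    padded = flat + [0.0] * pad
    n = len(padded) // world_size
    return [padded[r * n:(r + 1) * n] for r in range(world_size)], pad

def all_gather(shards, pad):
    # Reconstruct the full flat parameter, dropping the padding.
    full = [x for shard in shards for x in shard]
    return full[:len(full) - pad] if pad else full

params = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]  # 7 values, 4 "GPUs"
shards, pad = shard_flat_params(params, world_size=4)
assert all(len(s) == 2 for s in shards)   # even shards of size 2
assert all_gather(shards, pad) == params  # round-trips exactly
```

Each rank holds 1/world_size of the parameters between uses, which is the memory saving; the paper's contribution is a more flexible way to cut these shards.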

Solaris: Building a Multiplayer Video World Model in Minecraft

Intermediate
Georgy Savva, Oscar Michel et al. · Feb 25 · arXiv

Solaris is a new AI that can predict future video for two Minecraft players at the same time, keeping both players' camera views consistent with each other.

#multiplayer world model · #video diffusion transformer · #Minecraft dataset

Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets

Intermediate
Hanna Yukhymenko, Anton Alexandrov et al. · Feb 25 · arXiv

The paper builds an automated pipeline that translates AI benchmarks and datasets into many languages while keeping questions and answers correctly connected.

#machine translation · #multilingual benchmarks · #test-time compute scaling

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Intermediate
Rui Yang, Qianhui Wu et al. · Feb 25 · arXiv

GUI-Libra is a training recipe that helps computer-using AI agents both think carefully and click precisely on screens.

#GUI agent · #visual grounding · #long-horizon navigation

World Guidance: World Modeling in Condition Space for Action Generation

Intermediate
Yue Su, Sijin Chen et al. · Feb 25 · arXiv

WoG (World Guidance) teaches a robot to imagine just the right bits of the near future and use those bits to pick better actions.

#Vision-Language-Action · #world modeling · #condition space