How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1055)

Phi-4-reasoning-vision-15B Technical Report

Intermediate
Jyoti Aneja, Michael Harrison et al. · Mar 4 · arXiv

Phi-4-reasoning-vision-15B is a small, open-weight AI that understands pictures and text together and is especially good at math, science, and using computer screens.

#multimodal reasoning #vision-language model #mid-fusion

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

Intermediate
Jialong Chen, Xander Xu et al. · Mar 4 · arXiv

SWE-CI is a new benchmark that tests how well AI coding agents can keep a codebase healthy over many changes, not just fix one bug.

#SWE-CI #continuous integration #code maintainability

T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning

Intermediate
Qinsi Wang, Hancheng Ye et al. · Mar 4 · arXiv

This paper shows that teaching AI to first draw a simple map of a text (nodes and links) before answering questions makes it smarter and more reliable.

#Structure of Thought #Text-to-Structure #Intermediate Representation

MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

Intermediate
Zonglin Yang, Lidong Bing · Mar 4 · arXiv

Scientists want AI to propose brand‑new hypotheses directly from a research background, but training a model to do this end‑to‑end is mathematically intractable because the search space explodes combinatorially.

#scientific discovery #hypothesis generation #P(h|b)

InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

Intermediate
Mohamed Elmoghany, Liangbing Zhao et al. · Mar 4 · arXiv

InfinityStory is a new system that can make very long videos (even hours) where the world stays the same and characters transition smoothly between shots.

#long-form video generation #background consistency #multi-agent planning

Utonia: Toward One Encoder for All Point Clouds

Intermediate
Yujia Zhang, Xiaoyang Wu et al. · Mar 3 · arXiv

Utonia is a single brain (encoder) that learns from many kinds of 3D point clouds, like indoor rooms, outdoor streets, tiny toys, and even city maps.

#Utonia #point cloud #self-supervised learning

MIBURI: Towards Expressive Interactive Gesture Synthesis

Intermediate
M. Hamza Mughal, Rishabh Dabral et al. · Mar 3 · arXiv

MIBURI is a system that makes a talking digital character move its body and face expressively in real time while it speaks.

#co-speech gesture synthesis #embodied conversational agents #causal generation

CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance

Intermediate
Hanyang Wang, Yiyang Liu et al. · Mar 3 · arXiv

This paper turns a popular image-guidance trick (Classifier-Free Guidance) into a feedback-control problem, just like keeping a car steady in its lane.

#Classifier-Free Guidance #Sliding Mode Control #Diffusion Models

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Intermediate
Shengbang Tong, David Fan et al. · Mar 3 · arXiv

The paper trains one model from scratch to both read text and see images/videos, instead of starting from a language-only model.

#multimodal pretraining #representation autoencoder #RAE

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

Intermediate
Zimo Wen, Boxiu Li et al. · Mar 3 · arXiv

This paper builds UniG2U-Bench, a big test to find out when making pictures (generation) actually helps models understand pictures and text together.

#Unified multimodal models #Vision-language models #Generation-to-Understanding (G2U)

Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

Intermediate
Aradhye Agarwal, Gurdit Siyan et al. · Mar 3 · arXiv

Agentic AIs don’t just chat; they plan, use tools, and take many steps, so one wrong click can cause real harm.

#MOSAIC #agentic safety #plan-check-act

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

Intermediate
Dadi Guo, Yuejin Xie et al. · Mar 3 · arXiv

This paper shows that code-writing AI agents can take an existing math problem and automatically turn it into a new, harder one while keeping it solvable.

#code agents #multi-agent systems #mathematical reasoning