🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
📝Daily Log🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers1262

AllBeginnerIntermediateAdvanced
All SourcesarXiv

DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

Intermediate
Yibo Wang, Lei Wang et al.Jan 14arXiv

The paper introduces DeepResearchEval, a fully automated way to build realistic deep research tasks and to grade long research reports from AI systems.

#deep research agents#agentic evaluation#persona-driven tasks

Not triaged yet

STEP3-VL-10B Technical Report

Beginner
Ailin Huang, Chengyuan Yao et al.Jan 14arXiv

STEP3-VL-10B is a small (10 billion parameters) open multimodal model that sees images and reads text, yet scores like much larger models.

#multimodal foundation model#unified pre-training#perception encoder

Not triaged yet

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Intermediate
Zhiyuan Hu, Yunhai Hu et al.Jan 14arXiv

This paper introduces MATTRL, a way for multiple AI agents to learn from their own conversations at test time using short, reusable text notes instead of retraining their weights.

#multi-agent systems#test-time reinforcement learning#experience retrieval

Not triaged yet

PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records

Intermediate
Yibo Lyu, Gongwei Chen et al.Jan 14arXiv

The paper tackles a real-life problem: people often give phones short, vague instructions, so agents must guess the missing details using what they know about the user.

#personalized GUI agent#implicit intent#preference modeling

Not triaged yet

OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding

Intermediate
Sheng-Yu Huang, Jaesung Choe et al.Jan 14arXiv

OpenVoxel is a training-free way to understand 3D scenes by grouping tiny 3D blocks (voxels) into objects and giving each object a clear caption.

#OpenVoxel#Sparse Voxel Rasterization#training-free 3D understanding

Not triaged yet

Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning

Intermediate
Dongjie Cheng, Yongqi Li et al.Jan 14arXiv

Omni-R1 teaches AI to think with pictures and words at the same time by drawing helpful mini-images while reasoning.

#multimodal reasoning#interleaved generation#functional image generation

Not triaged yet

V-DPM: 4D Video Reconstruction with Dynamic Point Maps

Intermediate
Edgar Sucar, Eldar Insafutdinov et al.Jan 14arXiv

V-DPM is a new way for AI to turn a short video into a moving 3D world, capturing both the shape and the motion of everything in it.

#Dynamic Point Maps#4D reconstruction#scene flow

Not triaged yet

EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines

Intermediate
Shuo Zhang, Chaofa Yuan et al.Jan 14arXiv

EvoFSM is a way for AI agents to improve themselves safely by editing a clear flowchart (an FSM) instead of rewriting everything blindly.

#Finite State Machine#Structured Self-Evolution#Atomic Operations

Not triaged yet

MAXS: Meta-Adaptive Exploration with LLM Agents

Intermediate
Jian Zhang, Zhiyuan Wang et al.Jan 14arXiv

MAXS is a new way for AI agents to think a few steps ahead while using tools like search and code, so they make smarter choices.

#LLM agents#tool-augmented reasoning#lookahead

Not triaged yet

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Beginner
Tao Liu, Taiqiang Wu et al.Jan 14arXiv

Traditional supervised fine-tuning (SFT) makes a model copy one answer too exactly, which can cause overfitting to the exact wording instead of the real idea.

#ProFit#Supervised Fine-Tuning#Token Probability

Not triaged yet

Geometric Stability: The Missing Axis of Representations

Intermediate
Prashant C. RajuJan 14arXiv

Similarity tells you if two models seem to think about things the same way, but it doesn’t tell you if that thinking is sturdy when the world wiggles.

#geometric stability#representation similarity#CKA

Not triaged yet

World Craft: Agentic Framework to Create Visualizable Worlds via Text

Intermediate
Jianwen Sun, Yukang Feng et al.Jan 14arXiv

World Craft lets anyone turn a short text description into a playable, visual game world without coding.

#AI Town#multi-agent framework#layout generation

Not triaged yet

6061626364