How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (791)


DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

Intermediate
Yibo Wang, Lei Wang et al. · Jan 14 · arXiv

The paper introduces DeepResearchEval, a fully automated way to build realistic deep research tasks and to grade long research reports from AI systems.

#deep research agents #agentic evaluation #persona-driven tasks

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Intermediate
Zhiyuan Hu, Yunhai Hu et al. · Jan 14 · arXiv

This paper introduces MATTRL, a way for multiple AI agents to learn from their own conversations at test time using short, reusable text notes instead of retraining their weights.

#multi-agent systems #test-time reinforcement learning #experience retrieval

PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records

Intermediate
Yibo Lyu, Gongwei Chen et al. · Jan 14 · arXiv

The paper tackles a real-life problem: people often give phones short, vague instructions, so agents must guess the missing details using what they know about the user.

#personalized GUI agent #implicit intent #preference modeling

OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding

Intermediate
Sheng-Yu Huang, Jaesung Choe et al. · Jan 14 · arXiv

OpenVoxel is a training-free way to understand 3D scenes by grouping tiny 3D blocks (voxels) into objects and giving each object a clear caption.

#OpenVoxel #Sparse Voxel Rasterization #training-free 3D understanding

Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning

Intermediate
Dongjie Cheng, Yongqi Li et al. · Jan 14 · arXiv

Omni-R1 teaches AI to think with pictures and words at the same time by drawing helpful mini-images while reasoning.

#multimodal reasoning #interleaved generation #functional image generation

V-DPM: 4D Video Reconstruction with Dynamic Point Maps

Intermediate
Edgar Sucar, Eldar Insafutdinov et al. · Jan 14 · arXiv

V-DPM is a new way for AI to turn a short video into a moving 3D world, capturing both the shape and the motion of everything in it.

#Dynamic Point Maps #4D reconstruction #scene flow

EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines

Intermediate
Shuo Zhang, Chaofa Yuan et al. · Jan 14 · arXiv

EvoFSM is a way for AI agents to improve themselves safely by editing a clear flowchart (an FSM) instead of rewriting everything blindly.

#Finite State Machine #Structured Self-Evolution #Atomic Operations

MAXS: Meta-Adaptive Exploration with LLM Agents

Intermediate
Jian Zhang, Zhiyuan Wang et al. · Jan 14 · arXiv

MAXS is a new way for AI agents to think a few steps ahead while using tools like search and code, so they make smarter choices.

#LLM agents #tool-augmented reasoning #lookahead

Geometric Stability: The Missing Axis of Representations

Intermediate
Prashant C. Raju · Jan 14 · arXiv

Similarity tells you if two models seem to think about things the same way, but it doesn’t tell you if that thinking is sturdy when the world wiggles.

#geometric stability #representation similarity #CKA

World Craft: Agentic Framework to Create Visualizable Worlds via Text

Intermediate
Jianwen Sun, Yukang Feng et al. · Jan 14 · arXiv

World Craft lets anyone turn a short text description into a playable, visual game world without coding.

#AI Town #multi-agent framework #layout generation

EvasionBench: A Large-Scale Benchmark for Detecting Managerial Evasion in Earnings Call Q&A

Intermediate
Shijian Ma, Yan Lin et al. · Jan 14 · arXiv

EvasionBench is a new, very large dataset that helps computers spot when company leaders dodge questions during earnings call Q&A.

#evasion detection #earnings call Q&A #financial NLP

SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL

Intermediate
Lijun Liu, Linwei Chen et al. · Jan 14 · arXiv

SkinFlow is a 7B-parameter vision–language model that diagnoses skin conditions by passing the most useful visual information to its language component, rather than by simply scaling up.

#dermatology AI #vision-language model #Dynamic Visual Encoding