How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (196)


Agentic Confidence Calibration

Beginner
Jiaxin Zhang, Caiming Xiong et al. · Jan 22 · arXiv

AI agents often act very sure of themselves even when they are wrong, especially on long, multi-step tasks.

#agentic confidence calibration · #holistic trajectory calibration · #general agent calibrator

Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind

Beginner
Zhitao He, Zongwei Lyu et al. · Jan 22 · arXiv

Academic rebuttals are not just about being polite; they are about smart, strategic persuasion under hidden information.

#academic rebuttal · #theory of mind · #strategic persuasion

Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors

Beginner
Zhiwei Zhang, Fei Zhao et al. · Jan 22 · arXiv

Small AI models often stumble when a tool call fails and then get stuck repeating bad calls instead of fixing the mistake.

#FISSION-GRPO · #error recovery · #tool use

Rethinking Video Generation Model for the Embodied World

Beginner
Yufan Deng, Zilin Pan et al. · Jan 21 · arXiv

Robots need videos that not only look pretty but also follow real-world physics and finish the task asked of them.

#robot video generation · #embodied AI · #benchmark

The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Beginner
Zanlin Ni, Shenzhi Wang et al. · Jan 21 · arXiv

Diffusion language models can write tokens in any order, but that freedom can accidentally hurt their ability to reason well.

#diffusion language model · #arbitrary order generation · #autoregressive training

Typhoon OCR: Open Vision-Language Model For Thai Document Extraction

Beginner
Surapon Nonesung, Natapong Nitarach et al. · Jan 21 · arXiv

Typhoon OCR is an open, lightweight vision-language model that reads Thai and English documents and returns clean, structured text.

#Thai OCR · #Vision-Language Model · #Document Layout Reconstruction

FARE: Fast-Slow Agentic Robotic Exploration

Beginner
Shuhao Liao, Xuxin Lv et al. · Jan 21 · arXiv

Robots have typically explored by following simple rules or short-term rewards, which often made them waste time and backtrack a lot.

#autonomous exploration · #fast-slow thinking · #hierarchical planning

XR: Cross-Modal Agents for Composed Image Retrieval

Beginner
Zhongyu Yang, Wei Pang et al. · Jan 20 · arXiv

XR is a new, training-free team of AI helpers that finds images using both a reference picture and a short text edit (like “same jacket but red”).

#Composed Image Retrieval · #cross-modal reasoning · #multi-agent system

PRiSM: Benchmarking Phone Realization in Speech Models

Beginner
Shikhar Bharadwaj, Chin-Jou Li et al. · Jan 20 · arXiv

PRiSM is a new open-source benchmark that checks how well speech models hear and write down tiny speech sounds called phones.

#phone recognition · #phonetic transcription · #PFER

Think3D: Thinking with Space for Spatial Reasoning

Beginner
Zaibin Zhang, Yuhan Wu et al. · Jan 19 · arXiv

Think3D lets AI models stop guessing from flat pictures and start exploring real 3D space, like walking around a room in a video game.

#Think3D · #spatial reasoning · #3D reconstruction

Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility

Beginner
Honglin Lin, Chonghan Qin et al. · Jan 17 · arXiv

The paper studies how to generate and evaluate scientific images that are not just pretty but also scientifically correct.

#scientific image synthesis · #text-to-image (T2I) · #programmatic diagram generation

MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models

Beginner
Zecheng Tang, Baibei Ji et al. · Jan 17 · arXiv

This paper builds MemoryRewardBench, a big test that checks if reward models (AI judges) can fairly grade how other AIs manage long-term memory, not just whether their final answers are right.

#reward models · #long-term memory · #long-context reasoning