How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (6)

#AUROC

Accurate Failure Prediction in Agents Does Not Imply Effective Failure Prevention

Intermediate
Rakshith Vasudev, Melisa Russak et al. · Feb 3 · arXiv

The paper shows that even if a model is great at predicting when an AI agent will fail, jumping in to “fix” the agent mid-task can still make things worse.

#LLM critic #execution-time intervention #disruption–recovery tradeoff

Agentic Uncertainty Quantification

Intermediate
Jiaxin Zhang, Prafulla Kumar Choubey et al. · Jan 22 · arXiv

Long AI tasks can go wrong early and keep getting worse, a snowballing of mistakes called the Spiral of Hallucination.

#Agentic Uncertainty Quantification #Spiral of Hallucination #Dual-Process Architecture

NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Intermediate
Jiayu Liu, Rui Wang et al. · Jan 16 · arXiv

The paper studies why large language models (LLMs) sound too sure of themselves when using retrieval-augmented generation (RAG) and how to fix it.

#Retrieval-Augmented Generation #Confidence Calibration #Expected Calibration Error

EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs

Intermediate
Jewon Yeom, Jaewon Sok et al. · Jan 11 · arXiv

This paper teaches AI models not just how to solve problems but also how to tell when their own answers might be wrong.

#EPICAR #calibration #epistemic uncertainty

Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts

Intermediate
Dhruv Trehan, Paras Chopra · Jan 6 · arXiv

The authors built a simple six-agent system to see if today’s AI models could plan, run, and write a research paper mostly on their own.

#autonomous research pipeline #implementation drift #training data bias

Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction

Intermediate
Ming Li, Han Chen et al. · Dec 21 · arXiv

This paper asks a simple question with big impact: Can AI tell which test questions are hard for humans?

#Item Difficulty Prediction #Item Response Theory #Rasch Model