🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers13

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#LLM-as-a-Judge

Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation

Intermediate
Changze Lv, Jie Zhou et al.Feb 3arXiv

DeepResearch agents write long, evidence-based reports, but teaching and grading them is hard because there is no single 'right answer' to score against.

#DeepResearch#query-specific rubrics#human preference learning

Quantifying the Gap between Understanding and Generation within Unified Multimodal Models

Intermediate
Chenlong Wang, Yuhang Chen et al.Feb 2arXiv

This paper shows that many AI models that both read images and write images are not truly unified inside—they often understand well but fail to generate (or the other way around).

#Unified Multimodal Models#GAPEVAL#Gap Score

Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles

Beginner
Shaohan Wang, Benfeng Xu et al.Feb 2arXiv

This paper builds a live challenge that tests how well Deep Research Agents (DRAs) can write expert-level Wikipedia-style articles.

#Deep Research Agents#Wikipedia Good Articles#Benchmark

Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry

Intermediate
Zhuochun Li, Yong Zhang et al.Jan 30arXiv

Big models are often used to grade AI answers, but they are expensive, slow, and depend too much on tricky prompts.

#Representation-as-a-Judge#Semantic Capacity Asymmetry#LLM-as-a-Judge

CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

Intermediate
Johannes Kirmayr, Lukas Stappen et al.Jan 29arXiv

CAR-bench is a new 'driving test' for AI assistants that checks if they can stay careful, honest, and consistent during real back-and-forth conversations in a car.

#LLM agents#benchmarking#consistency

MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents

Intermediate
Peizhou Huang, Zixuan Zhong et al.Jan 18arXiv

This paper introduces MMDeepResearch-Bench (MMDR-Bench), a new test that checks how well AI “deep research agents” write long, citation-rich reports using both text and images.

#Multimodal Deep Research#Benchmark#Citation Grounding

Agent-as-a-Judge

Beginner
Runyang You, Hongru Cai et al.Jan 8arXiv

This survey explains how AI judges are changing from single smart readers (LLM-as-a-Judge) into full-on agents that can plan, use tools, remember, and work in teams (Agent-as-a-Judge).

#Agent-as-a-Judge#LLM-as-a-Judge#multi-agent collaboration

KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

Intermediate
Tingyu Wu, Zhisheng Chen et al.Jan 8arXiv

KnowMe-Bench is a new test that checks if AI helpers truly understand a person, not just remember facts.

#person understanding#autobiographical narratives#cognitive stream

UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision

Beginner
Ruiyan Han, Zhen Fang et al.Jan 6arXiv

This paper fixes a common problem in multimodal AI: models can understand pictures and words well but stumble when asked to create matching images.

#Unified Multimodal Models#Self-Generated Supervision#Conduction Aphasia

Nested Browser-Use Learning for Agentic Information Seeking

Beginner
Baixuan Li, Jialong Wu et al.Dec 29arXiv

This paper teaches AI helpers to browse the web more like people do, not just by grabbing static snippets.

#information-seeking agents#browser-use#ReAct function-calling

SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

Intermediate
Shaofei Cai, Yulei Qin et al.Dec 26arXiv

SmartSnap teaches an agent not only to finish a phone task but also to prove it with a few perfect snapshots it picks itself.

#Self-verifying agents#Evidence curation#3C principles

Masking Teacher and Reinforcing Student for Distilling Vision-Language Models

Intermediate
Byung-Kwan Lee, Yu-Chiang Frank Wang et al.Dec 23arXiv

Big vision-language models are super smart but too large to fit on phones and small devices.

#vision-language models#knowledge distillation#masking teacher
12