Papers (38)

#retrieval-augmented generation

MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences

Qihao Wang, Ziming Cheng et al. · Jan 11 · arXiv

MemGovern teaches code agents to learn from past human fixes on GitHub by turning messy discussions into clean, reusable 'experience cards.'

#MemGovern #experience governance #agentic search

Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency

Beginner

Haoming Xu, Ningyuan Zhao et al. · Jan 9 · arXiv

LLMs can look confident but still change their answers when the surrounding text nudges them, showing that confidence alone does not indicate truthfulness.

#Neighbor-Consistency Belief #belief robustness #self-consistency

Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction

Beginner

Muzhao Tian, Zisu Huang et al. · Jan 8 · arXiv

Long-term AI helpers remember past chats, but using all memories can trap them in old ideas (Memory Anchoring).

#steerable memory #memory anchoring #long-term agents

KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

Intermediate

Tingyu Wu, Zhisheng Chen et al. · Jan 8 · arXiv

KnowMe-Bench is a new test that checks if AI helpers truly understand a person, not just remember facts.

#person understanding #autobiographical narratives #cognitive stream

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models

Beginner

Dasol Choi, Guijin Son et al. · Jan 7 · arXiv

Real people often ask vague questions with pictures, and today’s vision-language models (VLMs) struggle with them.

#vision-language models #under-specified queries #query explicitation

COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

Intermediate

Dasol Choi, DongGeon Lee et al. · Jan 5 · arXiv

COMPASS is a new framework that turns a company’s rules into thousands of smart test questions to check if chatbots follow those rules.

#policy alignment #allowlist denylist #enterprise AI safety

Scaling Open-Ended Reasoning to Predict the Future

Intermediate

Nikhil Chandak, Shashwat Goel et al. · Dec 31 · arXiv

The paper teaches small language models to predict open-ended future events by turning daily news into thousands of safe, graded practice questions.

#open-ended forecasting #calibrated prediction #Brier score

AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents

Intermediate

Jiafeng Liang, Hao Li et al. · Dec 29 · arXiv

This survey links how human brains remember things to how AI agents should remember them, so that agents can act smarter over time.

#agent memory #episodic memory #semantic memory

FaithLens: Detecting and Explaining Faithfulness Hallucination

Intermediate

Shuzheng Si, Qingyi Wang et al. · Dec 23 · arXiv

Large language models can say things that sound right but aren’t supported by the given document; this is called a faithfulness hallucination.

#faithfulness hallucination #hallucination detection #explainable AI

Does It Tie Out? Towards Autonomous Legal Agents in Venture Capital

Intermediate

Pierre Colombo, Malik Boudiaf et al. · Dec 21 · arXiv

Capitalization tie-out checks whether a company’s ownership table truly matches what its legal documents say.

#capitalization tie-out #dataroom #cap table verification

Adaptation of Agentic AI

Intermediate

Pengcheng Jiang, Jiacheng Lin et al. · Dec 18 · arXiv

This paper organizes how AI agents learn and improve into one simple map with four roads: A1, A2, T1, and T2.

#agentic AI #adaptation #A1 A2 T1 T2

RePo: Language Models with Context Re-Positioning

Intermediate

Huayang Li, Tianyu Zhao et al. · Dec 16 · arXiv

Large language models usually line words up in fixed order slots, which can waste mental energy and make it harder to find the important parts of a long or noisy text.

#context re-positioning #positional encoding #self-attention
