MemGovern teaches code agents to learn from past human fixes on GitHub by turning messy discussions into clean, reusable 'experience cards.'
LLMs can look confident but still change their answers when the surrounding text nudges them, which shows that confidence alone is not the same as truthfulness.
Long-term AI helpers remember past chats, but drawing on every memory can trap them in outdated ideas, a problem called Memory Anchoring.
KnowMe-Bench is a new test that checks if AI helpers truly understand a person, not just remember facts.
Real people often ask vague questions with pictures, and today’s vision-language models (VLMs) struggle with them.
COMPASS is a new framework that turns a company’s rules into thousands of smart test questions to check if chatbots follow those rules.
The paper teaches small language models to predict open-ended future events by turning daily news into thousands of safe, graded practice questions.
This survey links how human brains remember things to how AI agents should remember things so they can act smarter over time.
Large language models can say things that sound right but aren’t supported by the given document; this is called a faithfulness hallucination.
Capitalization tie-out checks if a company’s ownership table truly matches what its legal documents say.
This paper organizes how AI agents learn and improve into one simple map with four roads: A1, A2, T1, and T2.
Large language models usually line words up in fixed order slots, which can waste effort and make it harder to find the important parts of a long or noisy text.