This survey links how human brains remember things to how AI agents should remember things so they can act smarter over time.
The paper teaches vision-language models (AIs that look and read) to pay attention to the right parts of a picture without needing extra tools when they answer.
Coding agents used to fix software rely on feedback, but unit tests give only pass/fail signals that are often noisy or missing.
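To make that concrete, here is a minimal sketch of the binary signal a test suite hands a coding agent; the `unit_test_reward` name and the use of pytest are my assumptions, not details from the paper.

```python
import subprocess

def unit_test_reward(repo_dir: str) -> float:
    # Run the project's test suite; all we get back is pass (1.0) or fail (0.0).
    # Nothing here says which behaviour broke or how close a failing patch came,
    # which is why this signal is sparse, noisy, and easy to game.
    result = subprocess.run(["pytest", "-q"], cwd=repo_dir, capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0
```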
LongVideoAgent is a team of three AIs that work together to answer questions about hour‑long TV episodes without missing small details.
Large language models can say things that sound right but aren’t supported by the given document; this is called a faithfulness hallucination.
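A toy illustration of what "supported by the given document" could mean in code; the `entails` scorer is an assumption for the sketch, not something the paper provides.

```python
def is_supported(claim: str, document: str, entails) -> bool:
    # A claim counts as faithful only if at least one sentence of the source
    # document entails it; `entails` is an assumed NLI-style scorer in [0, 1].
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return any(entails(premise=s, hypothesis=claim) > 0.5 for s in sentences)
```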
Memory-T1 teaches chatty AI agents to keep track of when things happened across many conversations.
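As a purely illustrative sketch (not Memory-T1's actual mechanism), a time-aware memory can be as simple as storing a timestamp and a session id with every item, so "when" questions become lookups.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MemoryEntry:
    text: str            # what was said
    timestamp: datetime  # when it was said
    session_id: int      # which conversation it came from

class TimedMemory:
    # Toy store: because every memory carries its timestamp, the agent can
    # answer "when did X happen?" and order events across conversations.
    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []

    def add(self, text: str, timestamp: datetime, session_id: int) -> None:
        self.entries.append(MemoryEntry(text, timestamp, session_id))

    def before(self, cutoff: datetime) -> list[MemoryEntry]:
        # Return only what was recorded before a given moment, oldest first.
        return sorted((e for e in self.entries if e.timestamp < cutoff),
                      key=lambda e: e.timestamp)
```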
Autoregressive (AR) image models make pictures by choosing tokens one by one, but they have been judged only on picking likely tokens, not on how good the final picture looks in pixels.
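A generic sketch of that token-by-token loop; the `model` interface here is an assumption for illustration, not the paper's code.

```python
import torch

def sample_image_tokens(model, prompt_tokens: torch.Tensor, num_tokens: int) -> torch.Tensor:
    # `model` is assumed to map a token sequence to next-token logits.
    # Judging such a model on likelihood only cares about how probable each
    # chosen token is; nothing in this loop ever looks at decoded pixels.
    tokens = prompt_tokens
    for _ in range(num_tokens):
        logits = model(tokens)[:, -1, :]                      # next-token logits
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)  # pick one token
        tokens = torch.cat([tokens, next_token], dim=1)
    return tokens  # a separate decoder still has to turn these tokens into pixels
```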
This paper builds a tough new test called O3-BENCH to check if AI can truly think with images, not just spot objects.
Reasoning Palette gives a language or vision-language model a tiny hidden “mood” (a latent code) before it starts answering, so it chooses a smarter plan rather than just rolling dice on each next word.
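One way to picture the idea, as a sketch under my own assumptions rather than the paper's architecture: sample a small latent once, project it into a few prefix embeddings, and let that single hidden draw condition the whole answer.

```python
import torch
import torch.nn as nn

class LatentConditioner(nn.Module):
    # Illustrative only: turn one sampled latent "mood" vector into a few
    # soft-prompt embeddings prepended to the model's input, so a single
    # hidden draw shapes the whole answer instead of per-word dice rolls.
    def __init__(self, latent_dim: int, hidden_dim: int, num_prefix_tokens: int = 4):
        super().__init__()
        self.num_prefix_tokens = num_prefix_tokens
        self.proj = nn.Linear(latent_dim, hidden_dim * num_prefix_tokens)

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        batch = latent.shape[0]
        return self.proj(latent).view(batch, self.num_prefix_tokens, -1)

latent = torch.randn(1, 16)                  # sampled once, before any text is produced
prefix = LatentConditioner(16, 768)(latent)  # shape (1, 4, 768), prepended to input embeddings
```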
AuditDM is a friendly 'auditor' model that hunts for where vision-language models get things wrong and then creates the right practice data to fix them.
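The loop below is only a sketch of that audit-then-practice idea; every name in it is hypothetical rather than taken from the paper.

```python
def audit_and_collect(model, auditor, probe_images, make_example):
    # All names here are hypothetical. The auditor poses questions it knows the
    # answers to; wherever the model slips, that failure becomes a new training
    # example aimed exactly at the weakness that was found.
    new_data = []
    for image in probe_images:
        question, correct_answer = auditor.probe(image)
        if model.answer(image, question) != correct_answer:
            new_data.append(make_example(image, question, correct_answer))
    return new_data
```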
AdaTooler-V teaches an image-and-video AI to first ask, “Do I really need a tool?” before using one, which saves time and boosts accuracy.
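A minimal sketch of that "do I need a tool?" gate, with hypothetical function names standing in for the model, the tool, and the scorer.

```python
def answer(question, image, model, tool, tool_need_score, threshold=0.5):
    # All names are hypothetical. The model first scores how much a tool would
    # help; only above the threshold do we pay the latency cost of calling it.
    if tool_need_score(model, question, image) > threshold:
        evidence = tool(image, question)           # e.g. zoom, OCR, frame retrieval
        return model.generate(question, image, evidence)
    return model.generate(question, image)         # answer directly, tool skipped
```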
RePlan is a plan-then-execute system that first figures out exactly where to edit in a picture and then makes clean changes there.
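To show the plan-then-execute shape in code, here is a hypothetical two-stage pipeline (the `planner` and `editor` callables are assumptions, not RePlan's components): only the pixels the plan points at ever change.

```python
import numpy as np

def plan_then_edit(image: np.ndarray, instruction: str, planner, editor) -> np.ndarray:
    # Hypothetical two-stage pipeline: the planner decides exactly where to
    # edit (a boolean mask), the editor synthesizes new content, and only the
    # masked pixels are replaced so the rest of the picture stays untouched.
    mask = planner(image, instruction)         # (H, W) bool region to change
    edited = editor(image, instruction, mask)  # same shape as `image`
    return np.where(mask[..., None], edited, image)
```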