This paper builds SocialVeil, a testing world where AI chat agents must talk to each other even when communication is messy and imperfect.
This paper builds an AI team that can make real full‑stack websites (frontend, backend, and database) from plain English instructions.
LatentMem is a new memory system that helps teams of AI agents remember the right things for their specific jobs without overloading them with text.
This paper builds a smart team of AI helpers, called MEnvAgent, that automatically sets up the right computer environments for code projects in many languages.
This paper builds an AI agent that learns new skills while working, like a kid who learns new tricks during recess without a teacher telling them what to do.
This paper turns rebuttal writing from ‘just write some text’ into ‘make a plan with proof, then write.’
This paper introduces MATTRL, a way for multiple AI agents to learn from their own conversations at test time using short, reusable text notes instead of retraining their weights.
Multi-agent systems are like teams of expert helpers; the tricky part is choosing which helpers to ask for each question.
A digital twin is a living computer copy of a real thing (like a bridge, a heart, or a factory) that stays in sync with sensors and helps us predict, fix, and improve the real thing.
This paper builds a tough new test called O3-BENCH to check whether AI can truly think with images, not just spot objects.
SWE-EVO is a new test (benchmark) that checks whether AI coding agents can upgrade real software projects over many steps, not just fix one small bug.
Multi-agent AI teams are not automatically better; their success depends on matching the team’s coordination style to the job’s structure.