Papers3

#conflict resolution

Long-horizon AI assistants can retrieve outdated, low-quality, or conflicting memories and then answer with too much confidence, which is dangerous.

Not triaged yet

GISA is a new benchmark that tests how well AI assistants can search the web the way real people do.

Not triaged yet

Multi-agent systems are like teams of expert helpers; the tricky part is choosing which helpers to ask for each question.

Not triaged yet