People often pick CLIP-like models for image labeling, but this paper shows that large multimodal models (LMMs) can be just as good, or even better, when you give them a few examples in the prompt (in-context learning).
ManCAR helps recommendation systems think step by step but keeps their thoughts on realistic paths using a map of how items connect.
LOCA-bench is a test that challenges AI agents to work correctly as their to-do list and background information grow very, very long.
The paper shows that when we give AI lots of extra text, even harmless extra text, it can get badly confused, sometimes losing up to 80% of its accuracy.