WorldVQA is a new benchmark that checks whether multimodal AI models can correctly name what they see in pictures without doing extra reasoning.
In long AI tasks, an early mistake can carry forward and make each later step worse, a snowballing of errors called the Spiral of Hallucination.
This paper studies how tool-using AI agents express how sure they are and finds a split: some tools make them overconfident, while others help them report their confidence honestly.