Multimodal AI models can conflate information across input modalities, such as vision and audio, and generate content supported by neither; this failure mode is known as cross-modal hallucination.
This paper studies the confidence of large language models in multi-turn dialogues where evidence is revealed incrementally across turns.
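To make this setting concrete, the sketch below shows one possible way to quantify per-turn confidence: after each new clue is appended to the dialogue, score the mean token log-probability that a causal language model assigns to a fixed candidate answer. The model name, the clues, the candidate answer, and the log-probability proxy are all illustrative assumptions for this sketch, not the paper's protocol.

```python
# A minimal sketch (assumed setup, not the paper's method) of tracking an
# LLM's confidence as clues arrive turn by turn. Confidence is proxied by
# the mean log-probability of a candidate answer given the context so far.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM from the Hub would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

clues = [  # hypothetical clues, one revealed per turn
    "Clue 1: It is a fruit.",
    "Clue 2: It is yellow.",
    "Clue 3: Monkeys like it.",
]
answer = " banana"  # hypothetical candidate answer to score each turn

def answer_logprob(context: str, answer: str) -> float:
    """Mean log-probability of `answer` tokens given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    ans_ids = tokenizer(answer, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, ans_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i+1, so the slice below covers
    # exactly the predictions for the answer tokens.
    ans_logits = logits[0, ctx_ids.size(1) - 1 : -1]
    log_probs = torch.log_softmax(ans_logits, dim=-1)
    token_lp = log_probs.gather(1, ans_ids[0].unsqueeze(1)).squeeze(1)
    return token_lp.mean().item()

context = "Guess the object.\n"
for turn, clue in enumerate(clues, start=1):
    context += clue + "\n"
    lp = answer_logprob(context, answer)
    print(f"Turn {turn}: mean logprob of answer = {lp:.3f}")
```

Under this proxy, a well-calibrated model's score on the correct answer should rise as more clues arrive; a flat or erratic trajectory would suggest the model is not integrating the incremental evidence.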