Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Intermediate · Amirhosein Ghasemabadi, Di Niu · Dec 23 · arXiv
Large language models often sound confident even when they are wrong, and existing methods for catching these mistakes are either slow or unreliable.
#self-awareness · #large language models · #hidden states
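
The tags point at the core idea: failure signals may already live in the model's internal activations. As a hedged illustration of that general approach, not the authors' actual method, here is a minimal sketch of a linear probe trained on last-token hidden states to predict whether an answer will be correct. The model name (`gpt2`), the toy prompts and labels, and the logistic-regression probe are all illustrative assumptions.

```python
# A minimal sketch (not the paper's method): probe final-layer hidden states
# to predict whether the model's answer will be correct. Model name, data,
# and the linear-probe choice are illustrative assumptions.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM with accessible hidden states
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def last_token_hidden_state(prompt: str) -> torch.Tensor:
    """Return the final layer's hidden state at the last token position."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states: tuple of (num_layers + 1) tensors, each (1, seq_len, d_model)
    return outputs.hidden_states[-1][0, -1]

# Hypothetical labeled data: prompts paired with whether the model answered
# correctly (1) or not (0); in practice these labels would come from grading
# the model's answers against ground truth.
prompts = ["Q: What is 2+2? A:", "Q: Who wrote Hamlet? A:"]
labels = [1, 0]

features = torch.stack([last_token_hidden_state(p) for p in prompts]).numpy()
probe = LogisticRegression(max_iter=1000).fit(features, labels)

# At inference time, the probe's score acts as a cheap failure predictor:
# one forward pass, no repeated sampling or external verifier needed.
risk_of_error = probe.predict_proba(features)[:, 0]
print(risk_of_error)
```

The appeal of this family of methods is cost: the hidden states are computed during generation anyway, so the failure check adds only a tiny classifier on top, unlike sampling-based consistency checks or separate verifier models.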