This paper builds Conv-FinRe, a new test that checks if AI financial advisors give advice that fits a person’s true goals, not just what they clicked before.
AI researchers rely on tests (benchmarks) to judge how capable AI models are, but not all tests are good tests.
This paper studies how sure (confident) large language models are during multi-turn chats where clues arrive step by step.