Reasoning models often talk too much, and those extra words can actually make them less accurate.
MobilityBench is a big, carefully built test that checks how well AI helpers can plan real-world routes using natural language and map tools.
When you tune the learning rate carefully, plain old LoRA fine-tuning works about as well as its fancy new variants.
Giving large language models a few good examples and step-by-step instructions can make them much better at spotting feelings in text.
This paper builds MFMD-Scen, a big test to see how AI changes its true/false judgments about the same money-related claim when the situation around it changes.