Large language models don't map out a full step-by-step plan before they start reasoning; they mostly plan only a short distance ahead.
This paper introduces EDIR, a new and much more detailed benchmark for Composed Image Retrieval (CIR), the task of finding a target image given a reference image plus a short text describing how it should change.
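To make the CIR query format concrete, here is a minimal sketch of one simple baseline, assuming image and text embeddings already live in a shared space (as with CLIP-style encoders). The query is formed by summing the L2-normalized reference-image and modification-text embeddings, and gallery images are ranked by cosine similarity. This late-fusion scheme and all function names (`compose_query`, `retrieve`) are hypothetical illustrations, not EDIR's method or any library's API; the toy 3-dimensional vectors stand in for real encoder outputs.

```python
import math


def normalize(v):
    # Scale a vector to unit L2 length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    # Cosine similarity between two vectors.
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def compose_query(image_emb, text_emb):
    # Hypothetical late-fusion baseline: add the normalized reference-image
    # and modification-text embeddings, then renormalize.
    img = normalize(image_emb)
    txt = normalize(text_emb)
    return normalize([a + b for a, b in zip(img, txt)])


def retrieve(image_emb, text_emb, gallery):
    # Rank gallery image names by similarity to the composed query.
    q = compose_query(image_emb, text_emb)
    scores = {name: cosine(q, emb) for name, emb in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: axis 0 ~ "car", axis 1 ~ "blue", axis 2 ~ "truck" (assumed).
reference = [1.0, 0.0, 0.0]        # reference image: a red car
modification = [0.0, 1.0, 0.0]     # text: "make it blue"
gallery = {
    "red_car": [1.0, 0.0, 0.0],
    "blue_car": [1.0, 1.0, 0.0],
    "blue_truck": [0.0, 1.0, 1.0],
}
print(retrieve(reference, modification, gallery))
```

The composed query lands between "car" and "blue", so the blue car ranks first, the unchanged red car second, and the blue truck last. A benchmark like EDIR is designed to probe exactly where such simple fusion succeeds or fails on finer-grained edits.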