VideoCoF is a new way to edit videos that first figures out where to edit and then performs the edit, like thinking before acting.
VG-Refiner is a new way for AI to find the object in a picture that matches a given description, even when its helper tools make mistakes.
The paper shows how a vision-language model (VLM) can train itself to be a fair judge of answers about images without using any human preference labels.