EgoActor is a vision-language model that turns everyday instructions like 'Go to the door and say hi' into step-by-step, egocentric actions a humanoid robot can actually do.
GTR-Turbo teaches a vision-language agent using a 'free teacher' made by merging its own past checkpoints, so no costly external model is needed.
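
As a rough illustration of what "merging its own past checkpoints" could look like, here is a minimal sketch that builds a teacher by averaging parameter values across saved checkpoints. The summary above does not say how GTR-Turbo actually merges checkpoints, so the weighted-average rule, the function name `merge_checkpoints`, and the dict-of-lists checkpoint format are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical checkpoint format: parameter name -> flat list of values.
Checkpoint = Dict[str, List[float]]


def merge_checkpoints(
    checkpoints: Sequence[Checkpoint],
    weights: Optional[Sequence[float]] = None,
) -> Checkpoint:
    """Merge checkpoints by (weighted) averaging each parameter.

    Assumption: plain averaging stands in for whatever merge rule
    the actual method uses.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        # Default to a uniform average over all past checkpoints.
        weights = [1.0] * len(checkpoints)
    if len(weights) != len(checkpoints):
        raise ValueError("one weight per checkpoint is required")

    total = sum(weights)
    merged: Checkpoint = {}
    for name, values in checkpoints[0].items():
        # Average position i of this parameter across all checkpoints.
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(weights, checkpoints)) / total
            for i in range(len(values))
        ]
    return merged


# Usage: two toy "past checkpoints" merged into one teacher.
past = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
teacher = merge_checkpoints(past)
```

In this sketch the merged teacher costs nothing beyond the checkpoints already saved during training, which is the sense in which such a teacher would be "free". Passing `weights` (for example, to favor more recent checkpoints) is one possible variation, again not something the summary specifies.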