Papers2

#DAgger

EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models

EgoActor is a vision-language model that turns everyday instructions like 'Go to the door and say hi' into step-by-step, egocentric actions a humanoid robot can actually do.

#EgoActing#vision-language model#humanoid robot

Not triaged yet

GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training

Intermediate

Tong Wei, Yijun Yang et al.Dec 15arXiv

GTR-Turbo teaches a vision-language agent using a 'free teacher' made by merging its own past checkpoints, so no costly external model is needed.

#GTR-Turbo#checkpoint merging#TIES-merging

Not triaged yet