EgoPush teaches a small mobile robot to push multiple objects into patterns (like a cross or a line) using only what it sees from its own camera, without any global map.
The paper solves a big problem in fast image generators: as they got quicker, they lost variety and kept producing similar-looking pictures.