The Sphere Encoder is a new way to generate images quickly: it teaches an autoencoder to place all images evenly on a big imaginary sphere, then decodes random spots on that sphere back into pictures.
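The key trick is that if every image lives on the unit sphere, you can sample a brand-new image by just picking a uniformly random point on that sphere and decoding it. A minimal sketch of the sampling step (the decoder itself and the dimension 128 are hypothetical placeholders, not from the paper):

```python
import numpy as np

def sample_on_sphere(dim, rng):
    """Draw a point uniformly on the unit sphere: a Gaussian vector,
    normalized to length 1, is uniform over directions."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
z = sample_on_sphere(128, rng)  # a random "spot" on the sphere
# A trained decoder (not shown) would map z back to an image.
```

Normalizing a Gaussian is the standard way to get a uniform direction; any latent produced this way has length exactly 1, matching where the encoder placed the training images.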
This paper checks how safe a real, tool-using AI agent called Clawdbot (OpenClaw) is by watching every step it takes during tasks, not just the final answer.
LongCLI-Bench is a new test that checks how well AI coding agents can handle long, realistic software projects in the command line, not just tiny coding puzzles.
Robots learn faster and more flexibly when they can use human touch data, but humans and robots feel touch with very different sensors.
RynnBrain is an open-source 'robot brain' that helps machines see, think, and plan in the real world across space and time.
DeepGen 1.0 is a small 5B-parameter model that can both make new images and smartly edit existing ones from text instructions.
This paper shows a simple, repeatable way to teach general Vision-Language Models (VLMs) to understand e-commerce items much better without forgetting their general skills.
The paper fixes a common problem in AI: models can read pictures and text well, but they often mess up the logic behind them.
ThinkRouter teaches a model to switch how it “thinks” based on how sure it feels, so it stays accurate without talking forever.
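The switching idea can be sketched as a simple confidence gate: answer directly when the model feels sure, and only spend tokens on a long reasoning chain when it does not. The function name and the 0.8 threshold below are illustrative assumptions, not details from the paper:

```python
def route_thinking(confidence, threshold=0.8):
    """Hypothetical router: pick the cheap direct-answer mode when the
    model's self-reported confidence clears the threshold, otherwise
    fall back to an extended chain-of-thought mode."""
    return "direct" if confidence >= threshold else "extended"

mode = route_thinking(0.95)  # a sure model answers directly
```

The point of training the router (rather than hard-coding a threshold) is that the model learns when its own confidence signal is trustworthy.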
Training big language models usually needs super-expensive, tightly connected GPU clusters, which most people do not have.
This paper builds a new test, called MURGAT, to check whether AI models can back up each small fact they say with the right part of a video, audio, or figure.
This paper introduces Causal-JEPA (C-JEPA), a world model that learns by hiding entire objects in its memory and forcing itself to predict them from other objects.
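The training setup can be pictured as follows: a scene is a set of object embeddings, one object is hidden, and the model must predict its embedding from the remaining objects. A toy sketch of that masking step (array shapes and the mean-style setup are illustrative assumptions):

```python
import numpy as np

# Toy scene: each row is one object's embedding (4 objects, 8 dims).
rng = np.random.default_rng(1)
objects = rng.standard_normal((4, 8))

mask_idx = 2
context = np.delete(objects, mask_idx, axis=0)  # the visible objects
target = objects[mask_idx]                      # the hidden object

# A learned predictor (not shown) would map `context` to an estimate
# of `target`; training minimizes the gap in embedding space, so the
# model never has to reconstruct raw pixels.
```

Predicting in embedding space rather than pixel space is the defining JEPA move: the model only has to get the object's abstract features right, not every pixel.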