Ships constantly broadcast AIS messages, but these messages are messy, unevenly spaced in time, and sometimes wrong.
FINCH is a new test that checks whether AI can handle real finance and accounting work using messy, real spreadsheets, emails, PDFs, charts, and more.
Robots often see the world as flat pictures but must move in a 3D world, which makes accurate actions hard.
GTR-Turbo teaches a vision-language agent using a 'free teacher' made by merging its own past checkpoints, so no costly external model is needed.
Big text-to-image models make amazing pictures but are slow because they take hundreds of tiny steps to turn noise into an image.
The paper shows that judging vector search only by distance-based recall and speed can be very misleading for real tasks.
QwenLong-L1.5 is a training recipe that helps AI read and reason over very long documents by improving the data it learns from, the way it is trained, and how it remembers important information.
Recursive transformers save memory by reusing the same layer over and over, but that makes them less expressive and hurts accuracy.
DrivePI is a single, small (0.5B) multimodal language model that sees with cameras and LiDAR, talks in natural language, and plans driving actions all at once.
Reasoning tokens (the words a model writes before its final answer) help the model think better, but they are not a trustworthy diary of how it really thought.
NL2Repo-Bench is a new benchmark that tests whether coding agents can build a whole Python library from just one long natural-language document and an empty folder.
WebOperator lets an AI agent use a map of choices (a search tree) to navigate websites safely and reach goals.
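
To make the "map of choices" idea in the WebOperator summary concrete, here is a minimal sketch of best-first search over a tree of web pages. It is a generic illustration, not WebOperator's actual algorithm: the toy site graph, the `RISKY` set, and the `expand`, `score`, and `best_first_search` names are all hypothetical and chosen only for this example.

```python
import heapq

def best_first_search(start, expand, score, is_goal, max_steps=100):
    """Explore the most promising state first; return the path to a goal or None."""
    counter = 0  # tie-breaker so the heap never has to compare states directly
    frontier = [(-score(start), counter, start, [start])]
    seen = {start}
    for _ in range(max_steps):
        if not frontier:
            return None
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for nxt in expand(state):
            if nxt in seen:
                continue
            seen.add(nxt)
            counter += 1
            heapq.heappush(frontier, (-score(nxt), counter, nxt, path + [nxt]))
    return None

# Hypothetical toy website: each page lists the pages reachable from it.
SITE = {
    "home": ["search", "account"],
    "search": ["results"],
    "account": ["delete_account"],
    "results": ["checkout"],
    "delete_account": [],
    "checkout": [],
}
RISKY = {"delete_account"}  # assumed set of destructive actions to avoid

def expand(page):
    # "Safe" navigation here simply means never stepping onto a risky page.
    return [p for p in SITE[page] if p not in RISKY]

def score(page):
    # Hand-written stand-in for a learned estimate of progress toward the goal.
    return {"home": 0, "search": 1, "results": 2, "checkout": 3}.get(page, 0)

path = best_first_search("home", expand, score, lambda p: p == "checkout")
print(path)
```

In this sketch the search keeps every unexplored choice in a priority queue, so it can back up to an earlier branch if the current one stalls, and the safety filter is applied before a page is ever visited rather than after.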