This paper teaches a computer to find buttons, text, and icons on screens so it can click and type in the right places, a skill called GUI grounding.
LatentChem lets AI reason about chemistry silently inside continuous vectors instead of writing out long step-by-step sentences.
SwimBird is a multimodal AI that can switch how it thinks: only in text, only in vision (with hidden picture-like thoughts), or a mix of both.
DFlash is a new way to make big language models answer much faster without changing the final answers.
InterPrior is a new brain for simulated humans and humanoid robots that can move, balance, and use objects by following simple goals instead of step-by-step instructions.
V-Retrver is a new way for AI to search across text and images by double-checking tiny visual details instead of only guessing from words.
The paper fixes a big problem in long video generation: models either forget what happened or slowly drift off-topic over time.
BudgetMem is a way for AI helpers to build and use memory on the fly, picking how much thinking to spend so answers are both good and affordable.
RISE-Video is a new test that checks whether video-making AIs follow hidden world rules, not just make pretty pictures.
SAGE is a new test for how well AI research agents find scientific papers when questions require multi-step reasoning.
TRIT is a new training method that teaches AI to translate and think at the same time so it can solve hard problems in many languages without extra helper models.
The paper studies a simple way to train giant language models with reinforcement learning by replacing a hard-to-compute term (the log-partition function) with something easy: the mean reward.
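To make the last summary concrete, here is a minimal sketch of why the mean reward can stand in for the log-partition term. It assumes the standard KL-regularized setup, where the term for a prompt is beta * log E[exp(r / beta)] over responses sampled from a reference policy; the summary does not give the paper's exact formula, so the function names, the temperature `beta`, and the sample rewards below are all hypothetical. For large `beta` this term approaches the plain mean reward, which is far cheaper to estimate.

```python
import math

def log_partition_term(rewards, beta):
    """beta * log(mean(exp(r / beta))), computed stably via log-sum-exp.

    Assumed form of the hard-to-compute term (not taken from the paper).
    """
    scaled = [r / beta for r in rewards]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return beta * log_mean_exp

def mean_reward(rewards):
    """The easy replacement: the average reward over sampled responses."""
    return sum(rewards) / len(rewards)

# Hypothetical rewards for five sampled responses to one prompt.
rewards = [0.0, 1.0, 1.0, 0.0, 1.0]

for beta in (0.1, 1.0, 10.0, 100.0):
    gap = log_partition_term(rewards, beta) - mean_reward(rewards)
    print(f"beta={beta:>6}: gap between the two terms = {gap:.4f}")
```

The gap is never negative (by Jensen's inequality) and shrinks as `beta` grows, so under this assumption the substitution is most accurate when the KL regularization is strong.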