Papers (1262)

Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers

Xiaotong Ji, Rasul Tutunov et al. · Feb 20 · arXiv

Decoding (how a language model picks the next word) isn’t a bag of tricks; it’s a clean optimisation problem over probabilities.

#decoding as optimisation#probability simplex#softmax sampling

HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation

Intermediate

Lei Xin, Yuhao Zheng et al. · Feb 20 · arXiv

The paper proposes HyTRec, a recommender system that reads very long histories fast while still paying sharp attention to the latest clicks and purchases.

#Hybrid Attention#Linear Attention#Softmax Attention

VLANeXt: Recipes for Building Strong VLA Models

Intermediate

Xiao-Ming Wu, Bin Fan et al. · Feb 20 · arXiv

This paper studies Vision–Language–Action (VLA) models under one fair setup to find which design choices truly matter.

#Vision-Language-Action#robot manipulation#flow matching

EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots

Intermediate

Boyuan An, Zhexiong Wang et al. · Feb 20 · arXiv

EgoPush teaches a small mobile robot to push multiple objects into patterns (like a cross or a line) using only what it sees from its own camera, without any global map.

#egocentric perception#non-prehensile manipulation#object-centric representation

VidEoMT: Your ViT is Secretly Also a Video Segmentation Model

Intermediate

Narges Norouzi, Idil Esen Zulfikar et al. · Feb 19 · arXiv

VidEoMT shows that a single, well‑trained Vision Transformer (ViT) can segment and track objects in videos without extra tracking gadgets.

#Video Segmentation#Vision Transformer#Encoder-only

MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

Intermediate

Hojung Jung, Rodrigo Hormazabal et al. · Feb 19 · arXiv

MolHIT is a new AI that builds molecules as graphs, moving from broad chemical groups to exact atoms step by step.

#molecular graph generation#discrete diffusion#hierarchical diffusion

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

Intermediate

Lance Ying, Ryan Truong et al. · Feb 19 · arXiv

The paper argues that the fairest test of an AI's general intelligence is how quickly and well it learns many different human-made games, given the same time and practice as a person.

#general intelligence#evaluation benchmark#game-based testing

Computer-Using World Model

Intermediate

Yiming Guan, Rui Yu et al. · Feb 19 · arXiv

The paper builds a Computer-Using World Model (CUWM) that lets an AI “imagine” what a desktop app (like Word/Excel/PowerPoint) will look like after a click or keystroke—before doing it for real.

#world model#GUI agent#desktop automation

2Mamba2Furious: Linear in Complexity, Competitive in Accuracy

Intermediate

Gabriel Mongaras, Eric C. Larson · Feb 19 · arXiv

The paper studies Mamba-2 (a fast, linear-time attention method) and pares it down to the pieces that truly boost accuracy.

#linear attention#Mamba-2#2Mamba

ArXiv-to-Model: A Practical Study of Scientific LM Training

Intermediate

Anuj Gupta · Feb 19 · arXiv

This paper shows, step by step, how to train a 1.36-billion-parameter science-focused language model directly from raw arXiv LaTeX files using only 2 A100 GPUs.

#scientific language model#arXiv LaTeX#tokenization

Unified Latents (UL): How to train your latents

Intermediate

Jonathan Heek, Emiel Hoogeboom et al. · Feb 19 · arXiv

Unified Latents (UL) is a way to learn the hidden code (latents) for images and videos by training three parts together: an encoder, a diffusion prior, and a diffusion decoder.

#Unified Latents#diffusion prior#diffusion decoder

FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment

Intermediate

Han Zhao, Jingbo Wang et al. · Feb 19 · arXiv

Robots learn better when they predict short, meaningful summaries of future images instead of drawing every pixel of the future scene.

#world modeling#vision-language-action (VLA)#diffusion policy
