Papers (1055)

Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction

Shannan Yan, Leqi Zheng et al. · Feb 22 · arXiv

This paper teaches a computer to find the same object when seen from two very different cameras, like a body camera (first-person) and a room camera (third-person).

#cross-view correspondence #egocentric to exocentric #binary segmentation

DREAM: Deep Research Evaluation with Agentic Metrics

Intermediate

Elad Ben Avraham, Changhao Li et al. · Feb 21 · arXiv

Deep research agents write long reports, but old tests often judge only how smooth they sound and whether they add links, not whether the facts are true today or the logic really holds.

#deep research agents #agentic evaluation #capability parity

Spilled Energy in Large Language Models

Intermediate

Adrian Robert Minut, Hazem Dewidar et al. · Feb 21 · arXiv

The paper treats the last layer of a Large Language Model (the softmax over tokens) as an Energy-Based Model, which lets us measure a new signal called spilled energy.

#spilled energy #energy-based models #marginal energy

Decoding ML Decision: An Agentic Reasoning Framework for Large-Scale Ranking System

Intermediate

Longfei Yun, Yihan Wu et al. · Feb 20 · arXiv

GEARS is a new way to improve big ranking systems (like what shows up first in your feed) by letting an AI agent explore options safely, instead of humans tweaking knobs by hand.

#GEARS #agentic ranking #Specialized Agent Skills

SARAH: Spatially Aware Real-time Agentic Humans

Intermediate

Evonne Ng, Siwei Zhang et al. · Feb 20 · arXiv

SARAH is a real-time system that makes virtual characters move their whole bodies naturally during a conversation while knowing where the user is.

#spatially aware motion #real-time avatars #causal transformer

Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

Intermediate

Linxi Xie, Lisong C. Sun et al. · Feb 20 · arXiv

This paper builds a "generated reality" system that lets AI-made videos react to your real head and hand movements in VR.

#generated reality #hand pose conditioning #video diffusion transformer

Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers

Intermediate

Xiaotong Ji, Rasul Tutunov et al. · Feb 20 · arXiv

Decoding (how a language model picks the next word) isn’t a bag of tricks; it’s a clean optimisation problem over probabilities.

#decoding as optimisation #probability simplex #softmax sampling

HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation

Intermediate

Lei Xin, Yuhao Zheng et al. · Feb 20 · arXiv

The paper proposes HyTRec, a recommender system that reads very long histories fast while still paying sharp attention to the latest clicks and purchases.

#Hybrid Attention #Linear Attention #Softmax Attention

VLANeXt: Recipes for Building Strong VLA Models

Intermediate

Xiao-Ming Wu, Bin Fan et al. · Feb 20 · arXiv

This paper studies Vision–Language–Action (VLA) models for robots under one fair setup to find which design choices truly matter.

#Vision-Language-Action #robot manipulation #flow matching

EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots

Intermediate

Boyuan An, Zhexiong Wang et al. · Feb 20 · arXiv

EgoPush teaches a small mobile robot to push multiple objects into patterns (like a cross or a line) using only what it sees from its own camera, without any global map.

#egocentric perception #non-prehensile manipulation #object-centric representation

VidEoMT: Your ViT is Secretly Also a Video Segmentation Model

Intermediate

Narges Norouzi, Idil Esen Zulfikar et al. · Feb 19 · arXiv

VidEoMT shows that a single, well-trained Vision Transformer (ViT) can segment and track objects in videos without extra tracking gadgets.

#Video Segmentation #Vision Transformer #Encoder-only

MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

Intermediate

Hojung Jung, Rodrigo Hormazabal et al. · Feb 19 · arXiv

MolHIT is a new AI that builds molecules as graphs, moving from broad chemical groups to exact atoms step by step.

#molecular graph generation #discrete diffusion #hierarchical diffusion
