Papers (1061)

LMEB: Long-horizon Memory Embedding Benchmark

Xinping Zhao, Xinshuo Hu et al. · Mar 13 · arXiv

LMEB is a new test that checks whether text-embedding models can remember and find information across long stretches of time, not just in short, neat passages.

#LMEB · #long-horizon memory retrieval · #memory embeddings
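
This is a minimal sketch of the kind of retrieval step such a benchmark scores. It assumes the common setup of ranking stored memory snippets by cosine similarity to a query embedding. The tiny vectors and memory ids below are hypothetical stand-ins for real model embeddings, and this is not LMEB's evaluation code.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: list[float], memory: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the ids of the k memory entries most similar to the query."""
    scored = sorted(memory.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [mem_id for mem_id, _ in scored[:k]]

# Hypothetical toy 3-d "embeddings" of notes written far apart in time.
memory = {
    "day1_note": [0.9, 0.1, 0.0],
    "day30_note": [0.1, 0.9, 0.1],
    "day90_note": [0.0, 0.2, 0.95],
}
print(retrieve([0.05, 0.15, 0.9], memory, k=1))  # ['day90_note']
```

A long-horizon benchmark makes this harder by putting many similar-looking entries in memory over long time spans. The ranking code stays the same, but the embedding model has to separate them.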

Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Intermediate

Boqiang Zhang, Lei Ke et al. · Mar 6 · arXiv

Penguin-VL shows that small vision-language models (2B and 8B) can be very strong if you give them a better vision encoder, not just a bigger brain.

#Vision Language Model · #LLM-based Vision Encoder · #Contrastive Learning

RoboPocket: Improve Robot Policies Instantly with Your Phone

Intermediate

Junjie Fang, Wendi Chen et al. · Mar 5 · arXiv

RoboPocket turns an ordinary smartphone into a pocket robot coach that lets you correct a robot's mistakes instantly, without touching the robot.

#RoboPocket · #Imitation Learning · #Interactive Imitation Learning

RealWonder: Real-Time Physical Action-Conditioned Video Generation

Intermediate

Wei Liu, Ziyu Chen et al. · Mar 5 · arXiv

RealWonder is a system that turns a single picture and 3D physical actions (like pushes, wind, and robot gripper moves) into a realistic video in real time.

#action-conditioned video generation · #physics simulation · #optical flow

Progressive Residual Warmup for Language Model Pretraining

Intermediate

Tianhao Chen, Xin Xu et al. · Mar 5 · arXiv

Training big Transformers can wobble at the start because every layer tries to learn at once; Progressive Residual Warmup (ProRes) targets this early-training instability.

#Progressive Residual Warmup · #ProRes · #Transformer training stability

UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data

Intermediate

Sizhe Yang, Yiman Xie et al. · Mar 5 · arXiv

Robots need many different ways to grab things, just as people use pinch, tripod, whole-hand, or two-handed grasps; UltraDexGrasp aims to learn one universal grasping policy for two-armed robots from synthetic data.

#bimanual dexterous grasping · #universal grasp policy · #synthetic data generation

BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning

Intermediate

Yuan Li, Bo Wang et al. · Mar 5 · arXiv

BandPO is a new training method for large language models that keeps updates safe while letting the model freely explore smart, low-probability ideas.

#BandPO · #PPO clipping · #trust region
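
The "ratio clipping" in BandPO's title refers to the standard PPO clipped surrogate, min(r·A, clip(r, 1 − ε, 1 + ε)·A). Here r is the new-to-old probability ratio of a token and A is its advantage. The minimal per-token sketch below shows only that baseline with the common default ε = 0.2. BandPO's probability-aware bounds are not reproduced here. The example suggests why a fixed band can limit exploration of low-probability tokens: raising a token from probability 0.01 to 0.05 gives a ratio of about 5, yet clipping caps the objective at 1 + ε. In absolute terms that is only about 0.012.

```python
def ppo_clipped_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for one token (a quantity to maximize)."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Taking the min makes the objective pessimistic: gains beyond the band are ignored.
    return min(ratio * advantage, clipped_ratio * advantage)

# A rare token whose probability rises from 0.01 (old policy) to 0.05 (new policy).
old_p, new_p = 0.01, 0.05
ratio = new_p / old_p  # roughly 5
print(ppo_clipped_objective(ratio, advantage=1.0))  # 1.2, capped at 1 + eps
```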

Locality-Attending Vision Transformer

Intermediate

Sina Hajimiri, Farzad Beizaee et al. · Mar 5 · arXiv

Vision Transformers (ViTs) are great at recognizing what is in a whole image but often blur the tiny details needed to label each pixel (segmentation).

#Vision Transformer · #self-attention · #segmentation

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

Intermediate

Lulu Hu, Wenhu Xiao et al. · Mar 5 · arXiv

Multimodal AI models handle text, images, and audio, but their internal signals differ greatly in scale from one modality to another, which breaks standard low-bit compression methods.

#post-training quantization · #multimodal LLM · #channel-wise smoothing
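
"Channel-wise smoothing" usually refers to the SmoothQuant-style trick. Each activation channel j is divided by a scale s_j, and the matching weight row is multiplied by the same s_j. The product X·W is unchanged while activation outliers shrink before quantization. The sketch below shows that standard single-scale version, s_j = max|X_j|^α / max|W_j|^(1 − α), with α = 0.5 assumed. The toy matrices are hypothetical, and MASQuant's modality-aware variant is not reproduced here. Plain lists keep it dependency-free.

```python
def smoothing_scales(X, W, alpha=0.5):
    """Per-input-channel scales s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha)."""
    scales = []
    for j in range(len(W)):
        act_max = max(abs(row[j]) for row in X)
        w_max = max(abs(w) for w in W[j])
        scales.append(act_max ** alpha / w_max ** (1 - alpha))
    return scales

def matmul(A, B):
    """Naive matrix product of lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def smooth(X, W, alpha=0.5):
    """Move activation magnitude into the weights without changing X @ W."""
    s = smoothing_scales(X, W, alpha)
    X_s = [[x / s[j] for j, x in enumerate(row)] for row in X]
    W_s = [[w * s[j] for w in W[j]] for j in range(len(W))]
    return X_s, W_s

# X: (tokens x in_channels), W: (in_channels x out_channels).
# Channel 1 of X carries large outliers, e.g. from one modality.
X = [[1.0, 40.0], [0.5, -30.0]]
W = [[0.2, 0.1], [0.05, 0.3]]
X_s, W_s = smooth(X, W)
print(max(abs(x) for row in X_s for x in row))  # about 3.46, down from 40
```

With α = 0.5, each channel's largest activation and largest weight end up equal. This splits the quantization difficulty evenly between the two operands.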

Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling

Intermediate

Yong Liu, Xingjian Su et al. · Mar 5 · arXiv

Timer-S1 is a huge time-series model (8.3B parameters, only 0.75B used per step) that predicts the future by thinking step-by-step inside one forward pass.

#time series forecasting · #foundation models · #Mixture-of-Experts
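
The "8.3B parameters, only 0.75B used per step" figure reflects the usual Mixture-of-Experts trade-off. A router sends each input to a few experts, so only a fraction of the weights run, here 0.75 / 8.3 ≈ 9%. Below is a generic top-k gating sketch. The four scalar "experts", the router logits, and k = 2 are hypothetical and do not describe Timer-S1's actual architecture.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_moe(x, router_logits, experts, k=2):
    """Route x to the k highest-scoring experts and mix their outputs."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])  # renormalize over chosen experts
    # Only the selected experts are evaluated; the others stay idle for this input.
    return sum(w * experts[i](x) for w, i in zip(weights, top)), top

# Hypothetical toy experts: each simply scales its input by a different factor.
experts = [lambda x, a=a: a * x for a in (1.0, 2.0, 3.0, 4.0)]
y, used = top_k_moe(10.0, router_logits=[0.1, 2.0, 0.3, 1.0], experts=experts, k=2)
print(used, y)  # experts [1, 3] are used
print(f"active fraction: {0.75 / 8.3:.1%}")  # active fraction: 9.0%
```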

HiMAP-Travel: Hierarchical Multi-Agent Planning for Long-Horizon Constrained Travel

Intermediate

The Viet Bui, Wenjun Li et al. · Mar 5 · arXiv

HiMAP-Travel is a team-based AI planner that splits a long trip into daily chunks so it can follow tough rules like budgets without drifting off course.

#hierarchical planning · #multi-agent systems · #constraint drift

DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval

Intermediate

Maojun Sun, Yue Wu et al. · Mar 5 · arXiv

DARE is a new way for AI assistants to find the right R functions by also looking at what the data looks like, not just the words in the question.

#distribution-aware retrieval · #RPKB · #RCodingAgent
