Papers (5)

#zero-shot transfer

Large Multimodal Models as General In-Context Classifiers

Marco Garosi, Matteo Farina et al. · Feb 26 · arXiv

People often pick CLIP-like models for image labeling, but this paper shows that large multimodal models (LMMs) can match or even outperform them when given a few labeled examples in the prompt (in-context learning).

#in-context learning · #multimodal models · #open-world classification

SimToolReal: An Object-Centric Policy for Zero-Shot Dexterous Tool Manipulation

Intermediate

Kushal Kedia, Tyler Ga Wei Lum et al. · Feb 18 · arXiv

SimToolReal teaches a robot hand to use many different tools by practicing in simulation and then working in the real world without extra training.

#dexterous manipulation · #sim-to-real reinforcement learning · #goal-conditioned policy

VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory

Intermediate

Shaoan Wang, Yuanfei Luo et al. · Jan 13 · arXiv

VLingNav is a robot navigation system that sees, reads instructions, and acts, while deciding when to reason carefully and when to simply move.

#Vision-Language-Action · #embodied navigation · #adaptive chain-of-thought

NitroGen: An Open Foundation Model for Generalist Gaming Agents

Intermediate

Loïc Magne, Anas Awadalla et al. · Jan 4 · arXiv

NitroGen is a vision-to-action model that learns to play many video games by watching 40,000 hours of gameplay videos from over 1,000 titles, using on-screen controller overlays to see which inputs the player pressed.

#NitroGen · #generalist gaming agent · #behavior cloning

FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition

Intermediate

Jonas Golde, Patrick Haller et al. · Dec 15 · arXiv

FiNERweb is a new, carefully built pipeline for creating datasets that teach models to spot names of people, places, and other entities across 91 languages and 25 writing systems.

#multilingual NER · #named entity recognition · #LLM supervision
