MobilityBench is a large, carefully built benchmark that checks how well AI helpers can plan real-world routes using natural language and map tools.
Tool-R0 teaches a language model to use software tools (like APIs) with zero human-made training data.
Searching through videos, images, and long documents is powerful but gets very expensive when every tiny piece is stored separately.
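To see why storing every small piece separately gets expensive, here is a rough back-of-the-envelope calculation. All the numbers (embedding size, pieces per second) are illustrative assumptions, not figures from the paper.

```python
# Illustrative arithmetic only: every constant below is an assumption,
# not a number from any particular system.
EMBED_DIM = 1024          # floats per stored embedding vector (assumed)
BYTES_PER_FLOAT = 4       # float32
PIECES_PER_HOUR = 36_000  # e.g. 10 stored pieces per second of video (assumed)

bytes_per_hour = PIECES_PER_HOUR * EMBED_DIM * BYTES_PER_FLOAT
bytes_per_10k_hours = bytes_per_hour * 10_000

print(f"{bytes_per_hour / 1e9:.2f} GB per hour")        # ~0.15 GB
print(f"{bytes_per_10k_hours / 1e12:.2f} TB for 10k h")  # ~1.47 TB
```

Even with these modest assumptions, a ten-thousand-hour video library needs terabytes just for the index, which is the cost pressure the summary is pointing at.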
The paper introduces CHAIN, a hands-on 3D playground that tests if AI can not only see objects but also plan and act under real physics.
BBQ is a text-to-image model that lets you place objects exactly where you want using numeric bounding boxes and color them with exact RGB values.
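To make the idea concrete, here is a hypothetical sketch of what such a layout-conditioned input could look like: each object carries a numeric bounding box and an exact RGB color, as the summary describes. The field names and coordinate convention are illustrative assumptions, not BBQ's actual API.

```python
# Hypothetical layout spec: boxes are normalized [x0, y0, x1, y1],
# colors are exact 0-255 RGB triples. Field names are illustrative,
# not taken from the BBQ paper.
layout = {
    "prompt": "a red ball on a blue table",
    "objects": [
        {"name": "ball",  "box": [0.40, 0.55, 0.60, 0.75], "rgb": [220, 30, 30]},
        {"name": "table", "box": [0.10, 0.60, 0.90, 0.95], "rgb": [40, 60, 200]},
    ],
}

def validate(layout):
    """Check each box is normalized with x0 < x1, y0 < y1, and RGB in 0-255."""
    for obj in layout["objects"]:
        x0, y0, x1, y1 = obj["box"]
        assert 0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0
        assert all(0 <= c <= 255 for c in obj["rgb"])
    return True

print(validate(layout))  # True
```

The point of numeric boxes and RGB values is that placement and color become checkable facts, so a generated image can be scored against the spec instead of judged by eye.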
NanoKnow is a new benchmark that checks whether a language model’s answers come from what it saw during training or from extra text we give it at question time.
This paper put real AI agents into a safe, live playground and asked expert testers to mess with them to see what breaks.
SkillOrchestra is a new way to make teams of AI models and tools work together by thinking in terms of skills, not just picking one big model for everything.
RoboCurate is a way to make better robot training videos by checking if the actions in a generated video actually match what a robot would do in a simulator.
OCR means transcribing a page exactly as written, and that strictness makes it a good fit for fast, parallel text generation.
AI helpers often don’t know new users’ tastes and can’t keep up when those tastes change.
The study tested how an in-car AI helper should talk while it works on long, multi-step tasks.