Papers (1262)

PyVision-RL: Forging Open Agentic Vision Models via RL

Shitian Zhao, Shaoheng Lin et al. · Feb 24 · arXiv

PyVision-RL teaches vision-language models to act like curious agents that think in multiple steps and use Python tools to inspect images and videos.

#agentic multimodal models, #reinforcement learning, #dynamic tooling

From Scale to Speed: Adaptive Test-Time Scaling for Image Editing

Intermediate

Xiangyan Qu, Zhenlong Yuan et al. · Feb 24 · arXiv

This paper speeds up and improves AI image editing by giving hard edits more attention and easy edits less, just like a smart coach.

#adaptive test-time scaling, #image chain-of-thought, #image editing

BBQ-to-Image: Numeric Bounding Box and Qolor Control in Large-Scale Text-to-Image Models

Beginner

Eliran Kachlon, Alexander Visheratin et al. · Feb 24 · arXiv

BBQ is a text-to-image model that lets you place objects exactly where you want using numeric bounding boxes and color them with exact RGB values.

#text-to-image, #bounding boxes, #RGB control

QEDBENCH: Quantifying the Alignment Gap in Automated Evaluation of University-Level Mathematical Proofs

Intermediate

Santiago Gonzalez, Alireza Amiri Bavandpour et al. · Feb 24 · arXiv

This paper shows that when AI models grade university-level math proofs, they often disagree with human experts in systematic ways.

#LLM-as-a-Judge, #mathematical proof evaluation, #alignment gap

QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models

Intermediate

Jingxuan Zhang, Yunta Hsieh et al. · Feb 23 · arXiv

Vision-Language-Action (VLA) models for robots are powerful but too big and slow for many real-world devices; QuantVLA shrinks them with scale-calibrated post-training quantization.

#Vision-Language-Action, #Post-Training Quantization, #Diffusion Transformer

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

Intermediate

Abdelrahman Shaker, Ahmed Heakl et al. · Feb 23 · arXiv

Mobile-O is a small but smart AI that can both understand pictures and make new images, and it runs right on your phone.

#Mobile-O, #unified multimodal model, #on-device AI

A Very Big Video Reasoning Suite

Intermediate

Maijunxian Wang, Ruisi Wang et al. · Feb 23 · arXiv

This paper builds a gigantic library of video puzzles (VBVR) so AI can practice not just making pretty videos, but actually thinking through what happens over time.

#video reasoning, #rule-based evaluation, #in-domain generalization

NanoKnow: How to Know What Your Language Model Knows

Beginner

Lingwei Gu, Nour Jedidi et al. · Feb 23 · arXiv

NanoKnow is a new benchmark that checks whether a language model’s answers come from what it saw during training or from extra text we give it at question time.

#NanoKnow, #FineWeb-Edu, #nanochat

ManCAR: Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation for Sequential Recommendation

Intermediate

Kun Yang, Yuxuan Zhu et al. · Feb 23 · arXiv

ManCAR helps recommendation systems think step by step but keeps their thoughts on realistic paths using a map of how items connect.

#sequential recommendation, #latent reasoning, #interaction graph

Agents of Chaos

Beginner

Natalie Shapira, Chris Wendler et al. · Feb 23 · arXiv

This paper put real AI agents into a safe, live playground and asked expert testers to mess with them to see what breaks.

#AI agents, #red teaming, #identity verification

DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning

Intermediate

Zhongwei Wan, Yun Shen et al. · Feb 23 · arXiv

LLMs trained with simple rewards often latch onto just a few ways of solving problems and stop exploring, which hurts their ability to find other correct answers; DSDR counters this with dual-scale diversity regularization.

#DSDR, #dual-scale diversity, #RLVR

SkillOrchestra: Learning to Route Agents via Skill Transfer

Beginner

Jiayu Wang, Yifei Ming et al. · Feb 23 · arXiv

SkillOrchestra is a new way to make teams of AI models and tools work together by thinking in terms of skills, not just picking one big model for everything.

#agent orchestration, #model routing, #skill discovery
