Papers (1055)

Agentic-R: Learning to Retrieve for Agentic Search

Agentic-R is a new way to teach a search retriever to find not just similar text, but the text that truly helps an AI get the final answer right.

#agentic search #retriever training #passage utility modeling

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Intermediate

Mike A. Merrill, Alexander G. Shaw et al. · Jan 17 · arXiv

Terminal-Bench 2.0 is a tough test that checks how well AI agents can solve real, professional tasks by typing commands in a computer terminal.

#Terminal-Bench #command line interface #Docker containers

UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation

Intermediate

Ruiheng Zhang, Jingfeng Yao et al. · Jan 16 · arXiv

UniX is a new medical AI that both understands chest X-rays (writes accurate reports) and generates chest X-ray images (high visual quality) without making the two jobs fight each other.

#UniX #autoregressive branch #diffusion branch

ShapeR: Robust Conditional 3D Shape Generation from Casual Captures

Intermediate

Yawar Siddiqui, Duncan Frost et al. · Jan 16 · arXiv

ShapeR builds clean, correctly sized 3D objects from messy, casual phone or glasses videos by using images, camera poses, sparse SLAM points, and short text captions together.

#ShapeR #3D reconstruction #object-centric

The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents

Intermediate

Eilam Shapira, Roi Reichart et al. · Jan 16 · arXiv

The paper shows that simply adding a new AI model to the menu—without anyone actually using it—can push a fairness-focused regulator to change the market rules, shifting money from one side to the other.

#Poisoned Apple effect #AI agents #meta-game

ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models

Intermediate

Linqing Zhong, Yi Liu et al. · Jan 16 · arXiv

Robots usually think in words and pictures, but their hands need exact motions, so there is a gap between understanding and doing.

#Vision-Language-Action #Action Chain-of-Thought #Explicit Action Reasoner

Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation

Intermediate

Pingzhi Tang, Yiding Wang et al. · Jan 16 · arXiv

Big language models can learn new facts with simple tutoring (SFT), but that doesn’t automatically teach them how to use those facts well.

#Parametric Skill Transfer #Skill Vector #Task Arithmetic

Language of Thought Shapes Output Diversity in Large Language Models

Intermediate

Shaoyang Xu, Wenxuan Zhang · Jan 16 · arXiv

The paper shows that changing the language a model 'thinks in' (its language of thought) can make its English answers more varied without making them much worse in quality.

#language of thought #output diversity #multilingual reasoning

FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning

Intermediate

Tanyu Chen, Tairan Chen et al. · Jan 16 · arXiv

Chroma 1.0 is a real-time, end-to-end speech-to-speech system that can talk back in your own cloned voice with sub-second delay.

#end-to-end speech-to-speech #personalized voice cloning #streaming TTS

CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation

Intermediate

Shuai Tan, Biao Gong et al. · Jan 16 · arXiv

CoDance is a new way to animate many characters in one picture using just one pose video, even if the picture and the video do not line up perfectly.

#multi-subject animation #pose-guided video generation #Unbind–Rebind paradigm

PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models

Intermediate

Qiyuan Zhang, Biao Gong et al. · Jan 16 · arXiv

This paper teaches video-making AIs to follow real-world physics, so rolling balls roll right and collisions look believable.

#physics-aware video generation #rigid body motion #reinforcement learning

ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

Intermediate

Jie Yang, Honglin Guo et al. · Jan 16 · arXiv

ABC-Bench is a new test that checks if AI coding agents can really do backend work from start to finish, not just write a few lines of code.

#ABC-Bench #agentic backend coding #end-to-end API testing
