Papers (1262)

LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator

Lihuang Chen, Xiangyu Luo et al. · Dec 11 · arXiv

LEO-RobotAgent is a simple but powerful framework that lets a language model think, plan, and operate many kinds of robots using natural language.

#LEO-RobotAgent #language-driven robotics #LLM agent

Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

Intermediate

Haiteng Zhao, Junhao Shen et al. · Dec 11 · arXiv

This paper builds InternGeometry, a large language model agent that solves Olympiad-level geometry by talking to a math engine, remembering what worked, and trying smart new ideas.

#InternGeometry #geometry theorem proving #auxiliary constructions

T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

Intermediate

Dmitrii Stoianov, Danil Taranets et al. · Dec 11 · arXiv

T-pro 2.0 is an open Russian language model that can answer quickly or think step by step, so you can pick speed or accuracy when you need it.

#T-pro 2.0 #Russian LLM #Hybrid reasoning

SWAA: Sliding Window Attention Adaptation for Efficient Long-Context LLMs Without Pretraining

Intermediate

Yijiong Yu, Jiale Liu et al. · Dec 11 · arXiv

Long texts make standard attention in large language models very slow because it checks every word against every other word.

#Sliding Window Attention #SWAA #FA Decode

Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases

Beginner

Sherman Wong, Zhenting Qi et al. · Dec 11 · arXiv

This paper introduces the Confucius Code Agent (CCA), a coding helper built to handle huge real-world codebases with long tasks and many tools.

#coding agents #agent scaffolding #context management

MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

Beginner

Yixin Wan, Lei Ke et al. · Dec 11 · arXiv

This paper creates MotionEdit, a high-quality dataset that teaches AI to change how people and objects move in a picture without altering their appearance or the rest of the scene.

#motion-centric image editing #optical flow #MotionEdit dataset

Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge

Intermediate

Junjie Bai, Yu-Wei Chao et al. · Dec 10 · arXiv

This paper shows how to make home-helper robots better at long, multi-step chores by training on diverse tasks and then fine-tuning the model on its own best attempts.

#Vision-Language-Action #long-horizon manipulation #rejection sampling fine-tuning

HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

Intermediate

Minghui Lin, Pengxiang Ding et al. · Dec 10 · arXiv

Robots often act like goldfish with short memories; HiF-VLA fixes this by letting them use motion to remember the past and predict the future.

#Vision-Language-Action #motion vectors #temporal reasoning

UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving

Intermediate

Hao Lu, Ziyang Liu et al. · Dec 10 · arXiv

UniUGP is a single system that learns to understand road scenes, explain its thinking, plan safe paths, and even imagine future video frames.

#UniUGP #vision-language-action #world model

Composing Concepts from Images and Videos via Concept-prompt Binding

Intermediate

Xianghao Kong, Zeyu Zhang et al. · Dec 10 · arXiv

This paper introduces BiCo, a one-shot way to mix ideas from images and videos by tightly tying each visual idea to the exact words in a prompt.

#BiCo #concept binding #token-level composition

MOA: Multi-Objective Alignment for Role-Playing Agents

Intermediate

Chonghua Liao, Ke Wang et al. · Dec 10 · arXiv

Role-playing agents need to juggle several goals at once, like staying in character, following instructions, and using the right tone.

#multi-objective alignment #role-playing agents #reinforcement learning

MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment

Intermediate

Mengxi Xiao, Kailai Yang et al. · Dec 10 · arXiv

MentraSuite is a complete toolkit that teaches large language models (LLMs) to reason about mental health step by step rather than merely sound caring.

#mental health reasoning #LLM post-training #supervised fine-tuning
