Papers (1262)

A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

Tianyu Chen, Dongrui Liu et al. · Feb 16 · arXiv

This paper checks how safe a real, tool-using AI agent called Clawdbot (OpenClaw) is by watching every step it takes during tasks, not just the final answer.

#trajectory-centric safety #tool-using AI agents #prompt injection

LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces

Beginner

Yukang Feng, Jianwen Sun et al. · Feb 15 · arXiv

LongCLI-Bench is a new test that checks how well AI coding agents can handle long, realistic software projects in the command line, not just tiny coding puzzles.

#LongCLI-Bench #agentic programming #command-line interface agents

Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook

Intermediate

Ming Li, Xirui Li et al. · Feb 15 · arXiv

This paper studies Moltbook, a giant social network made only of AI agents, to see if they start acting like a real society over time.

#AI socialization #multi-agent systems #Moltbook

AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines

Intermediate

Yifan Wu, Yiran Peng et al. · Feb 15 · arXiv

AutoWebWorld builds pretend websites with clear rules so AI can practice safely and be checked automatically.

#Finite State Machine #Web GUI Agents #Synthetic Data Generation

Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?

Intermediate

Anton Korznikov, Andrey Galichin et al. · Feb 15 · arXiv

Sparse autoencoders (SAEs) are popular for explaining what large language models are doing, but this paper shows they often don’t learn real, meaningful features.

#sparse autoencoders #interpretability #dictionary learning

Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality

Intermediate

Nitay Calderon, Eyal Ben-David et al. · Feb 15 · arXiv

Not every wrong answer from a large language model (LLM) means it never learned the fact; often the model knows the fact but cannot retrieve it on demand.

#LLM factuality #encoding vs recall #knowledge profiling

Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents

Intermediate

Haiyang Xu, Xi Zhang et al. · Feb 15 · arXiv

This paper builds GUI-Owl-1.5, an AI that can use phones, computers, and web browsers like a careful human helper.

#GUI agent #visual grounding #reinforcement learning

Experiential Reinforcement Learning

Intermediate

Taiwei Shi, Sihao Chen et al. · Feb 15 · arXiv

This paper teaches AI models to learn like good students: try, think about what went wrong, fix it, and remember the fix.

#Experiential Reinforcement Learning #self-reflection #distillation

TactAlign: Human-to-Robot Policy Transfer via Tactile Alignment

Beginner

Youngsun Wi, Jessica Yin et al. · Feb 14 · arXiv

Robots learn faster and more flexibly when they can use human touch data, but humans and robots feel touch with very different sensors.

#tactile alignment #human-to-robot transfer #rectified flow

SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning

Intermediate

Jintao Zhang, Kai Jiang et al. · Feb 13 · arXiv

Video generators are slow because full attention compares every token with every other token.

#sparse attention #Top-k masking #Top-p masking

RynnBrain: Open Embodied Foundation Models

Beginner

Ronghao Dang, Jiayan Guo et al. · Feb 13 · arXiv

RynnBrain is an open-source 'robot brain' that helps machines see, think, and plan in the real world across space and time.

#embodied intelligence #egocentric vision #spatiotemporal localization

SLA2: Sparse-Linear Attention with Learnable Routing and QAT

Intermediate

Jintao Zhang, Haoxu Wang et al. · Feb 13 · arXiv

SLA2 is a new way for AI to pay attention faster by smartly splitting work between two helpers: a precise one (sparse attention) and a speedy one (linear attention).

#Sparse Attention #Linear Attention #SLA2
