Papers (1262)

A2Eval: Agentic and Automated Evaluation for Embodied Brain

A2Eval is a two-agent system that automatically builds and runs fair tests for robot-style vision-language models, cutting wasted work while keeping results trustworthy.

#Embodied AI #Vision-Language Models #Agentic Evaluation

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Intermediate

Bohan Zeng, Kaixin Zhu et al. · Feb 2 · arXiv

This paper argues that true world models are not just sprinkling facts into single tasks, but building a unified system that can see, think, remember, act, and generate across many situations.

#world models #unified framework #multimodal reasoning

PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

Intermediate

Minh-Quan Le, Gaurav Mittal et al. · Feb 2 · arXiv

This paper shows how to make text-to-video models create clearer, steadier, and more on-topic videos without using any human-labeled ratings.

#text-to-video #optimal transport #annotation-free

Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles

Beginner

Shaohan Wang, Benfeng Xu et al. · Feb 2 · arXiv

This paper builds a live challenge that tests how well Deep Research Agents (DRAs) can write expert-level Wikipedia-style articles.

#Deep Research Agents #Wikipedia Good Articles #Benchmark

Generative Visual Code Mobile World Models

Intermediate

Woosung Koh, Sungjun Han et al. · Feb 2 · arXiv

This paper shows a new way to predict what a phone screen will look like after you tap or scroll: generate web code (like HTML/CSS/SVG) and then render it to pixels.

#mobile GUI #world model #vision-language model

FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents

Intermediate

Chiwei Zhu, Benfeng Xu et al. · Feb 2 · arXiv

FS-Researcher is a two-agent system that lets AI do very long research by saving everything in a computer folder so it never runs out of memory.

#FS-Researcher #file-system agents #external memory

Toward Cognitive Supersensing in Multimodal Large Language Model

Intermediate

Boyi Li, Yifan Shen et al. · Feb 2 · arXiv

This paper teaches multimodal AI models to not just read pictures but to also imagine and think with pictures inside their heads.

#multimodal large language model #visual cognition #latent visual imagery

Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars

Beginner

Youliang Zhang, Zhengguang Zhou et al. · Feb 2 · arXiv

This paper teaches talking avatars not just to speak, but to look around their scene and handle nearby objects exactly as a text instruction says.

#grounded human-object interaction #talking avatars #diffusion transformer

Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training

Intermediate

Ran Xu, Tianci Liu et al. · Feb 2 · arXiv

The paper introduces Rubric-ARM, a system that teaches two AI helpers—a rubric maker and a judge—to learn together using reinforcement learning so they can better decide which answers people would prefer.

#Rubric-based reward modeling #LLM-as-a-judge #Alternating reinforcement learning

Ebisu: Benchmarking Large Language Models in Japanese Finance

Intermediate

Xueqing Peng, Ruoyu Xiang et al. · Feb 1 · arXiv

EBISU is a new benchmark that checks how well AI models understand Japanese finance, a setting where both the language and the domain rely heavily on implicit hints and specialized terms.

#EBISU #Japanese finance NLP #implicit commitment recognition

Rethinking Selective Knowledge Distillation

Intermediate

Almog Tavor, Itay Ebenspanger et al. · Feb 1 · arXiv

The paper studies how to teach a smaller language model using a bigger one by focusing only on the most useful parts of the training signal instead of everything.

#knowledge distillation #selective distillation #student entropy

PromptRL: Prompt Matters in RL for Flow-Based Image Generation

Intermediate

Fu-Yun Wang, Han Zhang et al. · Feb 1 · arXiv

PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.

#PromptRL #flow matching #reinforcement learning
