Papers (1262)

A BERTology View of LLM Orchestrations: Token- and Layer-Selective Probes for Efficient Single-Pass Classification

Gonzalo Ariel Meyoyan, Luciano Del Corro · Jan 19 · arXiv

This paper shows how to attach a tiny helper (a probe) to a large language model so it can classify things like safety or sentiment during the same forward pass it already runs to answer you.

#LLM orchestration #single-pass classification #hidden-state probing

Aligning Agentic World Models via Knowledgeable Experience Learning

Intermediate

Baochang Ren, Yunzhi Yao et al. · Jan 19 · arXiv

WorldMind teaches AI agents to learn the rules of the real world while they act, instead of cramming everything into fixed model weights.

#agentic world models #predictive coding #physical hallucinations

Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition

Intermediate

Warit Sirichotedumrong, Adisai Na-Thalang et al. · Jan 19 · arXiv

Big models like Whisper are great for accuracy but too slow for live captions; this paper builds a smaller, faster Thai speech recognizer for real-time use.

#Thai ASR #Streaming speech recognition #FastConformer-Transducer

Think3D: Thinking with Space for Spatial Reasoning

Beginner

Zaibin Zhang, Yuhan Wu et al. · Jan 19 · arXiv

Think3D lets AI models stop guessing from flat pictures and start exploring real 3D space, like walking around a room in a video game.

#Think3D #spatial reasoning #3D reconstruction

Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization

Intermediate

Hao Luo, Ye Wang et al. · Jan 19 · arXiv

Being-H0.5 is a robot brain that learns from huge amounts of human videos and robot demos so it can work on many different robots, not just one.

#Vision-Language-Action model #Unified Action Space #Human-centric learning

Agentic Reasoning for Large Language Models

Intermediate

Tianxin Wei, Ting-Wei Li et al. · Jan 18 · arXiv

This paper explains how to turn large language models (LLMs) from quiet students that only answer questions into active agents that can plan, act, and learn over time.

#Agentic Reasoning #LLM Agents #In-Context Learning

MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents

Intermediate

Peizhou Huang, Zixuan Zhong et al. · Jan 18 · arXiv

This paper introduces MMDeepResearch-Bench (MMDR-Bench), a new test that checks how well AI “deep research agents” write long, citation-rich reports using both text and images.

#Multimodal Deep Research #Benchmark #Citation Grounding

ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents

Intermediate

Dawei Li, Yuguang Yao et al. · Jan 18 · arXiv

ToolPRMBench is a new benchmark that checks, step by step, whether an AI agent using tools picks the right next action.

#process reward model #tool-using agents #offline sampling

Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility

Beginner

Honglin Lin, Chonghan Qin et al. · Jan 17 · arXiv

The paper studies how to make and judge scientific images that are not just pretty but scientifically correct.

#scientific image synthesis #text-to-image (T2I) #programmatic diagram generation

MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models

Beginner

Zecheng Tang, Baibei Ji et al. · Jan 17 · arXiv

This paper builds MemoryRewardBench, a big test that checks if reward models (AI judges) can fairly grade how other AIs manage long-term memory, not just whether their final answers are right.

#reward models #long-term memory #long-context reasoning

Agentic-R: Learning to Retrieve for Agentic Search

Intermediate

Wenhan Liu, Xinyu Ma et al. · Jan 17 · arXiv

Agentic-R is a new way to teach a search retriever to find not just similar text, but the text that truly helps an AI get the final answer right.

#agentic search #retriever training #passage utility modeling

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Intermediate

Mike A. Merrill, Alexander G. Shaw et al. · Jan 17 · arXiv

Terminal-Bench 2.0 is a tough test that checks how well AI agents can solve real, professional tasks by typing commands in a computer terminal.

#Terminal-Bench #command line interface #Docker containers
