Papers (1262)

Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

Placing the reading passage before the question and answer choices (CQO) makes language models about 15 percentage points more accurate on average than placing it after them (QOC).

#causal attention #prompt order sensitivity #multiple-choice question answering

TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers

Intermediate

Bin Yu, Shijie Lian et al. · Jan 20 · arXiv

TwinBrainVLA is a robot brain with two halves: a frozen generalist that keeps world knowledge safe and a trainable specialist that learns to move precisely.

#Vision-Language-Action #catastrophic forgetting #Asymmetric Mixture-of-Transformers

PRiSM: Benchmarking Phone Realization in Speech Models

Beginner

Shikhar Bharadwaj, Chin-Jou Li et al. · Jan 20 · arXiv

PRiSM is a new open-source benchmark that checks how well speech models hear and write down tiny speech sounds called phones.

#phone recognition #phonetic transcription #PFER

Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics

Intermediate

Junqi Liu, Zihao Zhou et al. · Jan 20 · arXiv

Numina-Lean-Agent is a new open system that uses a general coding agent to write and check exact math proofs in Lean without special training.

#formal theorem proving #Lean #agentic reasoning

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

Intermediate

Hengyuan Zhang, Zhihao Zhang et al. · Jan 20 · arXiv

This survey turns model understanding into a step-by-step repair toolkit called Locate, Steer, and Improve.

#mechanistic interpretability #residual stream #attention heads

FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation

Intermediate

Jing Zuo, Lingzhou Mu et al. · Jan 20 · arXiv

FantasyVLN teaches a robot to follow language instructions while looking around, using a smart, step-by-step thinking style during training but not at test time.

#Vision-and-Language Navigation #Chain-of-Thought #Multimodal CoT

AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization

Intermediate

Yusheng Liao, Chuan Xuan et al. · Jan 20 · arXiv

AgentEHR is a new, realistic test that asks AI agents to read messy hospital records and make full clinical decisions, not just look up facts.

#AgentEHR #RETROSUM #retrospective summarization

FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

Intermediate

Qian Chen, Jinlan Fu et al. · Jan 20 · arXiv

FutureOmni is the first benchmark that tests if multimodal AI models can predict what happens next from both sound and video, not just explain what already happened.

#multimodal LLM #audio-visual reasoning #future forecasting

DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution

Intermediate

Shengda Fan, Xuyan Ye et al. · Jan 20 · arXiv

DARC teaches big language models to get smarter by splitting training into two calm, well-organized steps instead of one chaotic loop.

#DARC #self-play #curriculum learning

ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch

Intermediate

Zheng Liu, Honglin Lin et al. · Jan 20 · arXiv

ChartVerse is a new way to make lots of tricky, realistic charts and perfectly checked questions so AI can learn to read charts better.

#Chart reasoning #Vision-Language Models #Rollout Posterior Entropy

Diffusion In Diffusion: Reclaiming Global Coherence in Semi-Autoregressive Diffusion

Intermediate

Linrui Ma, Yufei Cui et al. · Jan 20 · arXiv

The paper proposes Diffusion in Diffusion, a draft-then-revise method that brings back global coherence to fast, block-based diffusion language models.

#discrete diffusion #block diffusion #semi-autoregressive

Behavior Knowledge Merge in Reinforced Agentic Models

Intermediate

Xiangchi Yuan, Dachuan Shi et al. · Jan 20 · arXiv

The paper tackles a problem in model merging: when you combine several reinforcement-learned models by simple averaging, their specialized skills get watered down.

#reinforcement learning #model merging #task vectors
