How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (9)

Tag: #long-context reasoning

VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning

Intermediate
Yibo Wang, Yongcheng Jing et al. · Jan 29 · arXiv

This paper shows a new way to help AI think through long problems faster by turning earlier text steps into small pictures the AI can reread.

#vision-text compression · #optical memory · #iterative reasoning

DRPG (Decompose, Retrieve, Plan, Generate): An Agentic Framework for Academic Rebuttal

Intermediate
Peixuan Han, Yingjie Yu et al. · Jan 26 · arXiv

DRPG is a four-step AI helper that writes strong academic rebuttals by first breaking a review into parts, then fetching evidence, planning a strategy, and finally writing the response.

#academic rebuttal · #agentic framework · #planning with LLMs
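The four-stage pipeline the summary describes can be sketched in a few lines. This is an illustrative toy, not the paper's actual implementation: the function names, the keyword-overlap retrieval, and the "clarify vs. counter" strategy heuristic are all stand-ins for what DRPG does with LLMs.

```python
# Toy sketch of DRPG's Decompose -> Retrieve -> Plan -> Generate loop.
# All four stages are naive placeholders for the paper's LLM-driven steps.

def decompose(review: str) -> list[str]:
    # Split a peer review into individual criticism points (naive: by line).
    return [point.strip() for point in review.splitlines() if point.strip()]

def retrieve(point: str, paper: dict[str, str]) -> str:
    # Fetch the paper section most relevant to a point (naive keyword overlap).
    best = max(paper, key=lambda sec: sum(w in paper[sec].lower()
                                          for w in point.lower().split()))
    return paper[best]

def plan(point: str, evidence: str) -> str:
    # Pick a rebuttal strategy for this point (placeholder heuristic).
    return "clarify" if "unclear" in point.lower() else "counter"

def generate(point: str, evidence: str, strategy: str) -> str:
    # Compose one rebuttal paragraph per criticism point.
    return f"[{strategy}] Regarding '{point}' -- our evidence: {evidence}"

def rebut(review: str, paper: dict[str, str]) -> list[str]:
    out = []
    for point in decompose(review):
        evidence = retrieve(point, paper)
        out.append(generate(point, evidence, plan(point, evidence)))
    return out
```

The point of the structure is that each criticism gets its own evidence and its own strategy before any text is written, instead of answering the whole review in one shot.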

Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow

Intermediate
Haocheng Xi, Charlie Ruan et al. · Jan 20 · arXiv

Reinforcement learning (RL) for large language models is slow because the rollout (text generation) stage can take more than 70% of training time, especially for long, step-by-step answers.

#FP8 quantization · #on-policy reinforcement learning · #precision flow
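To see why low-precision rollout helps, here is a toy per-tensor symmetric quantizer. This is only an illustration of the general quantize/dequantize idea, not Jet-RL's FP8 format or its unified precision flow.

```python
# Toy symmetric quantization: map floats to small integers plus one scale,
# then recover approximate floats. Real FP8 uses a float format (e.g. E4M3),
# but the round-trip error trade-off is the same in spirit.

def quantize(xs: list[float], bits: int = 8) -> tuple[list[int], float]:
    levels = 2 ** (bits - 1) - 1          # e.g. 127 representable magnitudes
    scale = max(abs(x) for x in xs) / levels or 1.0
    return [round(x / scale) for x in xs], scale

def dequantize(qs: list[int], scale: float) -> list[float]:
    return [q * scale for q in qs]
```

Each value is stored in far fewer bits, which is what makes the rollout (generation) stage cheaper; the catch, and the paper's focus, is keeping that low-precision rollout consistent with training so the RL stays on-policy.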

Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

Intermediate
Hyunjong Ok, Jaeho Lee · Jan 20 · arXiv

Putting the reading passage before the question and answer choices (CQO) makes language models much more accurate than putting it after (QOC), by about 15 percentage points on average.

#causal attention · #prompt order sensitivity · #multiple-choice question answering
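The two orderings being compared are easy to make concrete. A minimal sketch (the prompt template is illustrative, not the paper's exact format):

```python
# Build a multiple-choice prompt in either order.
# CQO = Context, Question, Options; QOC = Question, Options, Context.
# Under causal attention, only CQO lets the question and option tokens
# attend back to the passage while they are being processed.

def build_prompt(context: str, question: str, options: list[str],
                 order: str = "CQO") -> str:
    opts = "\n".join(f"({chr(65 + i)}) {o}" for i, o in enumerate(options))
    if order == "CQO":
        parts = [context, question, opts]   # passage first
    elif order == "QOC":
        parts = [question, opts, context]   # passage last
    else:
        raise ValueError(f"unknown order: {order}")
    return "\n\n".join(parts)
```

The paper's finding is that this purely mechanical reordering swings accuracy by roughly 15 points, which is why the context-first layout is the safer default.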

AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization

Intermediate
Yusheng Liao, Chuan Xuan et al. · Jan 20 · arXiv

AgentEHR is a new, realistic test that asks AI agents to read messy hospital records and make full clinical decisions, not just look up facts.

#AgentEHR · #RETROSUM · #retrospective summarization

SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature

Intermediate
Yiming Ren, Junjie Wang et al. · Jan 15 · arXiv

The paper introduces SIN-Bench, a new way to test AI models that read long scientific papers by forcing them to show exactly where their answers come from.

#multimodal large language models · #long-context reasoning · #evidence chains

Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling

Intermediate
Chulun Zhou, Chunkang Zhang et al. · Dec 30 · arXiv

Multi-step RAG systems often struggle with long documents because their memory is just a pile of isolated facts, not a connected understanding.

#multi-step RAG · #hypergraph memory · #hyperedge merging
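The "connected understanding" idea can be sketched with a tiny hypergraph memory: one hyperedge links every entity that co-occurs in a fact, so multi-entity relations are kept whole instead of being shattered into pairwise edges. This is an assumed minimal design for illustration, not the paper's implementation; the class and method names are invented.

```python
from collections import defaultdict

class HypergraphMemory:
    """Toy hypergraph memory: a fact is one hyperedge over all its entities."""

    def __init__(self) -> None:
        self.hyperedges: list[tuple[frozenset, str]] = []
        self.by_entity: defaultdict = defaultdict(set)  # entity -> edge indices

    def add_fact(self, entities: set, fact: str) -> None:
        idx = len(self.hyperedges)
        self.hyperedges.append((frozenset(entities), fact))
        for e in entities:
            self.by_entity[e].add(idx)

    def related_facts(self, entity: str) -> list[str]:
        # Every fact whose hyperedge contains this entity.
        return [self.hyperedges[i][1] for i in sorted(self.by_entity[entity])]

    def merge(self, keep: str, alias: str) -> None:
        # Hyperedge merging: fold an alias (e.g. a coreferent name) into
        # a canonical entity, so their facts become jointly retrievable.
        for i in self.by_entity.pop(alias, set()):
            ents, fact = self.hyperedges[i]
            self.hyperedges[i] = (frozenset(ents - {alias} | {keep}), fact)
            self.by_entity[keep].add(i)
```

Retrieval by entity then walks connected facts rather than scanning a flat pile, which is the property the summary says multi-step RAG is missing.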

INTELLECT-3: Technical Report

Intermediate
Prime Intellect Team, Mika Senghaas et al. · Dec 18 · arXiv

INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (about 12B active per token) trained with large-scale reinforcement learning, and it beats many bigger models on math, coding, science, and reasoning tests.

#INTELLECT-3 · #prime-rl · #verifiers

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Intermediate
Weizhou Shen, Ziyi Yang et al. · Dec 15 · arXiv

QwenLong-L1.5 is a training recipe that helps AI read and reason over very long documents by improving the data it learns from, the way it is trained, and how it remembers important information.

#long-context reasoning · #reinforcement learning · #GRPO