How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (13)

#long-context reasoning

CL-bench: A Benchmark for Context Learning

Beginner
Shihan Dou, Ming Zhang et al. · Feb 3 · arXiv

CL-bench is a new test that checks whether AI can truly learn new things from the information you give it right now, not just from what it memorized before.

#context learning #benchmark #rubric-based evaluation

VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning

Intermediate
Yibo Wang, Yongcheng Jing et al. · Jan 29 · arXiv

This paper shows a new way to help AI think through long problems faster by turning earlier text steps into small pictures the AI can reread.

#vision-text compression #optical memory #iterative reasoning

DRPG (Decompose, Retrieve, Plan, Generate): An Agentic Framework for Academic Rebuttal

Intermediate
Peixuan Han, Yingjie Yu et al. · Jan 26 · arXiv

DRPG is a four-step AI helper that writes strong academic rebuttals by first breaking a review into parts, then fetching evidence, planning a strategy, and finally writing the response.

#academic rebuttal #agentic framework #planning with LLMs

Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow

Intermediate
Haocheng Xi, Charlie Ruan et al. · Jan 20 · arXiv

Reinforcement learning (RL) for large language models is slow because the rollout (text generation) stage can take more than 70% of training time, especially for long, step-by-step answers.

#FP8 quantization #on-policy reinforcement learning #precision flow
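
A back-of-the-envelope Amdahl's-law estimate shows why the rollout share matters. The sketch below assumes rollout takes exactly 70% of a training step (the summary says "more than 70%") and that only the rollout stage gets faster. The function name and the 2x rollout speedup are hypothetical values chosen for illustration, not results from the paper.

```python
def overall_speedup(rollout_fraction: float, rollout_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the rollout stage is accelerated."""
    if not 0.0 <= rollout_fraction <= 1.0:
        raise ValueError("rollout_fraction must be between 0 and 1")
    if rollout_speedup <= 0.0:
        raise ValueError("rollout_speedup must be positive")
    # Time left after acceleration, as a fraction of the original step time.
    remaining = (1.0 - rollout_fraction) + rollout_fraction / rollout_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    # Assumed numbers: 70% rollout share, hypothetical 2x faster rollout.
    print(round(overall_speedup(0.7, 2.0), 2))  # 1.54
    # Ceiling if rollout became free: the other 30% still has to run.
    print(round(1.0 / (1.0 - 0.7), 2))  # 3.33
```

Under these assumptions, even a large rollout speedup is capped by the remaining 30% of the step, which is why the rollout share is the first thing to measure.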

Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

Intermediate
Hyunjong Ok, Jaeho Lee · Jan 20 · arXiv

Putting the reading passage before the question and answer choices (CQO) makes language models much more accurate than putting it after (QOC), by about 15 percentage points on average.

#causal attention #prompt order sensitivity #multiple-choice question answering
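
The two orderings can be made concrete with a small prompt builder. This is a minimal sketch: the function name `build_prompt`, the section labels ("Passage", "Question", "Options"), and the trailing "Answer:" cue are assumptions for illustration, not the paper's exact template.

```python
def build_prompt(context: str, question: str, options: list[str], order: str = "CQO") -> str:
    """Assemble a multiple-choice prompt with its parts in the given order.

    order is a permutation of "C" (context), "Q" (question), "O" (options),
    e.g. "CQO" puts the passage first and "QOC" puts it last.
    """
    if sorted(order) != ["C", "O", "Q"]:
        raise ValueError("order must be a permutation of 'C', 'Q', 'O'")
    # Label options A, B, C, ... (hypothetical formatting).
    option_lines = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    parts = {
        "C": f"Passage:\n{context}",
        "Q": f"Question: {question}",
        "O": f"Options:\n{option_lines}",
    }
    return "\n\n".join(parts[key] for key in order) + "\n\nAnswer:"


if __name__ == "__main__":
    passage = "The museum opens at 9 am and closes at 5 pm."
    question = "When does the museum close?"
    choices = ["9 am", "5 pm"]
    print(build_prompt(passage, question, choices, order="CQO"))
    print("-----")
    print(build_prompt(passage, question, choices, order="QOC"))
```

With a causal (left-to-right) model, tokens in the passage can only attend to what came before them, so in the QOC layout the passage is read after the question, while in CQO the question is read with the passage already in view.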

AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization

Intermediate
Yusheng Liao, Chuan Xuan et al. · Jan 20 · arXiv

AgentEHR is a new, realistic test that asks AI agents to read messy hospital records and make full clinical decisions, not just look up facts.

#AgentEHR #RETROSUM #retrospective summarization

MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models

Beginner
Zecheng Tang, Baibei Ji et al. · Jan 17 · arXiv

This paper builds MemoryRewardBench, a big test that checks if reward models (AI judges) can fairly grade how other AIs manage long-term memory, not just whether their final answers are right.

#reward models #long-term memory #long-context reasoning

SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature

Intermediate
Yiming Ren, Junjie Wang et al. · Jan 15 · arXiv

The paper introduces SIN-Bench, a new way to test AI models that read long scientific papers by forcing them to show exactly where their answers come from.

#multimodal large language models #long-context reasoning #evidence chains

Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling

Intermediate
Chulun Zhou, Chunkang Zhang et al. · Dec 30 · arXiv

Multi-step RAG systems often struggle with long documents because their memory is just a pile of isolated facts, not a connected understanding.

#multi-step RAG #hypergraph memory #hyperedge merging

INTELLECT-3: Technical Report

Intermediate
Prime Intellect Team, Mika Senghaas et al. · Dec 18 · arXiv

INTELLECT-3 is a 106B-parameter Mixture-of-Experts model (about 12B active per token) trained with large-scale reinforcement learning. It beats many bigger models on math, coding, science, and reasoning tests.

#INTELLECT-3 #prime-rl #verifiers

Olmo 3

Beginner
Team Olmo et al. · Dec 15 · arXiv

Olmo 3 is a family of fully-open AI language models (7B and 32B) where every step—from raw data to training code and checkpoints—is released.

#fully-open language models #model flow #long-context reasoning

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Intermediate
Weizhou Shen, Ziyi Yang et al. · Dec 15 · arXiv

QwenLong-L1.5 is a training recipe that helps AI read and reason over very long documents by improving the data it learns from, the way it is trained, and how it remembers important stuff.

#long-context reasoning #reinforcement learning #GRPO