🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers791

AllBeginnerIntermediateAdvanced
All SourcesarXiv

LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR

Intermediate
Said Taghadouini, Adrien Cavaillès et al.Jan 20arXiv

LightOnOCR-2-1B is a single, compact AI model that reads PDF pages and scans and turns them into clean, well-ordered text without using fragile multi-step OCR pipelines.

#LightOnOCR-2-1B#end-to-end OCR#vision-language model

OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer

Intermediate
Pengze Zhang, Yanze Wu et al.Jan 20arXiv

OmniTransfer is a single system that learns from a whole reference video, not just one image, so it can copy how things look (identity and style) and how they move (motion, camera, effects).

#spatio-temporal video transfer#identity transfer#style transfer

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

Intermediate
Yuming Yang, Mingyoung Lai et al.Jan 20arXiv

The paper asks a simple question: Which step-by-step explanations from a teacher model actually help a student model learn to reason better?

#Rank-Surprisal Ratio#data-student suitability#chain-of-thought distillation

Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow

Intermediate
Haocheng Xi, Charlie Ruan et al.Jan 20arXiv

Reinforcement learning (RL) for large language models is slow because the rollout (text generation) stage can take more than 70% of training time, especially for long, step-by-step answers.

#FP8 quantization#on-policy reinforcement learning#precision flow

KAGE-Bench: Fast Known-Axis Visual Generalization Evaluation for Reinforcement Learning

Intermediate
Egor Cherepanov, Daniil Zelezetsky et al.Jan 20arXiv

KAGE-Bench is a fast, carefully controlled benchmark that tests how well reinforcement learning (RL) agents trained on pixels handle specific visual changes, like new backgrounds or lighting, without changing the actual game rules.

#reinforcement learning#visual generalization#KAGE-Env

InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning

Intermediate
Matthew Y. R. Yang, Hao Bai et al.Jan 20arXiv

The paper introduces Intervention Training (InT), a simple way for a language model to find and fix the first wrong step in its own reasoning using a short, targeted correction.

#Intervention Training#credit assignment#LLM reasoning

Toward Efficient Agents: Memory, Tool learning, and Planning

Intermediate
Xiaofang Yang, Lijun Li et al.Jan 20arXiv

This survey explains how to make AI agents not just smart, but also efficient with their time, memory, and tool use.

#agent efficiency#memory compression#tool learning

Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance

Intermediate
Qianli Ma, Chang Guo et al.Jan 20arXiv

This paper turns rebuttal writing from ‘just write some text’ into ‘make a plan with proof, then write.’

#rebuttal generation#multi-agent systems#evidence-centric planning

RoboBrain 2.5: Depth in Sight, Time in Mind

Intermediate
Huajie Tan, Enshen Zhou et al.Jan 20arXiv

RoboBrain 2.5 teaches robots to see depth precisely and to keep track of time-aware progress, so plans turn into safe, accurate actions.

#Embodied AI#3D spatial reasoning#metric grounding

Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

Intermediate
Hyunjong Ok, Jaeho LeeJan 20arXiv

Putting the reading passage before the question and answer choices (CQO) makes language models much more accurate than putting it after (QOC), by about 15 percentage points on average.

#causal attention#prompt order sensitivity#multiple-choice question answering

TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers

Intermediate
Bin Yu, Shijie Lian et al.Jan 20arXiv

TwinBrainVLA is a robot brain with two halves: a frozen generalist that keeps world knowledge safe and a trainable specialist that learns to move precisely.

#Vision-Language-Action#catastrophic forgetting#Asymmetric Mixture-of-Transformers

Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics

Intermediate
Junqi Liu, Zihao Zhou et al.Jan 20arXiv

Numina-Lean-Agent is a new open system that uses a general coding agent to write and check exact math proofs in Lean without special training.

#formal theorem proving#Lean#agentic reasoning
2223242526