How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (791)

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

Intermediate
Hengyuan Zhang, Zhihao Zhang et al. | Jan 20 | arXiv

This survey turns model understanding into a step-by-step repair toolkit called Locate, Steer, and Improve.

#mechanistic interpretability #residual stream #attention heads

FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation

Intermediate
Jing Zuo, Lingzhou Mu et al. | Jan 20 | arXiv

FantasyVLN teaches a robot to follow language instructions while looking around, using a smart, step-by-step thinking style during training but not at test time.

#Vision-and-Language Navigation #Chain-of-Thought #Multimodal CoT

AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization

Intermediate
Yusheng Liao, Chuan Xuan et al. | Jan 20 | arXiv

AgentEHR is a new, realistic test that asks AI agents to read messy hospital records and make full clinical decisions, not just look up facts.

#AgentEHR #RETROSUM #retrospective summarization

FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

Intermediate
Qian Chen, Jinlan Fu et al. | Jan 20 | arXiv

FutureOmni is the first benchmark that tests whether multimodal AI models can predict what happens next from both sound and video, rather than only explaining what already happened.

#multimodal LLM #audio-visual reasoning #future forecasting

DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution

Intermediate
Shengda Fan, Xuyan Ye et al. | Jan 20 | arXiv

DARC teaches big language models to get smarter by splitting training into two calm, well-organized steps instead of one chaotic loop.

#DARC #self-play #curriculum learning

ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch

Intermediate
Zheng Liu, Honglin Lin et al. | Jan 20 | arXiv

ChartVerse is a new way to make lots of tricky, realistic charts and perfectly checked questions so AI can learn to read charts better.

#Chart reasoning #Vision-Language Models #Rollout Posterior Entropy

Diffusion In Diffusion: Reclaiming Global Coherence in Semi-Autoregressive Diffusion

Intermediate
Linrui Ma, Yufei Cui et al. | Jan 20 | arXiv

The paper proposes Diffusion in Diffusion, a draft-then-revise method that brings back global coherence to fast, block-based diffusion language models.

#discrete diffusion #block diffusion #semi-autoregressive

Behavior Knowledge Merge in Reinforced Agentic Models

Intermediate
Xiangchi Yuan, Dachuan Shi et al. | Jan 20 | arXiv

The paper tackles a key problem in model merging: when you combine several reinforcement-learned models by simple averaging, their specialized skills get watered down.

#reinforcement learning #model merging #task vectors

A BERTology View of LLM Orchestrations: Token- and Layer-Selective Probes for Efficient Single-Pass Classification

Intermediate
Gonzalo Ariel Meyoyan, Luciano Del Corro | Jan 19 | arXiv

This paper shows how to add a tiny helper (a probe) to a big language model so it can classify things like safety or sentiment during the same pass it already does to answer you.

#LLM orchestration #single-pass classification #hidden-state probing

Aligning Agentic World Models via Knowledgeable Experience Learning

Intermediate
Baochang Ren, Yunzhi Yao et al. | Jan 19 | arXiv

WorldMind teaches AI agents to learn the rules of the real world while they act, instead of cramming everything into fixed model weights.

#agentic world models #predictive coding #physical hallucinations

Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition

Intermediate
Warit Sirichotedumrong, Adisai Na-Thalang et al. | Jan 19 | arXiv

Big models like Whisper are great for accuracy but too slow for live captions; this paper builds a smaller, faster Thai speech recognizer for real-time use.

#Thai ASR #Streaming speech recognition #FastConformer-Transducer

Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization

Intermediate
Hao Luo, Ye Wang et al. | Jan 19 | arXiv

Being-H0.5 is a robot brain that learns from huge amounts of human videos and robot demos so it can work on many different robots, not just one.

#Vision-Language-Action model #Unified Action Space #Human-centric learning