How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (915)


DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning

Intermediate
Mingshuang Luo, Shuang Liang et al. · Jan 29 · arXiv

DreamActor-M2 is a new way to make a still picture move by copying motion from a video while keeping the character’s look the same.

#character image animation #spatiotemporal in-context learning #video diffusion

OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models

Intermediate
Yufeng Zhong, Lei Chen et al. · Jan 29 · arXiv

OCRVerse is a new AI model that can read both plain text in documents and the visual structures in charts, webpages, and science plots, all in one system.

#Holistic OCR #Vision-Language Model #Supervised Fine-Tuning

Beyond Imitation: Reinforcement Learning for Active Latent Planning

Intermediate
Zhi Zheng, Wee Sun Lee · Jan 29 · arXiv

The paper shows how to make AI think faster and smarter by planning in a hidden space instead of writing long step-by-step sentences.

#latent reasoning #chain-of-thought #variational autoencoder

Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening

Intermediate
Xiaotong Ji, Rasul Tutunov et al. · Jan 29 · arXiv

The paper shows a fast, training-free way to boost an LLM’s step-by-step reasoning by smartly reusing the model’s own probabilities.

#power distribution sampling #distribution sharpening #low-temperature sampling
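The summary only names the technique, but the core idea of distribution sharpening is simple to illustrate: raise the model's next-token probabilities to a power and renormalize, which concentrates mass on tokens the model already favors. This is a minimal numpy sketch of that generic operation, not the paper's actual sampler; the function name and toy distribution are illustrative.

```python
import numpy as np

def sharpen(probs, alpha=2.0):
    """Raise a probability distribution to the power alpha and renormalize.

    alpha > 1 concentrates mass on high-probability tokens
    (alpha -> infinity approaches greedy decoding; alpha = 1 leaves
    the distribution unchanged).
    """
    p = np.asarray(probs, dtype=float) ** alpha
    return p / p.sum()

# Toy next-token distribution over a 4-word vocabulary.
p = [0.5, 0.3, 0.15, 0.05]
print(sharpen(p, alpha=2.0))  # the leading token's share grows above 0.5
```

Because the operation only reuses probabilities the model already produced, it needs no extra training, which is the "training-free" angle the summary highlights.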

KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices

Intermediate
Wuyang Zhou, Yuxuan Gu et al. · Jan 29 · arXiv

Hyper-Connections (HC) widen the usual single shortcut in neural networks into several parallel streams and let the model mix them, but this mixing can become unstable when many layers are stacked.

#Hyper-Connections #Manifold-Constrained Hyper-Connections #Doubly Stochastic Matrix

Shaping capabilities with token-level data filtering

Intermediate
Neil Rathi, Alec Radford · Jan 29 · arXiv

The paper shows a simple way to teach AI models what not to learn by removing only the exact words (tokens) related to unwanted topics during pretraining.

#token-level data filtering #capability shaping #sparse autoencoders
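The mechanism the summary describes, dropping only the flagged tokens rather than whole documents, amounts to masking those positions out of the pretraining loss. This is a small numpy sketch of that masking under my own assumptions (uniform toy logits, a hand-set keep mask); it is not the paper's pipeline, and how tokens get flagged (e.g. via sparse autoencoders, per the tags) is a separate step not shown here.

```python
import numpy as np

def masked_nll(logits, targets, keep):
    """Mean next-token negative log-likelihood over kept positions only.

    Positions with keep == 0 (tokens flagged as belonging to an unwanted
    topic) contribute nothing to the loss, so the model never fits them.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    logp = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -logp[np.arange(len(targets)), targets]
    return float((nll * keep).sum() / keep.sum())

# Toy sequence: 3 positions over a 4-word vocabulary; position 2 is filtered.
logits = np.zeros((3, 4))
targets = np.array([0, 1, 2])
keep = np.array([1.0, 1.0, 0.0])
print(masked_nll(logits, targets, keep))  # log(4) ~ 1.386; the filtered token is ignored
```

Because the filtered position is multiplied by zero, changing the model's predictions there leaves the loss (and hence the gradient) untouched, which is what lets the rest of the document still be learned from.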

ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

Intermediate
Xiaoyu Tian, Haotian Wang et al. · Jan 29 · arXiv

ASTRA is a fully automated way to train tool-using AI agents by making both their practice stories (trajectories) and their practice worlds (environments) without humans in the loop.

#tool-augmented agents #multi-turn decision making #verifiable environments

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

Intermediate
Yaorui Shi, Shugui Liu et al. · Jan 29 · arXiv

MemOCR is a new way for AI to remember long histories by turning important notes into a picture with big, bold parts for key facts and tiny parts for details.

#MemOCR #visual memory #adaptive information density

ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation

Beginner
Zihao Huang, Jundong Zhou et al. · Jan 29 · arXiv

ConceptMoE teaches a language model to group easy, similar tokens into bigger ideas called concepts, so it spends more brainpower on the hard parts.

#ConceptMoE #Mixture of Experts #Adaptive Compression
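To make "grouping easy, similar tokens into concepts" concrete, here is a toy numpy sketch that greedily merges adjacent token embeddings whose cosine similarity exceeds a threshold, averaging each run into one vector. This is an illustrative stand-in I wrote for the general idea of adaptive token compression; ConceptMoE's actual learned compression and expert routing are not shown, and the threshold and embeddings are made up.

```python
import numpy as np

def merge_similar(embs, threshold=0.9):
    """Greedily fold each token into the current 'concept' vector when
    its cosine similarity to it exceeds the threshold; otherwise start
    a new concept. Returns a shorter sequence of concept vectors."""
    out = [np.asarray(embs[0], dtype=float)]
    for e in embs[1:]:
        prev = out[-1]
        cos = prev @ e / (np.linalg.norm(prev) * np.linalg.norm(e))
        if cos > threshold:
            out[-1] = (prev + e) / 2  # merge: average into the concept
        else:
            out.append(np.asarray(e, dtype=float))
    return np.stack(out)

# Two near-duplicate tokens collapse into one concept; the third stays separate.
toks = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
print(merge_similar(toks).shape)  # (2, 2): three tokens became two concepts
```

The payoff of any such compression is that downstream layers process fewer, denser units, which is the "spend more brainpower on the hard parts" framing in the summary.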

Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

Intermediate
Jiecong Wang, Hao Peng et al. · Jan 29 · arXiv

This paper introduces PLaT, a way for AI to think silently in a hidden space (the brain) and only speak when needed (the mouth).

#latent chain-of-thought #planning in latent space #planner-decoder architecture

Self-Improving Pretraining: using post-trained models to pretrain better models

Intermediate
Ellen Xiaoqing Tan, Shehzaad Dhuliawala et al. · Jan 29 · arXiv

This paper teaches language models to be safer, more factual, and higher quality during pretraining, not just after, by using reinforcement learning with a stronger model as a helper.

#self-improving pretraining #reinforcement learning #online DPO

Qwen3-ASR Technical Report

Intermediate
Xian Shi, Xiong Wang et al. · Jan 29 · arXiv

Qwen3‑ASR is a family of speech models that hear, understand, and write down speech in 52 languages and dialects, plus they can tell you when each word was spoken.

#ASR #forced alignment #timestamps