🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers776

AllBeginnerIntermediateAdvanced
All SourcesarXiv

ECO: Quantized Training without Full-Precision Master Weights

Intermediate
Mahdi Nikdan, Amir Zandieh et al.Jan 29arXiv

Training big AI models uses lots of memory because most methods still keep a secret full-precision copy of the weights called master weights.

#quantized training#master weights#error feedback

Latent Adversarial Regularization for Offline Preference Optimization

Intermediate
Enyi Jiang, Yibo Jacky Zhang et al.Jan 29arXiv

This paper introduces GANPO, a new way to train language models from human preferences by guiding the model using its hidden thoughts (latent space) instead of just its visible words (token space).

#GANPO#latent space regularization#offline preference optimization

VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning

Intermediate
Yibo Wang, Yongcheng Jing et al.Jan 29arXiv

This paper shows a new way to help AI think through long problems faster by turning earlier text steps into small pictures the AI can reread.

#vision-text compression#optical memory#iterative reasoning

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Intermediate
Wenxuan Huang, Yu Zeng et al.Jan 29arXiv

The paper tackles a real problem: one-shot image or text searches often miss the right evidence (low hit-rate), especially in noisy, cluttered pictures.

#multimodal deep research#visual question answering#ReAct reasoning

MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources

Intermediate
Baorui Ma, Jiahui Yang et al.Jan 29arXiv

Metric Anything is a new way to teach AI real, ruler-like distances (metric depth) from very mixed and noisy 3D data.

#metric depth estimation#sparse metric prompt#monocular depth

PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction

Intermediate
Changjian Jiang, Kerui Ren et al.Jan 29arXiv

PLANING is a new way to build 3D worlds from a moving single camera by combining two kinds of pieces: sharp triangles for shape and soft Gaussians for looks.

#Streaming 3D Reconstruction#Triangle Primitives#Neural Gaussians

CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

Intermediate
Johannes Kirmayr, Lukas Stappen et al.Jan 29arXiv

CAR-bench is a new 'driving test' for AI assistants that checks if they can stay careful, honest, and consistent during real back-and-forth conversations in a car.

#LLM agents#benchmarking#consistency

Causal World Modeling for Robot Control

Intermediate
Lin Li, Qihang Zhang et al.Jan 29arXiv

Robots used to copy actions from videos without truly understanding how the world changes, so they often messed up long, multi-step jobs.

#robot world model#autoregressive diffusion#causal masking

Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities

Intermediate
Shuangshuang Ying, Zheyu Wang et al.Jan 29arXiv

This paper builds a safe science “playground” called DeR that fairly tests how AI finds facts (retrieval) and how it thinks with those facts (reasoning) without mixing them up.

#retrieval-augmented generation#document-grounded reasoning#deep research benchmark

MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods

Intermediate
Honglin Lin, Zheng Liu et al.Jan 29arXiv

MMFineReason is a huge, open dataset (1.8 million examples, 5.1 billion solution tokens) that teaches AIs to think step by step about pictures and text together.

#multimodal reasoning#vision-language models#chain-of-thought

Language-based Trial and Error Falls Behind in the Era of Experience

Intermediate
Haoyu Wang, Guozheng Ma et al.Jan 29arXiv

Big language models are great at words but waste lots of time and energy when they try random actions in non-language games like Sudoku, Sokoban, 2048, FrozenLake, and Rubik’s Cube.

#SCOUT#Reinforcement Learning#Supervised Fine-Tuning

DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning

Intermediate
Mingshuang Luo, Shuang Liang et al.Jan 29arXiv

DreamActor-M2 is a new way to make a still picture move by copying motion from a video while keeping the character’s look the same.

#character image animation#spatiotemporal in-context learning#video diffusion
1112131415