How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (807)


DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry

Intermediate
Zhenyang Cai, Jiaming Zhang et al. | Dec 12 | arXiv

DentalGPT is a multimodal large language model that reads dental images and text together and explains what it sees the way a junior dentist would.

#DentalGPT #multimodal large language model #dentistry AI

Rethinking Expert Trajectory Utilization in LLM Post-training

Intermediate
Bowen Ding, Yuhan Chen et al. | Dec 12 | arXiv

The paper asks how to best use expert step-by-step solutions (expert trajectories) when teaching big AI models to reason after pretraining.

#Supervised Fine-Tuning #Reinforcement Learning #Expert Trajectories

Exploring MLLM-Diffusion Information Transfer with MetaCanvas

Intermediate
Han Lin, Xichen Pan et al. | Dec 12 | arXiv

MetaCanvas lets a multimodal language model (MLLM) sketch a plan inside the generator’s hidden canvas so diffusion models can follow it patch by patch.

#MetaCanvas #MLLM #Diffusion Transformer

An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges

Intermediate
Chao Xu, Suyu Zhang et al. | Dec 12 | arXiv

Vision-Language-Action (VLA) models are robots’ “see–think–do” brains that connect cameras (vision), words (language), and motors (action).

#Vision-Language-Action #Embodied AI #Multimodal Alignment

Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language

Intermediate
Yunkai Zhang, Yawen Zhang et al. | Dec 12 | arXiv

Time-series data are numbers tracked over time, like temperature each hour or traffic each day, and turning them into clear words usually needs experts.

#time series #multimodal model #trend description

Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems

Intermediate
Eddie Landesberg, Manjari Narayan | Dec 11 | arXiv

LLM judges are cheap but biased; without calibration they can completely flip which model looks best.

#LLM-as-judge #calibration #isotonic regression

Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching

Intermediate
Bowen Wen, Shaurya Dewan et al. | Dec 11 | arXiv

Fast-FoundationStereo is a stereo vision system that sees depth from two cameras in real time while still working well on brand‑new scenes it was never trained on.

#stereo matching #zero‑shot generalization #knowledge distillation

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

Intermediate
Tjark Behrens, Anton Obukhov et al. | Dec 11 | arXiv

StereoSpace turns a single photo into a full 3D-style stereo pair without ever estimating a depth map.

#stereo generation #monocular-to-stereo #diffusion models

Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization

Intermediate
Tsai-Shien Chen, Aliaksandr Siarohin et al. | Dec 11 | arXiv

Omni-Attribute is a new image encoder that learns just the parts of a picture you ask for (like hairstyle or lighting) and ignores the rest.

#open-vocabulary attribute encoder #attribute disentanglement #visual concept personalization

Bidirectional Normalizing Flow: From Data to Noise and Back

Intermediate
Yiyang Lu, Qiao Sun et al. | Dec 11 | arXiv

Normalizing Flows are models that learn how to turn real images into simple noise and then back again.

#Normalizing Flow #Bidirectional Normalizing Flow #Hidden Alignment

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Intermediate
Yiwen Tang, Zoey Guo et al. | Dec 11 | arXiv

This paper asks whether reinforcement learning (RL) can improve the generation of 3D models from text and shows that the answer is yes, provided the training and rewards are designed carefully.

#Reinforcement Learning #Text-to-3D Generation #Hi-GRPO

Stronger Normalization-Free Transformers

Intermediate
Mingzhi Chen, Taiming Lu et al. | Dec 11 | arXiv

This paper shows that we can remove normalization layers from Transformers and still train them well by using a simple point‑by‑point function called Derf.

#Normalization‑free Transformers #LayerNorm replacement #Point‑wise activation