How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (943)

Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language

Intermediate
Yunkai Zhang, Yawen Zhang et al. · Dec 12 · arXiv

Time-series data are numbers tracked over time, like temperature each hour or traffic each day, and turning them into clear words usually needs experts.

#time series #multimodal model #trend description

Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems

Intermediate
Eddie Landesberg, Manjari Narayan · Dec 11 · arXiv

LLM judges are cheap but biased; without calibration they can completely flip which model looks best.

#LLM-as-judge #calibration #isotonic regression

Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching

Intermediate
Bowen Wen, Shaurya Dewan et al. · Dec 11 · arXiv

Fast-FoundationStereo is a stereo vision system that sees depth from two cameras in real time while still working well on brand‑new scenes it was never trained on.

#stereo matching #zero-shot generalization #knowledge distillation

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

Intermediate
Tjark Behrens, Anton Obukhov et al. · Dec 11 · arXiv

StereoSpace turns a single photo into a full 3D-style stereo pair without ever estimating a depth map.

#stereo generation #monocular-to-stereo #diffusion models

Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization

Intermediate
Tsai-Shien Chen, Aliaksandr Siarohin et al. · Dec 11 · arXiv

Omni-Attribute is a new image encoder that learns just the parts of a picture you ask for (like hairstyle or lighting) and ignores the rest.

#open-vocabulary attribute encoder #attribute disentanglement #visual concept personalization

Bidirectional Normalizing Flow: From Data to Noise and Back

Intermediate
Yiyang Lu, Qiao Sun et al. · Dec 11 · arXiv

Normalizing Flows are models that learn how to turn real images into simple noise and then back again.

#Normalizing Flow #Bidirectional Normalizing Flow #Hidden Alignment

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Intermediate
Yiwen Tang, Zoey Guo et al. · Dec 11 · arXiv

This paper asks whether reinforcement learning (RL) can improve text-to-3D generation and shows that it can, provided the training and rewards are designed carefully.

#Reinforcement Learning #Text-to-3D Generation #Hi-GRPO

Stronger Normalization-Free Transformers

Intermediate
Mingzhi Chen, Taiming Lu et al. · Dec 11 · arXiv

This paper shows that Transformers can still be trained well without normalization layers by using a simple point-wise function called Derf.

#Normalization-free Transformers #LayerNorm replacement #Point-wise activation

FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos

Intermediate
Yulu Gan, Ligeng Zhu et al. · Dec 11 · arXiv

FoundationMotion is a fully automatic pipeline that turns raw videos into detailed motion data, captions, and quizzes about how things move.

#motion understanding #spatio-temporal reasoning #video question answering

DuetSVG: Unified Multimodal SVG Generation with Internal Visual Guidance

Intermediate
Peiying Zhang, Nanxuan Zhao et al. · Dec 11 · arXiv

DuetSVG is a new AI that learns to make SVG graphics by generating an image and the matching SVG code together, like sketching first and then tracing neatly.

#DuetSVG #multimodal generation #SVG generation

MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos

Intermediate
Kehong Gong, Zhengyu Wen et al. · Dec 11 · arXiv

MoCapAnything is a system that turns a single regular video into a 3D animation that can drive any rigged character, not just humans or one animal type.

#motion capture #category-agnostic mocap #monocular video

From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models

Intermediate
Zongzhao Li, Xiangzhe Kong et al. · Dec 11 · arXiv

The paper defines Microscopic Spatial Intelligence (MiSI) as the skill AI needs to understand tiny 3D things like molecules from 2D pictures and text, just like scientists do.

#microscopic spatial intelligence #vision-language models #orthographic projection