How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (43)

#supervised fine-tuning

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Intermediate
Team Seedance, Heyi Chen et al. · Dec 15 · arXiv

Seedance 1.5 pro is a single model that generates video and sound together, so lips, music, and actions match naturally.

#audio-visual generation #diffusion transformer #cross-modal synchronization

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Intermediate
Zhihang Liu, Xiaoyi Bao et al. · Dec 15 · arXiv

ShowTable is a new way for AI to turn a data table into a beautiful, accurate infographic using a think–make–check–fix loop.

#creative table visualization #multimodal large language model #diffusion model

From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models

Intermediate
Zongzhao Li, Xiangzhe Kong et al. · Dec 11 · arXiv

The paper defines Microscopic Spatial Intelligence (MiSI) as the skill AI needs to understand tiny 3D things like molecules from 2D pictures and text, just like scientists do.

#microscopic spatial intelligence #vision-language models #orthographic projection

MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment

Intermediate
Mengxi Xiao, Kailai Yang et al. · Dec 10 · arXiv

MentraSuite is a complete toolkit that teaches large language models (LLMs) to reason about mental health step by step, not just sound caring.

#mental health reasoning #LLM post-training #supervised fine-tuning

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

Intermediate
Tiwei Bie, Maosong Cao et al. · Dec 10 · arXiv

Before this work, most big language models generated text one token at a time (autoregressively), which made generation slow and hard to parallelize.

#diffusion language model #masked diffusion #block diffusion

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Intermediate
Hongyu Li, Manyuan Zhang et al. · Dec 5 · arXiv

EditThinker is a helper brain for any image editor that thinks, checks, and rewrites the instruction in multiple rounds until the picture looks right.

#instruction-based image editing #iterative reasoning #multimodal large language model

COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence

Beginner
Zefeng Zhang, Xiangzhao Hao et al. · Dec 4 · arXiv

COOPER is a single AI model that both “looks better” (perceives depth and object boundaries) and “thinks smarter” (reasons step by step) to answer spatial questions about images.

#COOPER #multimodal large language model #unified model