How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (924)

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

Beginner
Yi Liu, Weizhe Wang et al. · Jan 15 · arXiv

Agent skills are like apps for AI helpers, but many of them have not yet been carefully checked for safety.

Tags: agent skills, AI security, prompt injection

Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Intermediate
Siqi Kou, Jiachun Jin et al. · Jan 15 · arXiv

Most text-to-image models act like word-to-pixel copy machines and miss the hidden meaning in our prompts.

Tags: think-then-generate, reasoning-aware text-to-image, LLM encoder

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Intermediate
Hengyu Shen, Tiancheng Gu et al. · Jan 15 · arXiv

DanQing is a fresh, 100-million-pair Chinese image–text dataset collected from 2024–2025 web pages and carefully cleaned for training AI that understands pictures and Chinese text together.

Tags: DanQing, Chinese vision-language dataset, image-text pairs

PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary

Intermediate
Jiarui Yao, Ruida Wang et al. · Jan 15 · arXiv

Large language models usually get only a final thumbs-up or thumbs-down at the end of their answer, which is too late to fix mistakes made in the middle.

Tags: Process Reward Learning, PRL, Reasoning LLMs

ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

Intermediate
Yutao Mou, Zhangchi Xue et al. · Jan 15 · arXiv

ToolSafe is a new way to keep AI agents safe when they use external tools, by checking each action before it runs.

Tags: step-level safety, tool invocation, LLM agents

M^4olGen: Multi-Agent, Multi-Stage Molecular Generation under Precise Multi-Property Constraints

Intermediate
Yizhan Li, Florence Cloutier et al. · Jan 15 · arXiv

The paper introduces M^4olGen, a two-stage system that designs new molecules to hit exact target values for several properties (such as QED, LogP, MW, HOMO, and LUMO) at the same time.

Tags: molecular generation, multi-property optimization, fragment-level editing

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning

Intermediate
Linquan Wu, Tianxiang Jiang et al. · Jan 15 · arXiv

LaViT is a new way to teach smaller vision-language models to look at the right parts of an image before they speak.

Tags: multimodal reasoning, visual attention, knowledge distillation

SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature

Intermediate
Yiming Ren, Junjie Wang et al. · Jan 15 · arXiv

The paper introduces SIN-Bench, a new way to test AI models that read long scientific papers by requiring them to show exactly where their answers come from.

Tags: multimodal large language models, long-context reasoning, evidence chains

FlowAct-R1: Towards Interactive Humanoid Video Generation

Intermediate
Lizhen Wang, Yongming Zhu et al. · Jan 15 · arXiv

FlowAct-R1 is a new system that makes lifelike human videos in real time, so the on-screen person can react quickly as you talk to them.

Tags: interactive humanoid video, real-time streaming generation, temporal consistency

Deriving Character Logic from Storyline as Codified Decision Trees

Beginner
Letian Peng, Kun Zhou et al. · Jan 15 · arXiv

The paper turns messy character descriptions from stories into neat, executable rules so role-playing AIs act like the character in each specific scene.

Tags: role-playing agents, behavioral profiles, codified decision trees

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Intermediate
Chengzhuo Tong, Mingkun Chang et al. · Jan 15 · arXiv

This paper turns a video model into a step-by-step visual thinker that makes one final, high-quality picture from a text prompt.

Tags: Chain-of-Frame, visual reasoning, text-to-image

Transition Matching Distillation for Fast Video Generation

Intermediate
Weili Nie, Julius Berner et al. · Jan 14 · arXiv

Big video makers (diffusion models) create great videos but are slow because they rely on hundreds of tiny denoising ("clean-up") steps.

Tags: video diffusion, distillation, transition matching