How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1055)

Revisiting the Platonic Representation Hypothesis: An Aristotelian View

Intermediate
Fabian Gröger, Shuo Wen et al. · Feb 16 · arXiv

People thought big AI models were all converging on the same overall picture of the world, but the similarity measurements behind that claim were quietly biased by model size and depth.

#representational similarity · #Centered Kernel Alignment (CKA) · #mutual k-Nearest Neighbors (mKNN)

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5

Intermediate
Dongrui Liu, Yi Yu et al. · Feb 16 · arXiv

This report studies the biggest new risks from frontier AI and tests them in realistic, controlled lab settings so problems can be fixed before they cause real harm.

#frontier AI · #agentic AI · #cyber offense

Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook

Intermediate
Ming Li, Xirui Li et al. · Feb 15 · arXiv

This paper studies Moltbook, a giant social network made only of AI agents, to see if they start acting like a real society over time.

#AI socialization · #multi-agent systems · #Moltbook

AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines

Intermediate
Yifan Wu, Yiran Peng et al. · Feb 15 · arXiv

AutoWebWorld builds synthetic websites whose rules are defined by finite state machines, so AI agents can practice safely and their results can be checked automatically.

#Finite State Machine · #Web GUI Agents · #Synthetic Data Generation

Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?

Intermediate
Anton Korznikov, Andrey Galichin et al. · Feb 15 · arXiv

Sparse autoencoders (SAEs) are popular for explaining what large language models are doing, but this paper shows they often don’t learn real, meaningful features.

#sparse autoencoders · #interpretability · #dictionary learning

Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality

Intermediate
Nitay Calderon, Eyal Ben-David et al. · Feb 15 · arXiv

Not every wrong answer from a large language model (LLM) means it never learned the fact; often the model knows it but can't recall it on demand.

#LLM factuality · #encoding vs recall · #knowledge profiling

Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents

Intermediate
Haiyang Xu, Xi Zhang et al. · Feb 15 · arXiv

This paper builds GUI-Owl-1.5, an AI that can use phones, computers, and web browsers like a careful human helper.

#GUI agent · #visual grounding · #reinforcement learning

Experiential Reinforcement Learning

Intermediate
Taiwei Shi, Sihao Chen et al. · Feb 15 · arXiv

This paper teaches AI models to learn like good students: try, think about what went wrong, fix it, and remember the fix.

#Experiential Reinforcement Learning · #self-reflection · #distillation

SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning

Intermediate
Jintao Zhang, Kai Jiang et al. · Feb 13 · arXiv

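Video generators are slow because attention compares every token with every other token. SpargeAttention2 speeds this up with trainable sparse attention that combines Top-k and Top-p masking and is fine-tuned with distillation.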

#sparse attention · #Top-k masking · #Top-p masking

SLA2: Sparse-Linear Attention with Learnable Routing and QAT

Intermediate
Jintao Zhang, Haoxu Wang et al. · Feb 13 · arXiv

SLA2 is a new way for AI to pay attention faster by smartly splitting work between two helpers: a precise one (sparse attention) and a speedy one (linear attention).

#Sparse Attention · #Linear Attention · #SLA2

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

Intermediate
Xiangyi Li, Wenbo Chen et al. · Feb 13 · arXiv

SkillsBench is a big test playground that measures whether giving AI agents step-by-step 'Skills' actually helps them finish real tasks.

#Agent Skills · #LLM agents · #Benchmarking

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Intermediate
Leon Liangyu Chen, Haoyu Ma et al. · Feb 12 · arXiv

UniT teaches one multimodal model to think in steps with pictures and words, so it can check its own work and fix mistakes as it goes.

#Unified multimodal model · #Chain-of-thought · #Test-time scaling