How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (943)


MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

Intermediate
Jingli Lin, Runsen Xu et al. · Dec 11 · arXiv

This paper introduces MMSI-Video-Bench, a large, carefully human-annotated benchmark that checks how well AI understands space and motion in videos.

#video-based spatial intelligence · #multimodal large language models · #spatial construction

Scaling Behavior of Discrete Diffusion Language Models

Intermediate
Dimitri von Rütte, Janis Fluri et al. · Dec 11 · arXiv

This paper studies how a newer kind of language model, called a discrete diffusion language model (DLM), gets better as we give it more data, bigger models, and more compute.

#discrete diffusion · #language models · #scaling laws

What matters for Representation Alignment: Global Information or Spatial Structure?

Intermediate
Jaskirat Singh, Xingjian Leng et al. · Dec 11 · arXiv

This paper asks whether generation training benefits more from an encoder’s big-picture meaning (global semantics) or from how features are arranged across space (spatial structure).

#representation alignment · #REPA · #iREPA

The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality

Beginner
Aileen Cheng, Alon Jacovi et al. · Dec 11 · arXiv

The FACTS Leaderboard is a four-part test that checks how truthful AI models are across images, memory, web search, and document grounding.

#LLM factuality · #benchmarking · #multimodal evaluation

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

Intermediate
Zijian Wu, Lingkai Kong et al. · Dec 11 · arXiv

Big AI models often write very long step-by-step solutions, but typical verifiers either check only the final answer or get lost in the long chain of steps.

#Outcome-based Process Verifier · #Chain-of-Thought · #Process Verification

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

Intermediate
Songyang Gao, Yuzhe Gu et al. · Dec 11 · arXiv

This paper builds a math problem–solving agent, Intern-S1-MO, that thinks in multiple rounds and remembers proven mini-results called lemmas so it can solve very long, Olympiad-level problems.

#long-horizon reasoning · #lemma-based memory · #multi-agent reasoning

Sharp Monocular View Synthesis in Less Than a Second

Beginner
Lars Mescheder, Wei Dong et al. · Dec 11 · arXiv

SHARP turns a single photo into a 3D scene you can look around in, and it does this in under one second on a single GPU.

#monocular view synthesis · #3D Gaussians · #real-time neural rendering

CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models

Intermediate
Tong Zhang, Carlos Hinojosa et al. · Dec 11 · arXiv

Diffusion models sometimes copy training images too closely, which can be a privacy and copyright problem.

#diffusion models · #memorization mitigation · #latent feature injection

LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator

Intermediate
Lihuang Chen, Xiangyu Luo et al. · Dec 11 · arXiv

LEO-RobotAgent is a simple but powerful framework that lets a language model think, plan, and operate many kinds of robots using natural language.

#LEO-RobotAgent · #language-driven robotics · #LLM agent

Achieving Olympiad-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

Intermediate
Haiteng Zhao, Junhao Shen et al. · Dec 11 · arXiv

This paper builds InternGeometry, a large language model agent that solves Olympiad-level geometry by talking to a math engine, remembering what worked, and trying smart new ideas.

#InternGeometry · #geometry theorem proving · #auxiliary constructions

T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

Intermediate
Dmitrii Stoianov, Danil Taranets et al. · Dec 11 · arXiv

T-pro 2.0 is an open Russian language model that can answer quickly or think step by step, so you can pick speed or accuracy when you need it.

#T-pro 2.0 · #Russian LLM · #Hybrid reasoning

SWAA: Sliding Window Attention Adaptation for Efficient Long-Context LLMs Without Pretraining

Intermediate
Yijiong Yu, Jiale Liu et al. · Dec 11 · arXiv

Long texts make standard attention in large language models very slow because every token is compared against every other token, so the cost grows quadratically with length.

#Sliding Window Attention · #SWAA · #FA Decode
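
To picture why the sliding-window idea helps, here is a minimal NumPy sketch of generic sliding-window attention; it assumes nothing about SWAA's actual method or code, and the function name and window size are illustrative only. Each token attends just to itself and the few tokens before it, so the useful work grows linearly with sequence length instead of quadratically.

    import numpy as np

    def sliding_window_attention(q, k, v, window=4):
        # Illustrative only: query position i attends to positions
        # i-window+1 .. i. A real implementation would never materialize
        # the full (n, n) score matrix; this sketch does so for clarity.
        n, d = q.shape
        scores = q @ k.T / np.sqrt(d)
        idx = np.arange(n)
        causal = idx[None, :] <= idx[:, None]          # no peeking ahead
        near = idx[:, None] - idx[None, :] < window    # stay inside the window
        scores = np.where(causal & near, scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    # Toy usage: 8 tokens, 16-dim vectors, each attending to at most 4 positions.
    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(8, 16)) for _ in range(3))
    out = sliding_window_attention(q, k, v, window=4)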