How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (196)

Voxtral Realtime

Beginner
Alexander H. Liu, Andy Ehrenberg et al. · Feb 11 · arXiv

Voxtral Realtime is a speech-to-text model that types what you say almost instantly, while keeping accuracy close to the best offline systems.

#streaming ASR, #real-time transcription, #causal audio encoder

Benchmarking Large Language Models for Knowledge Graph Validation

Beginner
Farzad Shami, Stefano Marchesin et al. · Feb 11 · arXiv

Knowledge graphs are like giant fact maps, and keeping every fact correct is hard and important.

#Knowledge Graph Validation, #Fact Checking, #Large Language Models

LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation

Beginner
Zhiling Yan, Dingjie Song et al. · Feb 10 · arXiv

LiveMedBench is a new, continuously updated test for medical AIs that keeps its test questions separate from training data, so models cannot score well simply by memorizing answers.

#LiveMedBench, #medical benchmark, #data contamination

When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models

Beginner
Jiacheng Hou, Yining Sun et al. · Feb 10 · arXiv

Modern image editors can now follow visual prompts like arrows and scribbles, which opens a new way for attackers to hide harmful instructions inside images.

#vision-centric jailbreak, #image editing safety, #visual prompts

Effective Reasoning Chains Reduce Intrinsic Dimensionality

Beginner
Archiki Prasad, Mandar Joshi et al. · Feb 9 · arXiv

The paper asks a simple question: which kind of step-by-step reasoning helps small language models learn best, and why?

#intrinsic dimensionality, #chain-of-thought, #LoRA

SceneSmith: Agentic Generation of Simulation-Ready Indoor Scenes

Beginner
Nicholas Pfaff, Thomas Cohn et al. · Feb 9 · arXiv

SceneSmith is a smart team of AI helpers that turns a short text like 'a cozy study with books and a desk' into a full 3D home scene you can drop right into a robot simulator.

#agentic scene synthesis, #text-to-3D generation, #indoor scene generation

WorldCompass: Reinforcement Learning for Long-Horizon World Models

Beginner
Zehan Wang, Tengfei Wang et al. · Feb 9 · arXiv

WorldCompass teaches video world models to follow actions better and keep pictures pretty by using reinforcement learning after pretraining.

#world models, #reinforcement learning, #clip-level rollout

Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models

Beginner
Zichen Jeff Cui, Omar Rayyan et al. · Feb 9 · arXiv

Robots often get confused by wordy instructions, so instead of describing the task in sentences, this paper tells them exactly where to make contact.

#contact-anchored policies, #robot utility models, #contact anchor

iGRPO: Self-Feedback-Driven LLM Reasoning

Beginner
Ali Hatamizadeh, Shrimai Prabhumoye et al. · Feb 9 · arXiv

This paper teaches a language model to improve its own math answers by first writing several drafts and then learning to beat its best draft.

#iGRPO, #GRPO, #Reinforcement Learning

LLaDA2.1: Speeding Up Text Diffusion via Token Editing

Beginner
Tiwei Bie, Maosong Cao et al. · Feb 9 · arXiv

LLaDA2.1 teaches a diffusion-style language model to write fast rough drafts and then fix its own mistakes by editing tokens it already wrote.

#discrete diffusion language model, #editable decoding, #token-to-token editing

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Beginner
Yuhao Dong, Shulin Tian et al. · Feb 9 · arXiv

This paper teaches AI to pick up how-to steps on the spot from demonstration videos, the way people do.

#video in-context learning, #procedural video understanding, #multimodal large language models

NarraScore: Bridging Visual Narrative and Musical Dynamics via Hierarchical Affective Control

Beginner
Yufan Wen, Zhaocheng Liu et al. · Feb 9 · arXiv

NarraScore turns a video's changing story into a matching soundtrack by using emotion as the bridge.

#video-to-music generation, #affective computing, #valence-arousal