🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers924

AllBeginnerIntermediateAdvanced
All SourcesarXiv

Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors

Beginner
Zhiwei Zhang, Fei Zhao et al.Jan 22arXiv

Small AI models often stumble when a tool call fails and then get stuck repeating bad calls instead of fixing the mistake.

#FISSION-GRPO#error recovery#tool use

Qwen3-TTS Technical Report

Intermediate
Hangrui Hu, Xinfa Zhu et al.Jan 22arXiv

Qwen3-TTS is a family of text-to-speech models that can talk in 10+ languages, clone a new voice from just 3 seconds, and follow detailed style instructions in real time.

#Qwen3-TTS#text-to-speech#voice cloning

Rethinking Video Generation Model for the Embodied World

Beginner
Yufan Deng, Zilin Pan et al.Jan 21arXiv

Robots need videos that not only look pretty but also follow real-world physics and finish the task asked of them.

#robot video generation#embodied AI#benchmark

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Intermediate
Letian Zhang, Sucheng Ren et al.Jan 21arXiv

OpenVision 3 is a single vision encoder that learns one set of image tokens that work well for both understanding images (like answering questions) and generating images (like making new pictures).

#Unified Visual Encoder#VAE#Vision Transformer

PROGRESSLM: Towards Progress Reasoning in Vision-Language Models

Intermediate
Jianshu Zhang, Chengxuan Qian et al.Jan 21arXiv

This paper asks a new question for vision-language models: not just 'What do you see?' but 'How far along is the task right now?'

#progress reasoning#vision-language models#episodic retrieval

Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models

Intermediate
Anmol Goel, Cornelius Emde et al.Jan 21arXiv

Benign fine-tuning meant to make language models more helpful can accidentally make them overshare private information.

#contextual privacy#privacy collapse#fine-tuning

LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries

Intermediate
Shijie Lian, Bin Yu et al.Jan 21arXiv

Robots often learn a bad habit called the vision shortcut: they guess the task just by looking, and ignore the words you tell them.

#Vision-Language-Action#Bayesian decomposition#Latent Action Queries

The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Beginner
Zanlin Ni, Shenzhi Wang et al.Jan 21arXiv

Diffusion language models can write tokens in any order, but that freedom can accidentally hurt their ability to reason well.

#diffusion language model#arbitrary order generation#autoregressive training

Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning

Intermediate
Yifan Wang, Shiyu Li et al.Jan 21arXiv

Render-of-Thought (RoT) turns the model’s step-by-step thinking from long text into slim images so the model can think faster with fewer tokens.

#Render-of-Thought#Chain-of-Thought#Latent Reasoning

HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Intermediate
Haowei Zhang, Shudong Yang et al.Jan 21arXiv

HERMES is a training-free way to make video-language models understand live, streaming video quickly and accurately.

#HERMES#KV cache#hierarchical memory

Typhoon OCR: Open Vision-Language Model For Thai Document Extraction

Beginner
Surapon Nonesung, Natapong Nitarach et al.Jan 21arXiv

Typhoon OCR is an open, lightweight vision-language model that reads Thai and English documents and returns clean, structured text.

#Thai OCR#Vision-Language Model#Document Layout Reconstruction

FARE: Fast-Slow Agentic Robotic Exploration

Beginner
Shuhao Liao, Xuxin Lv et al.Jan 21arXiv

Robots used to explore by following simple rules or short-term rewards, which often made them waste time and backtrack a lot.

#autonomous exploration#fast-slow thinking#hierarchical planning
2425262728