How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (807)


SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

Intermediate
Yong Xien Chng, Tao Hu et al. · Dec 30 · arXiv

SenseNova-MARS is a vision-language model that reasons step by step and can call three tools (text search, image search, and image cropping) in the middle of its reasoning.

#multimodal agent · #vision-language model · #reinforcement learning

Figure It Out: Improve the Frontier of Reasoning with Executable Visual States

Intermediate
Meiqi Chen, Fandong Meng et al. · Dec 30 · arXiv

FIGR is a new way for AI to ‘think by drawing,’ using code to build clean, editable diagrams while it reasons.

#executable visual states · #diagrammatic reasoning · #reinforcement learning for reasoning

Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation

Intermediate
Zhe Huang, Hao Wen et al. · Dec 30 · arXiv

Multimodal Large Language Models (MLLMs) often hallucinate on videos by trusting words and common sense more than what the frames really show.

#multimodal large language model · #video understanding · #visual hallucination

GR-Dexter Technical Report

Intermediate
Ruoshi Wen, Guangzeng Chen et al. · Dec 30 · arXiv

GR-Dexter is a full package (new robot hands, an AI model that controls them, and a large, carefully mixed training dataset) that lets a two-handed robot follow language instructions to carry out long, difficult tasks.

#vision-language-action · #dexterous manipulation · #bimanual robotics

GARDO: Reinforcing Diffusion Models without Reward Hacking

Intermediate
Haoran He, Yuxiao Ye et al. · Dec 30 · arXiv

GARDO is a new way to fine-tune text-to-image diffusion models with reinforcement learning while avoiding reward hacking, where the model learns to exploit flaws in the reward signal instead of genuinely improving its images.

#GARDO · #reward hacking · #gated KL regularization
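The tags mention gated KL regularization. As a rough illustration of the general idea only (this is not GARDO's actual formulation), the sketch below computes a KL-regularized reward in which the KL penalty toward a reference policy is applied only when a gate fires. The gate rule used here (penalize only rewards above a threshold, as a hypothetical proxy for suspiciously high scores) and the names `gated_kl_reward`, `beta`, and `threshold` are assumptions made for illustration.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def gated_kl_reward(
    reward: float,
    policy_probs: Sequence[float],
    ref_probs: Sequence[float],
    beta: float = 0.1,
    threshold: float = 0.8,
) -> float:
    """Hypothetical gated objective: subtract beta * KL only when the gate fires.

    The gate (reward above `threshold`) is an illustrative assumption,
    not the criterion used in the GARDO paper.
    """
    gate = 1.0 if reward > threshold else 0.0
    return reward - gate * beta * kl_divergence(policy_probs, ref_probs)


# Gate off: the reward passes through unchanged.
print(gated_kl_reward(0.5, [0.9, 0.1], [0.5, 0.5]))
# Gate on: the reward is reduced by beta times the KL to the reference policy.
print(gated_kl_reward(0.9, [0.9, 0.1], [0.5, 0.5]))
```

The design point the sketch tries to capture is that an always-on KL penalty slows down all learning, while a gated one spends regularization only on samples a (hypothetical) gate deems risky.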

Factorized Learning for Temporally Grounded Video-Language Models

Intermediate
Wenzheng Zeng, Difei Gao et al. · Dec 30 · arXiv

This paper teaches video-language models to first locate when the relevant evidence appears in a video and then answer using that evidence, instead of mixing both steps together.

#temporal grounding · #video-language models · #evidence tokens
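The "find when" step is temporal grounding, and its quality is commonly measured with temporal intersection-over-union (tIoU) between a predicted time segment and the ground-truth one. The minimal sketch below is a generic version of that metric, not code from the paper; the function name `temporal_iou` and the (start, end)-in-seconds convention are assumptions.

```python
def temporal_iou(pred: tuple[float, float], gold: tuple[float, float]) -> float:
    """Temporal IoU between two segments given as (start, end) in seconds."""
    (ps, pe), (gs, ge) = pred, gold
    if pe < ps or ge < gs:
        raise ValueError("segment end must not precede its start")
    # Overlap length is zero when the segments do not intersect.
    intersection = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - intersection
    return intersection / union if union > 0 else 0.0


# Predicted evidence at 2-6 s, true evidence at 4-8 s: 2 s overlap out of 6 s total.
print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
```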

Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling

Intermediate
Chulun Zhou, Chunkang Zhang et al. · Dec 30 · arXiv

Multi-step RAG systems often struggle with long documents because their memory is just a pile of isolated facts, not a connected understanding.

#multi-step RAG · #hypergraph memory · #hyperedge merging
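A hypergraph memory stores each fact as a hyperedge that can link any number of entities at once, and merging hyperedges turns isolated facts into larger connected units. The sketch below illustrates that idea under a hypothetical rule (merge any two hyperedges that share at least `min_shared` entities); the `Hyperedge` class, the `merge_hyperedges` function, and the merging criterion are assumptions, not the paper's method.

```python
from dataclasses import dataclass, field


@dataclass
class Hyperedge:
    entities: set[str]                       # entities this relation connects
    facts: list[str] = field(default_factory=list)  # supporting facts


def merge_hyperedges(edges: list[Hyperedge], min_shared: int = 1) -> list[Hyperedge]:
    """Repeatedly merge hyperedges sharing at least `min_shared` entities.

    The overlap rule is a hypothetical criterion chosen for illustration.
    """
    merged = [Hyperedge(set(e.entities), list(e.facts)) for e in edges]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if len(merged[i].entities & merged[j].entities) >= min_shared:
                    # Absorb edge j into edge i, then restart the scan.
                    merged[i].entities |= merged[j].entities
                    merged[i].facts.extend(merged[j].facts)
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


memory = [
    Hyperedge({"Alice", "Acme"}, ["Alice works at Acme"]),
    Hyperedge({"Acme", "Paris"}, ["Acme is based in Paris"]),
    Hyperedge({"Bob"}, ["Bob likes tea"]),
]
for edge in merge_hyperedges(memory):
    print(sorted(edge.entities), edge.facts)
```

Here the two facts about Acme become one relational unit, so a later retrieval step could answer "Where does Alice work, and in which city?" from a single memory entry.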

Pretraining Frame Preservation in Autoregressive Video Memory Compression

Intermediate
Lvmin Zhang, Shengqu Cai et al. · Dec 29 · arXiv

The paper teaches a video model to squeeze long video history into a tiny memory while still keeping sharp details in single frames.

#autoregressive video generation · #video memory compression · #frame retrieval pretraining

Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

Intermediate
Hau-Shiang Shiu, Chin-Yang Lin et al. · Dec 29 · arXiv

This paper makes diffusion-based video super-resolution (VSR) practical for live, low-latency use by removing the need for future frames and cutting denoising from ~50 steps down to just 4.

#video super-resolution · #diffusion model · #latent diffusion

Training AI Co-Scientists Using Rubric Rewards

Intermediate
Shashwat Goel, Rishi Hazra et al. · Dec 29 · arXiv

The paper teaches AI to write strong research plans by letting it grade its own work using checklists (rubrics) pulled from real scientific papers.

#AI co-scientist · #research plan generation · #rubric rewards
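One simple way to turn a rubric into a reward is to score a plan by the fraction of rubric items it satisfies. The sketch below shows that idea under stated assumptions: `rubric_reward` and its `grader` parameter are hypothetical names, and the toy keyword grader only stands in for the much stronger judge (for example, a language model) that a real system would need. It is not the paper's exact reward.

```python
from typing import Callable


def rubric_reward(plan: str, rubric: list[str], grader: Callable[[str, str], bool]) -> float:
    """Fraction of rubric items the grader judges the plan to satisfy."""
    if not rubric:
        return 0.0
    satisfied = sum(1 for item in rubric if grader(plan, item))
    return satisfied / len(rubric)


def keyword_grader(plan: str, item: str) -> bool:
    """Toy stand-in grader: an item counts as satisfied if its keyword appears."""
    return item.lower() in plan.lower()


rubric = ["baseline", "ablation", "dataset"]
plan = "We compare against a strong baseline on a public dataset."
print(rubric_reward(plan, rubric, keyword_grader))  # 2 of 3 items satisfied
```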

Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

Intermediate
Shaocong Xu, Songlin Wei et al. · Dec 29 · arXiv

Transparent and shiny objects confuse ordinary depth cameras, but video diffusion models have already learned how light bends through and reflects off them.

#video diffusion model · #transparent object depth · #normal estimation

Web World Models

Intermediate
Jichen Feng, Yifan Zhang et al. · Dec 29 · arXiv

This paper introduces Web World Models (WWMs), a way to build huge, explorable worlds by putting strict rules in code and letting AI write the fun details.

#Web World Model · #typed interfaces · #deterministic hashing
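The tags point at deterministic hashing and typed interfaces as the "strict rules in code" part. A generic version of that pattern is sketched below: hashing a location's identity yields a stable seed, so revisiting the same coordinates always produces the same terrain without storing it, and a typed record fixes what a location must contain. The names `Location`, `location_seed`, and `describe_location`, and the terrain list, are hypothetical illustrations, not the paper's code.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Typed interface: every location must provide exactly these fields."""
    x: int
    y: int
    terrain: str
    loot: int


TERRAINS = ["forest", "desert", "river", "ruins"]  # hypothetical example values


def location_seed(world_id: str, x: int, y: int) -> int:
    """Derive a stable 64-bit seed for a location via SHA-256."""
    key = f"{world_id}:{x}:{y}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def describe_location(world_id: str, x: int, y: int) -> Location:
    """Code-enforced rules: terrain and loot are fixed by the hashed seed."""
    rng = random.Random(location_seed(world_id, x, y))
    return Location(x, y, rng.choice(TERRAINS), rng.randint(0, 3))


first_visit = describe_location("demo-world", 3, 4)
second_visit = describe_location("demo-world", 3, 4)
print(first_visit == second_visit)  # same coordinates always yield the same location
```

In this split, a language model would only write free-form flavor text on top of a `Location`, while the hashed rules keep facts such as terrain consistent across visits.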