How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (130)


Real2Edit2Real: Generating Robotic Demonstrations via a 3D Control Interface

Beginner
Yujie Zhao, Hongwei Fan et al. · Dec 22 · arXiv

Robots learn better when they see many examples, but collecting lots of real videos is slow and expensive.

#robotic demonstration generation #depth-controlled video generation #metric-scale 3D reconstruction

MemEvolve: Meta-Evolution of Agent Memory Systems

Beginner
Guibin Zhang, Haotian Ren et al. · Dec 21 · arXiv

MemEvolve teaches AI agents not only to remember past experiences but also to improve the way they remember, like a student who upgrades their study habits over time.

#LLM agents #agent memory #meta-evolution

Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing

Beginner
Shilong Zhang, He Zhang et al. · Dec 19 · arXiv

This paper shows that great image understanding features alone are not enough for making great images; you also need strong pixel-level detail.

#Pixel–Semantic VAE #Semantic Regularization #Off-Manifold Generation

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

Beginner
Jiaqi Tang, Jianmin Chen et al. · Dec 19 · arXiv

Robust-R1 teaches vision-language models to notice how a picture is damaged, think through what that damage hides, and then answer as if the picture were clear.

#Robust-R1 #degradation-aware reasoning #multimodal large language models

Next-Embedding Prediction Makes Strong Vision Learners

Beginner
Sihan Xu, Ziqiao Ma et al. · Dec 18 · arXiv

This paper introduces NEPA, a very simple way to teach vision models by having them predict the next patch’s embedding in an image sequence, just like language models predict the next word.

#self-supervised learning #vision transformer #autoregression

Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers

Beginner
Yifan Zhou, Zeqi Xiao et al. · Dec 18 · arXiv

This paper introduces Log-linear Sparse Attention (LLSA), a new way for Diffusion Transformers to focus only on the most useful information using a smart, layered search.

#Log-linear Sparse Attention #Hierarchical Top-K #Hierarchical KV Enrichment

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

Beginner
Sara Papi, Javier Garcia Gilabert et al. · Dec 18 · arXiv

This paper builds a big, fair test called Hearing to Translate to check how well different speech translation systems work in the real world.

#speech translation #Speech-LLM #cascaded ASR-MT

LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding

Beginner
Chenkai Xu, Yijie Jin et al. · Dec 18 · arXiv

This paper speeds up diffusion language models (dLLMs) by changing the order in which they fill in missing words.

#Diffusion LLM #Parallel decoding #Token Filling Order

SCOPE: Prompt Evolution for Enhancing Agent Effectiveness

Beginner
Zehua Pei, Hui-Ling Zhen et al. · Dec 17 · arXiv

SCOPE lets AI agents rewrite their own instructions while they are working, so they can fix mistakes and get smarter on the next step, not just the next task.

#prompt evolution #LLM agents #context management

HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering

Beginner
Dan Ben-Ami, Gabriele Serussi et al. · Dec 16 · arXiv

HERBench is a new test that checks whether video AI models can combine several clues spread across time, rather than guessing from a single frame or from language priors.

#Video Question Answering #Video-LLM #Multi-Evidence Integration

Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models

Beginner
Shufan Li, Jiuxiang Gu et al. · Dec 16 · arXiv

Sparse-LaViDa makes diffusion-style AI models much faster by skipping unhelpful masked tokens during generation while keeping quality the same.

#Masked Discrete Diffusion #Sparse Parameterization #Register Tokens

Olmo 3

Beginner
Team Olmo et al. · Dec 15 · arXiv

Olmo 3 is a family of fully-open AI language models (7B and 32B) where every step—from raw data to training code and checkpoints—is released.

#fully-open language models #model flow #long-context reasoning