How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (943)


VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse

Intermediate
Ying Nie, Kai Han et al. · Dec 16 · arXiv

Large language models get smarter when they get bigger, but storing all those extra weights eats tons of memory.

#VersatileFFN #parameter efficiency #virtual experts

RecGPT-V2 Technical Report

Intermediate
Chao Yi, Dian Chen et al. · Dec 16 · arXiv

RecGPT‑V2 turns a recommender system into a smart team: a planner, several specialists, and a fair judge that all work together.

#Recommender systems #Large language models #Hierarchical multi‑agent system

A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning

Intermediate
Zixin Zhang, Kanghao Chen et al. · Dec 16 · arXiv

This paper builds A4-Agent, a smart three-part helper that figures out where to touch or use an object just from a picture and a written instruction, without any extra training.

#affordance prediction #zero-shot learning #vision-language models

RePo: Language Models with Context Re-Positioning

Intermediate
Huayang Li, Tianyu Zhao et al. · Dec 16 · arXiv

Large language models usually line words up in fixed order slots, which can waste mental energy and make it harder to find the important parts of a long or noisy text.

#context re-positioning #positional encoding #self-attention

Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure

Intermediate
Jooyeol Yun, Jaegul Choo · Dec 16 · arXiv

Vector Prism helps computers animate SVG images by first discovering which tiny shapes belong together as meaningful parts.

#SVG animation #semantic restructuring #vision–language models

SS4D: Native 4D Generative Model via Structured Spacetime Latents

Intermediate
Zhibing Li, Mengchen Zhang et al. · Dec 16 · arXiv

SS4D is a new AI model that turns a short single-camera video into a full 3D object that moves over time (that’s 4D), and it does this in about 2 minutes.

#4D generation #structured spacetime latents #temporal attention

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in

Intermediate
Xiaoqian Shen, Min-Hung Chen et al. · Dec 16 · arXiv

Zoom-Zero helps AI answer questions about videos by first finding the right moment and then zooming in to double-check tiny details.

#Grounded Video Question Answering #Temporal Grounding #Coarse-to-Fine

Understanding and Improving Hyperbolic Deep Reinforcement Learning

Intermediate
Timo Klein, Thomas Lang et al. · Dec 16 · arXiv

Reinforcement learning agents often see the world in straight, flat space (Euclidean), but many decision problems look more like branching trees that fit curved, hyperbolic space better.

#hyperbolic reinforcement learning #Hyperboloid #Poincaré Ball

SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations

Intermediate
Wentao Guo, Mayank Mishra et al. · Dec 16 · arXiv

SonicMoE makes Mixture-of-Experts (MoE) models train faster and use less memory by redesigning how data is moved and computed on GPUs.

#Mixture of Experts #Grouped GEMM #Token Rounding

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

Intermediate
Yonggan Fu, Lexington Whalen et al. · Dec 16 · arXiv

Autoregressive (AR) models write one word at a time, which is accurate but slow, especially when your computer or GPU can’t keep many tasks in memory at once.

#diffusion language models #autoregressive models #AR-to-dLM conversion

HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices

Intermediate
HyperAI Team, Yuchen Liu et al. · Dec 16 · arXiv

HyperVL is a small but smart model that understands images and text, designed to run fast on phones and tablets.

#HyperVL #on-device multimodal #edge AI

OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value

Intermediate
Mengzhang Cai, Xin Gao et al. · Dec 16 · arXiv

OpenDataArena (ODA) is a fair, open platform that measures how valuable different post‑training datasets are for large language models by holding everything else constant.

#OpenDataArena #post-training datasets #data-centric AI