How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (943)

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Intermediate
Jia-Nan Li, Jian Guan et al. · Dec 15 · arXiv

ReFusion is a new way for AI to write text faster by planning in chunks (called slots) and then filling each chunk carefully.

#ReFusion #masked diffusion model #parallel decoding

Memory in the Age of AI Agents

Intermediate
Yuyang Hu, Shichun Liu et al. · Dec 15 · arXiv

This survey explains how AI agents remember things and organizes the whole topic into three clear parts: forms, functions, and dynamics.

#Agent memory #LLM memory #Retrieval-augmented generation

Janus: Disaggregating Attention and Experts for Scalable MoE Inference

Intermediate
Zhexiang Zhang, Ye Wang et al. · Dec 15 · arXiv

Janus splits a Mixture-of-Experts (MoE) model into two parts, attention and experts, so each can use just the right number of GPUs.

#Mixture-of-Experts inference #disaggregated serving #activation load balancing

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Intermediate
Team Seedance, Heyi Chen et al. · Dec 15 · arXiv

Seedance 1.5 pro is a single model that makes video and sound together at the same time, so lips, music, and actions match naturally.

#audio-visual generation #diffusion transformer #cross-modal synchronization

Scaling Laws for Code: Every Programming Language Matters

Intermediate
Jian Yang, Shawn Guo et al. · Dec 15 · arXiv

Different programming languages scale differently when training code AI models, so treating them all the same wastes compute and lowers performance.

#multilingual code pre-training #scaling laws #language-specific scaling

RecTok: Reconstruction Distillation along Rectified Flow

Intermediate
Qingyu Shi, Size Wu et al. · Dec 15 · arXiv

RecTok is a new visual tokenizer that makes the whole forward flow of a diffusion model (the path it is trained along) carry image meaning, not just the starting latent features.

#Rectified Flow #Flow Matching #Visual Tokenizer

Differentiable Evolutionary Reinforcement Learning

Intermediate
Sitao Cheng, Tianle Li et al. · Dec 15 · arXiv

This paper introduces DERL, a two-level learning system that automatically builds better reward functions for reinforcement learning agents.

#Differentiable Evolutionary Reinforcement Learning #Meta-Optimizer #Meta-Reward

FIN-bench-v2: A Unified and Robust Benchmark Suite for Evaluating Finnish Large Language Models

Intermediate
Joona Kytöniemi, Jousia Piha et al. · Dec 15 · arXiv

FIN-bench-v2 is a big, tidy set of Finnish tests that checks how good large language models are at many things like reading, logic, and world knowledge.

#Finnish language models #benchmark suite #HuggingFace Datasets

KlingAvatar 2.0 Technical Report

Intermediate
Kling Team, Jialu Chen et al. · Dec 15 · arXiv

KlingAvatar 2.0 is a system that makes long, sharp, lifelike talking-person videos that follow audio, images, and text instructions all at once.

#audio-driven avatar #video diffusion #diffusion transformer

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Intermediate
Zhihang Liu, Xiaoyi Bao et al. · Dec 15 · arXiv

ShowTable is a new way for AI to turn a data table into a beautiful, accurate infographic using a think–make–check–fix loop.

#creative table visualization #multimodal large language model #diffusion model

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

Intermediate
Jiaqi Wang, Weijia Wu et al. · Dec 15 · arXiv

This paper builds a new test called Video Reality Test to see if AI-made ASMR videos can fool both people and AI video watchers (VLMs).

#ASMR #audio-visual coupling #AI-generated video detection

Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection

Intermediate
Juil Koo, Daehyeon Choi et al. · Dec 15 · arXiv

This paper teaches robots to move their camera to a better spot before answering a question about what they see.

#Active Perception #Embodied AI #Vision-Language Models