How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1055)


LMEB: Long-horizon Memory Embedding Benchmark

Intermediate
Xinping Zhao, Xinshuo Hu et al. | Mar 13 | arXiv

LMEB is a new test that checks whether text-embedding models can remember and find information across long stretches of time, not just in short, neat passages.

#LMEB, #long-horizon memory retrieval, #memory embeddings

RoboPocket: Improve Robot Policies Instantly with Your Phone

Intermediate
Junjie Fang, Wendi Chen et al. | Mar 5 | arXiv

RoboPocket turns an ordinary smartphone into a pocket robot coach that helps you fix robot mistakes instantly, without touching a robot.

#RoboPocket, #Imitation Learning, #Interactive Imitation Learning

RealWonder: Real-Time Physical Action-Conditioned Video Generation

Intermediate
Wei Liu, Ziyu Chen et al. | Mar 5 | arXiv

RealWonder is a system that turns a single picture and 3D physical actions (like pushes, wind, and robot gripper moves) into a realistic video in real time.

#action-conditioned video generation, #physics simulation, #optical flow

UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data

Intermediate
Sizhe Yang, Yiman Xie et al. | Mar 5 | arXiv

Robots need many different ways to grab things, just like people use pinch, tripod, whole-hand, or two hands together.

#bimanual dexterous grasping, #universal grasp policy, #synthetic data generation

Locality-Attending Vision Transformer

Intermediate
Sina Hajimiri, Farzad Beizaee et al. | Mar 5 | arXiv

Vision Transformers (ViTs) are great at recognizing what is in a whole image but often blur the tiny details needed to label each pixel (segmentation).

#Vision Transformer, #self-attention, #segmentation

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

Intermediate
Lulu Hu, Wenhu Xiao et al. | Mar 5 | arXiv

Multimodal AI models handle text, images, and audio, but their signals are very different in size, which breaks standard low‑bit compression methods.

#post‑training quantization, #multimodal LLM, #channel‑wise smoothing

Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling

Intermediate
Yong Liu, Xingjian Su et al. | Mar 5 | arXiv

Timer-S1 is a huge time-series model (8.3B parameters, only 0.75B used per step) that predicts the future by thinking step-by-step inside one forward pass.

#time series forecasting, #foundation models, #Mixture-of-Experts

DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval

Intermediate
Maojun Sun, Yue Wu et al. | Mar 5 | arXiv

DARE is a new way for AI assistants to find the right R functions by also looking at what the data looks like, not just the words in the question.

#distribution-aware retrieval, #RPKB, #RCodingAgent

Helios: Real Real-Time Long Video Generation Model

Intermediate
Shenghai Yuan, Yuanyang Yin et al. | Mar 4 | arXiv

Helios is a 14-billion-parameter video model that can make minute-long videos in real time at about 19.5 frames per second on a single NVIDIA H100 GPU.

#real-time video generation, #long video diffusion, #autoregressive diffusion

$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners

Intermediate
Harman Singh, Xiuyu Li et al. | Mar 4 | arXiv

The paper shows that when a model compares two of its own answers head-to-head, it picks the right one more often than when it judges each answer alone.

#pairwise self-verification, #test-time scaling, #parallel reasoning

CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video

Intermediate
Lingen Li, Guangzhi Wang et al. | Mar 4 | arXiv

CubeComposer is a new AI method that turns a normal forward-facing video into a full 360° VR video at true 4K quality without using super-resolution upscaling.

#360° video generation, #cubemap, #spatio-temporal autoregression

RIVER: A Real-Time Interaction Benchmark for Video LLMs

Intermediate
Yansong Shi, Qingsong Zhao et al. | Mar 4 | arXiv

RIVER Bench is a new test that checks how well AI can watch a video stream and talk with you in real time.

#RIVER Bench, #online video understanding, #multimodal large language models