How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1262)

Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers

Intermediate
Xiaotong Ji, Rasul Tutunov et al. · Feb 20 · arXiv

Decoding (how a language model picks the next word) isn't a bag of tricks; it's a clean optimisation problem over probabilities.

#decoding as optimisation #probability simplex #softmax sampling

HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation

Intermediate
Lei Xin, Yuhao Zheng et al. · Feb 20 · arXiv

The paper proposes HyTRec, a recommender system that reads very long histories fast while still paying sharp attention to the latest clicks and purchases.

#Hybrid Attention #Linear Attention #Softmax Attention

VLANeXt: Recipes for Building Strong VLA Models

Intermediate
Xiao-Ming Wu, Bin Fan et al. · Feb 20 · arXiv

This paper studies Vision-Language-Action (VLA) robots under one fair setup to find which design choices truly matter.

#Vision-Language-Action #robot manipulation #flow matching

EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots

Intermediate
Boyuan An, Zhexiong Wang et al. · Feb 20 · arXiv

EgoPush teaches a small mobile robot to push multiple objects into patterns (like a cross or a line) using only what it sees from its own camera, without any global map.

#egocentric perception #non-prehensile manipulation #object-centric representation

VidEoMT: Your ViT is Secretly Also a Video Segmentation Model

Intermediate
Narges Norouzi, Idil Esen Zulfikar et al. · Feb 19 · arXiv

VidEoMT shows that a single, well-trained Vision Transformer (ViT) can segment and track objects in videos without extra tracking gadgets.

#Video Segmentation #Vision Transformer #Encoder-only

MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

Intermediate
Hojung Jung, Rodrigo Hormazabal et al. · Feb 19 · arXiv

MolHIT is a new model that builds molecules as graphs, moving step by step from broad chemical groups to exact atoms.

#molecular graph generation #discrete diffusion #hierarchical diffusion

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

Intermediate
Lance Ying, Ryan Truong et al. · Feb 19 · arXiv

The paper argues that the fairest test of how generally smart an AI is measures how quickly and how well it learns many different human-made games, compared with a person given the same time and practice.

#general intelligence #evaluation benchmark #game-based testing

Computer-Using World Model

Intermediate
Yiming Guan, Rui Yu et al. · Feb 19 · arXiv

The paper builds a Computer-Using World Model (CUWM) that lets an AI "imagine" what a desktop app (like Word/Excel/PowerPoint) will look like after a click or keystroke, before doing it for real.

#world model #GUI agent #desktop automation

2Mamba2Furious: Linear in Complexity, Competitive in Accuracy

Intermediate
Gabriel Mongaras, Eric C. Larson · Feb 19 · arXiv

The paper studies Mamba-2 (a fast, linear-time attention method) and pares it down to the pieces that truly boost accuracy.

#linear attention #Mamba-2 #2Mamba

ArXiv-to-Model: A Practical Study of Scientific LM Training

Intermediate
Anuj Gupta · Feb 19 · arXiv

This paper shows, step by step, how to train a 1.36-billion-parameter science-focused language model directly from raw arXiv LaTeX files using only 2 A100 GPUs.

#scientific language model #arXiv LaTeX #tokenization

Unified Latents (UL): How to train your latents

Intermediate
Jonathan Heek, Emiel Hoogeboom et al. · Feb 19 · arXiv

Unified Latents (UL) is a way to learn the hidden code (latents) for images and videos by training three parts together: an encoder, a diffusion prior, and a diffusion decoder.

#Unified Latents #diffusion prior #diffusion decoder

FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment

Intermediate
Han Zhao, Jingbo Wang et al. · Feb 19 · arXiv

Robots learn better when they predict short, meaningful summaries of future images instead of drawing every pixel of the future scene.

#world modeling #vision-language-action (VLA) #diffusion policy
