How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (5)


N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Intermediate
Yuxin Wang, Lei Ke et al. · Dec 18 · arXiv

This paper teaches a vision-language model to first find objects in real 3D space (not just 2D pictures) and then reason about where things are.

#3D grounding #vision-language models #spatial reasoning

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

Intermediate
Shengming Yin, Zekai Zhang et al. · Dec 17 · arXiv

The paper turns one flat picture into a neat stack of see-through layers, so you can edit one thing without messing up the rest.

#image decomposition #RGBA layers #alpha blending
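The "stack of see-through layers" idea rests on standard RGBA alpha blending: each layer keeps its own colors plus a transparency channel, and the flat picture is recovered by compositing layers top-down. A minimal sketch of the classic "over" compositing rule (the function name and pixel values are illustrative, not from the paper):

```python
# Sketch of Porter-Duff "over" compositing for one RGBA pixel.
# Channels are floats in [0, 1]; a = alpha (opacity).

def composite_over(top, bottom):
    """Blend one RGBA pixel over another using the 'over' rule."""
    r1, g1, b1, a1 = top
    r2, g2, b2, a2 = bottom
    a_out = a1 + a2 * (1.0 - a1)          # combined opacity
    if a_out == 0.0:
        return (0.0, 0.0, 0.0, 0.0)       # fully transparent result

    def blend(c1, c2):
        # top contributes in proportion to a1; bottom shows through (1 - a1)
        return (c1 * a1 + c2 * a2 * (1.0 - a1)) / a_out

    return (blend(r1, r2), blend(g1, g2), blend(b1, b2), a_out)

# Half-transparent blue layer over an opaque red background:
pixel = composite_over((0.0, 0.0, 1.0, 0.5), (1.0, 0.0, 0.0, 1.0))
# → (0.5, 0.0, 0.5, 1.0)
```

Because each layer is stored separately, editing the blue layer and re-compositing leaves the red background pixel untouched, which is exactly the editability the paper is after.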

RePo: Language Models with Context Re-Positioning

Intermediate
Huayang Li, Tianyu Zhao et al. · Dec 16 · arXiv

Large language models usually line words up in fixed order slots, which can waste mental energy and make it harder to find the important parts of a long or noisy text; RePo lets the model re-position its context instead, so the important parts are easier to find.

#context re-positioning #positional encoding #self-attention

LitePT: Lighter Yet Stronger Point Transformer

Intermediate
Yuanwen Yue, Damien Robert et al. · Dec 15 · arXiv

LitePT is a new AI backbone for 3D point clouds that uses convolutions in early layers and attention in later layers to be both fast and accurate.

#LitePT #Point Transformer #3D point cloud

Group Representational Position Encoding

Intermediate
Yifan Zhang, Zixiang Chen et al. · Dec 8 · arXiv

GRAPE is a new way to tell Transformers where each word is in a sentence by using neat math moves called group actions.

#GRAPE #positional encoding #group actions
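The "group actions" idea can be illustrated with the best-known special case: rotary-style position encoding, where position t acts on a pair of features by a 2D rotation. Rotations form a group, so acting with position p and then q is the same as acting with p + q, which is why attention scores end up depending only on relative position. A hedged sketch (this mirrors rotary embeddings; GRAPE's actual group construction may differ, and all names here are illustrative):

```python
import math

# Position encoding as a group action: each position corresponds to a
# rotation (a group element), and positions compose by adding angles.

def rotate(vec, angle):
    """Act on a 2D feature pair by rotation through `angle` radians."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def encode_position(vec, pos, theta=0.1):
    """Let position `pos` act on the features via rotation by pos * theta."""
    return rotate(vec, pos * theta)

# Group property: acting with position 3 and then 5 equals acting with 8.
v = (1.0, 0.0)
composed = encode_position(encode_position(v, 3), 5)
direct = encode_position(v, 8)
assert all(math.isclose(a, b) for a, b in zip(composed, direct))
```

The payoff of the group structure is that the inner product of two encoded vectors depends only on the difference of their positions, not on where the sentence starts.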