How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (196)


Image Generation with a Sphere Encoder

Beginner
Kaiyu Yue, Menglin Jia et al. · Feb 16 · arXiv

The Sphere Encoder is a new way to make images fast by teaching an autoencoder to place all images evenly on a big imaginary sphere and then decode random spots on that sphere back into pictures.

#Sphere Encoder #Spherical Latent Space #RMS Normalization
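The summary describes the core trick concretely: the autoencoder's latents are spread over a sphere, and any random spot on that sphere decodes to an image. A minimal NumPy sketch of just the spherical-latent step, assuming the latent is a plain vector and standing in for the learned encoder/decoder (this is illustrative, not the authors' code):

```python
import numpy as np

def rms_normalize(z, eps=1e-6):
    # Scale a latent so its root-mean-square is 1, which places it
    # on a hypersphere of radius sqrt(d) (the RMS-normalization idea).
    rms = np.sqrt(np.mean(z ** 2) + eps)
    return z / rms

rng = np.random.default_rng(0)
d = 8

# Encoder output for some image (stand-in for a learned latent).
z = rng.normal(size=d)
z_sphere = rms_normalize(z)

# Every normalized latent lands on the same sphere of radius sqrt(d)...
print(np.linalg.norm(z_sphere))   # ≈ sqrt(8) ≈ 2.83

# ...so a random direction, normalized the same way, is also a valid
# "spot" on the sphere that a decoder could turn back into a picture.
z_random = rms_normalize(rng.normal(size=d))
print(np.linalg.norm(z_random))   # ≈ 2.83
```

Because all latents share one radius, sampling for generation reduces to picking a random direction, no separate prior model needed.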

A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

Beginner
Tianyu Chen, Dongrui Liu et al. · Feb 16 · arXiv

This paper checks how safe a real, tool-using AI agent called Clawdbot (OpenClaw) is by watching every step it takes during tasks, not just the final answer.

#trajectory-centric safety #tool-using AI agents #prompt injection

LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces

Beginner
Yukang Feng, Jianwen Sun et al. · Feb 15 · arXiv

LongCLI-Bench is a new test that checks how well AI coding agents can handle long, realistic software projects in the command line, not just tiny coding puzzles.

#LongCLI-Bench #agentic programming #command-line interface agents

TactAlign: Human-to-Robot Policy Transfer via Tactile Alignment

Beginner
Youngsun Wi, Jessica Yin et al. · Feb 14 · arXiv

Robots learn faster and more flexibly when they can use human touch data, but humans and robots feel touch with very different sensors; TactAlign bridges that gap by aligning the two kinds of tactile signals so human demonstrations can help train robot policies.

#tactile alignment #human-to-robot transfer #rectified flow

RynnBrain: Open Embodied Foundation Models

Beginner
Ronghao Dang, Jiayan Guo et al. · Feb 13 · arXiv

RynnBrain is an open-source 'robot brain' that helps machines see, think, and plan in the real world across space and time.

#embodied intelligence #egocentric vision #spatiotemporal localization

DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

Beginner
Dianyi Wang, Ruihang Li et al. · Feb 12 · arXiv

DeepGen 1.0 is a small 5B-parameter model that can both make new images and smartly edit existing ones from text instructions.

#Unified multimodal model #Stacked Channel Bridging #Think tokens

Adapting Vision-Language Models for E-commerce Understanding at Scale

Beginner
Matteo Nulli, Vladimir Orshulevich et al. · Feb 12 · arXiv

This paper shows a simple, repeatable way to teach general Vision-Language Models (VLMs) to understand e-commerce items much better without forgetting their general skills.

#Vision-Language Models #E-commerce adaptation #Attribute extraction

Thinking with Drafting: Optical Decompression via Logical Reconstruction

Beginner
Jingxuan Wei, Honghao He et al. · Feb 12 · arXiv

The paper fixes a common problem in AI: models can read pictures and text well, but they often mess up the logic behind them.

#Thinking with Drafting #optical decompression #visual algebra

ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces

Beginner
Xin Xu, Tong Yu et al. · Feb 12 · arXiv

ThinkRouter teaches a model to switch how it “thinks” based on how sure it feels, so it stays accurate without talking forever.

#latent reasoning #discrete token space #confidence-aware routing
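The summary's routing rule, switch how the model "thinks" based on how sure it feels, can be sketched as a toy confidence check. The threshold value and the probabilities below are illustrative assumptions, not figures from the paper:

```python
import numpy as np

def route_step(probs, threshold=0.9):
    # Confidence-aware routing (toy version): when the top-token
    # probability is high, keep reasoning in the cheap latent space;
    # when the model is unsure, fall back to explicit discrete tokens.
    confidence = float(np.max(probs))
    return "latent" if confidence >= threshold else "discrete"

print(route_step(np.array([0.95, 0.03, 0.02])))  # latent
print(route_step(np.array([0.40, 0.35, 0.25])))  # discrete
```

The appeal of this design is that verbose token-by-token reasoning is only paid for on the steps where the model actually needs it.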

Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm

Beginner
Jinrui Zhang, Chaodong Xiao et al. · Feb 12 · arXiv

Training big language models usually needs super-expensive, tightly connected GPU clusters, which most people do not have; this paper proposes a memory-efficient, decentralized way to pretrain them across ordinary, loosely connected GPUs.

#decentralized LLM pretraining #mixture-of-experts (MoE) #sparse expert synchronization
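The "sparse expert synchronization" tag names the key communication saving: with a mixture-of-experts model, a node only needs to exchange updates for the experts its tokens actually activated. A toy sketch of that idea (router, sizes, and top-k choice are all illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k = 8, 2

# Toy router: each of 5 tokens scores all experts and keeps its top-2.
logits = rng.normal(size=(5, num_experts))
picked = np.argsort(logits, axis=1)[:, -top_k:]

# Sparse expert synchronization (the idea, not the paper's code):
# only the experts activated on this node need their updates synced,
# instead of communicating the full model every step.
active = np.unique(picked)
print(f"sync {active.size}/{num_experts} experts: {active.tolist()}")
```

Since each token touches only `top_k` experts, the set of experts to synchronize is typically a small fraction of the model, which is what makes loosely connected GPUs viable.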

Multimodal Fact-Level Attribution for Verifiable Reasoning

Beginner
David Wan, Han Wang et al. · Feb 12 · arXiv

This paper builds a new test, called MURGAT, to check whether AI models can back up each small fact they say with the right part of a video, audio, or figure.

#multimodal grounding #fact-level attribution #atomic fact decomposition

Causal-JEPA: Learning World Models through Object-Level Latent Interventions

Beginner
Heejeong Nam, Quentin Le Lidec et al. · Feb 11 · arXiv

This paper introduces Causal-JEPA (C-JEPA), a world model that learns by hiding entire objects in its memory and forcing itself to predict them from other objects.

#C-JEPA #object-centric world model #object-level masking
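The masking scheme the summary describes, hide a whole object and predict it from the others, can be sketched with toy object slots. The mean over context slots here is a stand-in for the learned predictor network, and the slot sizes are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scene: 4 objects, each a 3-d latent "slot" standing in for
# learned object-centric features.
objects = rng.normal(size=(4, 3))

# Object-level masking: hide one entire object...
masked_idx = 2
context = np.delete(objects, masked_idx, axis=0)

# ...and predict its latent from the remaining objects. A mean over
# the context slots substitutes for the learned predictor.
prediction = context.mean(axis=0)
error = np.linalg.norm(prediction - objects[masked_idx])
print(error)
```

Masking at the object level, rather than random pixels or patches, is what pushes the model toward learning how objects causally relate to one another.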