How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (200)

DODO: Discrete OCR Diffusion Models

Beginner
Sean Man, Roy Ganz et al. · Feb 18 · arXiv

OCR is like reading a page exactly as it is, and that strictness makes it perfect for fast, parallel generation.

#OCR #vision-language models #discrete diffusion

Learning Personalized Agents from Human Feedback

Beginner
Kaiqu Liang, Julia Kruk et al. · Feb 18 · arXiv

AI helpers often don't know new users' tastes and can't keep up when those tastes change.

#personalization #human feedback #pre-action clarification

"What Are You Doing?": Effects of Intermediate Feedback from Agentic LLM In-Car Assistants During Multi-Step Processing

Beginner
Johannes Kirmayr, Raphael Wennmacher et al. · Feb 17 · arXiv

The study tested how an in-car AI helper should talk while it works on long, multi-step tasks.

#agentic AI #LLM assistants #intermediate feedback

Image Generation with a Sphere Encoder

Beginner
Kaiyu Yue, Menglin Jia et al. · Feb 16 · arXiv

The Sphere Encoder is a new way to make images fast: it teaches an autoencoder to place all images evenly on a big imaginary sphere, then decodes random spots on that sphere back into pictures.

#Sphere Encoder #Spherical Latent Space #RMS Normalization

A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

Beginner
Tianyu Chen, Dongrui Liu et al. · Feb 16 · arXiv

This paper checks how safe a real, tool-using AI agent called Clawdbot (OpenClaw) is by watching every step it takes during tasks, not just the final answer.

#trajectory-centric safety #tool-using AI agents #prompt injection

LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces

Beginner
Yukang Feng, Jianwen Sun et al. · Feb 15 · arXiv

LongCLI-Bench is a new test that checks how well AI coding agents can handle long, realistic software projects in the command line, not just tiny coding puzzles.

#LongCLI-Bench #agentic programming #command-line interface agents

TactAlign: Human-to-Robot Policy Transfer via Tactile Alignment

Beginner
Youngsun Wi, Jessica Yin et al. · Feb 14 · arXiv

Robots learn faster and more flexibly when they can use human touch data, but humans and robots feel touch with very different sensors.

#tactile alignment #human-to-robot transfer #rectified flow

RynnBrain: Open Embodied Foundation Models

Beginner
Ronghao Dang, Jiayan Guo et al. · Feb 13 · arXiv

RynnBrain is an open-source 'robot brain' that helps machines see, think, and plan in the real world across space and time.

#embodied intelligence #egocentric vision #spatiotemporal localization

DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

Beginner
Dianyi Wang, Ruihang Li et al. · Feb 12 · arXiv

DeepGen 1.0 is a small 5B-parameter model that can both generate new images and edit existing ones from text instructions.

#Unified multimodal model #Stacked Channel Bridging #Think tokens

Adapting Vision-Language Models for E-commerce Understanding at Scale

Beginner
Matteo Nulli, Vladimir Orshulevich et al. · Feb 12 · arXiv

This paper shows a simple, repeatable way to teach general Vision-Language Models (VLMs) to understand e-commerce items much better without forgetting their general skills.

#Vision-Language Models #E-commerce adaptation #Attribute extraction

Thinking with Drafting: Optical Decompression via Logical Reconstruction

Beginner
Jingxuan Wei, Honghao He et al. · Feb 12 · arXiv

The paper tackles a common problem in AI: models can read pictures and text well, but they often get the logic behind them wrong.

#Thinking with Drafting #optical decompression #visual algebra

ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces

Beginner
Xin Xu, Tong Yu et al. · Feb 12 · arXiv

ThinkRouter teaches a model to switch how it "thinks" based on how sure it feels, so it stays accurate without talking forever.

#latent reasoning #discrete token space #confidence-aware routing
