How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (784)


A2Eval: Agentic and Automated Evaluation for Embodied Brain

Intermediate
Shuai Zhang, Jiayu Hu et al. | Feb 2 | arXiv

A2Eval is a two-agent system that automatically builds and runs fair tests for robot-style vision-language models, cutting wasted work while keeping results trustworthy.

#Embodied AI #Vision-Language Models #Agentic Evaluation

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Intermediate
Bohan Zeng, Kaixin Zhu et al. | Feb 2 | arXiv

This paper argues that world-model research should not just sprinkle world knowledge into single tasks, but build a unified system that can see, think, remember, act, and generate across many situations.

#world models #unified framework #multimodal reasoning

PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

Intermediate
Minh-Quan Le, Gaurav Mittal et al. | Feb 2 | arXiv

This paper shows how to make text-to-video models create clearer, steadier, and more on-topic videos without using any human-labeled ratings.

#text-to-video #optimal transport #annotation-free

Generative Visual Code Mobile World Models

Intermediate
Woosung Koh, Sungjun Han et al. | Feb 2 | arXiv

This paper shows a new way to predict what a phone screen will look like after you tap or scroll: generate web code (like HTML/CSS/SVG) and then render it to pixels.

#mobile GUI #world model #vision-language model

FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents

Intermediate
Chiwei Zhu, Benfeng Xu et al. | Feb 2 | arXiv

FS-Researcher is a two-agent system that lets AI do very long research by saving everything in a computer folder so it never runs out of memory.

#FS-Researcher #file-system agents #external memory

Toward Cognitive Supersensing in Multimodal Large Language Model

Intermediate
Boyi Li, Yifan Shen et al. | Feb 2 | arXiv

This paper teaches multimodal AI models not just to read pictures but also to imagine and think with pictures inside their heads.

#multimodal large language model #visual cognition #latent visual imagery

Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training

Intermediate
Ran Xu, Tianci Liu et al. | Feb 2 | arXiv

The paper introduces Rubric-ARM, a system that teaches two AI helpers (a rubric maker and a judge) to learn together using reinforcement learning so they can better decide which answers people would prefer.

#Rubric-based reward modeling #LLM-as-a-judge #Alternating reinforcement learning

Ebisu: Benchmarking Large Language Models in Japanese Finance

Intermediate
Xueqing Peng, Ruoyu Xiang et al. | Feb 1 | arXiv

EBISU is a new test that checks how well AI models understand Japanese finance, a language and domain where hints and special terms are common.

#EBISU #Japanese finance NLP #implicit commitment recognition

Rethinking Selective Knowledge Distillation

Intermediate
Almog Tavor, Itay Ebenspanger et al. | Feb 1 | arXiv

The paper studies how to teach a smaller language model using a bigger one by focusing only on the most useful bits instead of everything.

#knowledge distillation #selective distillation #student entropy

PromptRL: Prompt Matters in RL for Flow-Based Image Generation

Intermediate
Fu-Yun Wang, Han Zhang et al. | Feb 1 | arXiv

PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.

#PromptRL #flow matching #reinforcement learning

Balancing Understanding and Generation in Discrete Diffusion Models

Intermediate
Yue Liu, Yuzhong Zhao et al. | Feb 1 | arXiv

This paper introduces XDLM, a single model that blends two popular diffusion styles (masked and uniform) so it both understands and generates text and images well.

#XDLM #discrete diffusion #stationary noise kernel

Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning

Intermediate
Yu Xu, Yuxin Zhang et al. | Feb 1 | arXiv

This paper teaches AI to copy the hidden idea inside a picture (a visual metaphor) and reuse that idea on a brand‑new subject.

#visual metaphor #metaphor transfer #schema grammar