How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1262)

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

Intermediate
Xiangyi Li, Wenbo Chen et al. | Feb 13 | arXiv

SkillsBench is a big test playground that measures whether giving AI agents step-by-step 'Skills' actually helps them finish real tasks.

#Agent Skills #LLM agents #Benchmarking

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Intermediate
Leon Liangyu Chen, Haoyu Ma et al. | Feb 12 | arXiv

UniT teaches one multimodal model to think in steps with pictures and words, so it can check its own work and fix mistakes as it goes.

#Unified multimodal model #Chain-of-thought #Test-time scaling

T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization

Intermediate
Tunyu Zhang, Xinxi Zhang et al. | Feb 12 | arXiv

This paper shows how to make diffusion language models write high‑quality text in just a few steps, which makes them much faster.

#diffusion language models #few-step decoding #trajectory self-distillation

DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

Beginner
Dianyi Wang, Ruihang Li et al. | Feb 12 | arXiv

DeepGen 1.0 is a small 5B-parameter model that can both make new images and smartly edit existing ones from text instructions.

#Unified multimodal model #Stacked Channel Bridging #Think tokens

ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images

Intermediate
Mathieu Sibue, Andres Muñoz Garza et al. | Feb 12 | arXiv

ExStrucTiny is a new benchmark that checks whether AI can pull many connected facts from all kinds of documents and output them as well-structured JSON, even when the question style and schema change.

#structured information extraction #document understanding #vision-language models

Query-focused and Memory-aware Reranker for Long Context Processing

Intermediate
Yuqing Li, Jiangnan Li et al. | Feb 12 | arXiv

QRRanker is a lightweight way to sort many long text chunks by how helpful they are to a question, using the model’s own attention to score relevance.

#query-focused retrieval heads #attention-based reranking #listwise ranking

Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision

Intermediate
Xiaohan He, Shiyang Feng et al. | Feb 12 | arXiv

Sci-CoE is a two-stage training method that helps one language model learn to both solve science problems and check those solutions with very little labeled data.

#scientific reasoning #co-evolution #solver-verifier

DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation

Intermediate
Xu Guo, Fulong Ye et al. | Feb 12 | arXiv

DreamID-Omni is one model that can create, edit, and animate human-centered videos with matching voices, all in sync.

#audio-video generation #diffusion transformer #identity preservation

dVoting: Fast Voting for dLLMs

Intermediate
Sicheng Feng, Zigeng Chen et al. | Feb 12 | arXiv

Diffusion Large Language Models (dLLMs) can write many parts of an answer at once, not just left to right like usual chatbots.

#diffusion large language models #remasking #test-time scaling

P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling

Intermediate
Pinyi Zhang, Ting-En Lin et al. | Feb 12 | arXiv

This paper introduces P-GenRM, a personalized generative reward model that judges AI answers using a custom scorecard built just for each user and situation.

#personalized reward modeling #generative reward model #evaluation chain

The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context

Intermediate
Xiaoyuan Liu, Tian Liang et al. | Feb 12 | arXiv

This paper gives language models a 'wand' to manage their own memory, instead of relying on humans to stuff the prompt for them.

#Stateful language models #Pensieve paradigm #Context pruning

GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

Intermediate
GigaBrain Team, Boyuan Wang et al. | Feb 12 | arXiv

GigaBrain-0.5M* is a robot brain that sees, reads, and acts, and it gets smarter by imagining the future before moving.

#Vision-Language-Action #World Model #Reinforcement Learning
