🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers19

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#LLM-as-judge

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

Intermediate
Mohammed Irfan Kurpath, Jaseel Muhammad Kaithakkodan et al.Dec 18arXiv

This paper builds a new test, LongShOTBench, to check if AI can truly understand long videos by using sight, speech, and sounds together.

#long-form video understanding#multimodal reasoning#audio-visual-speech alignment

OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value

Intermediate
Mengzhang Cai, Xin Gao et al.Dec 16arXiv

OpenDataArena (ODA) is a fair, open platform that measures how valuable different post‑training datasets are for large language models by holding everything else constant.

#OpenDataArena#post-training datasets#data-centric AI

Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows

Intermediate
Haoyu Dong, Pengkun Zhang et al.Dec 15arXiv

FINCH is a new test that checks whether AI can handle real finance and accounting work using messy, real spreadsheets, emails, PDFs, charts, and more.

#FINCH benchmark#finance and accounting AI#spreadsheet agents

VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs

Intermediate
Avinash Amballa, Yashas Malur Saidutta et al.Dec 12arXiv

VOYAGER is a training-free way to make large language models (LLMs) create data that is truly different, not just slightly reworded.

#VOYAGER#determinantal point process#dataset diversity

Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems

Intermediate
Eddie Landesberg, Manjari NarayanDec 11arXiv

LLM judges are cheap but biased; without calibration they can completely flip which model looks best.

#LLM-as-judge#calibration#isotonic regression

MOA: Multi-Objective Alignment for Role-Playing Agents

Intermediate
Chonghua Liao, Ke Wang et al.Dec 10arXiv

Role-playing agents need to juggle several goals at once, like staying in character, following instructions, and using the right tone.

#multi-objective alignment#role-playing agents#reinforcement learning

EtCon: Edit-then-Consolidate for Reliable Knowledge Editing

Intermediate
Ruilin Li, Yibin Wang et al.Dec 4arXiv

Large language models forget or misuse new facts if you only poke their weights once; EtCon fixes this with a two-step plan.

#knowledge editing#EtCon#TPSFT
12