๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐ŸงฉProblems๐ŸŽฏPrompts๐Ÿง Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers17

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#LLM-as-judge

Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows

Intermediate
Haoyu Dong, Pengkun Zhang et al.Dec 15arXiv

FINCH is a new test that checks whether AI can handle real finance and accounting work using messy, real spreadsheets, emails, PDFs, charts, and more.

#FINCH benchmark#finance and accounting AI#spreadsheet agents

VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs

Intermediate
Avinash Amballa, Yashas Malur Saidutta et al.Dec 12arXiv

VOYAGER is a training-free way to make large language models (LLMs) create data that is truly different, not just slightly reworded.

#VOYAGER#determinantal point process#dataset diversity

Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems

Intermediate
Eddie Landesberg, Manjari NarayanDec 11arXiv

LLM judges are cheap but biased; without calibration they can completely flip which model looks best.

#LLM-as-judge#calibration#isotonic regression

MOA: Multi-Objective Alignment for Role-Playing Agents

Intermediate
Chonghua Liao, Ke Wang et al.Dec 10arXiv

Role-playing agents need to juggle several goals at once, like staying in character, following instructions, and using the right tone.

#multi-objective alignment#role-playing agents#reinforcement learning

EtCon: Edit-then-Consolidate for Reliable Knowledge Editing

Intermediate
Ruilin Li, Yibin Wang et al.Dec 4arXiv

Large language models forget or misuse new facts if you only poke their weights once; EtCon fixes this with a two-step plan.

#knowledge editing#EtCon#TPSFT
12