Papers (3)

#Benchmarking LLMs

T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning

Qinsi Wang, Hancheng Ye et al. · Mar 4 · arXiv

This paper shows that having an AI model first draw a simple map of a text (nodes and links) before answering questions makes its answers more accurate and reliable.

#Structure of Thought  #Text-to-Structure  #Intermediate Representation


Benchmarking Large Language Models for Knowledge Graph Validation

Beginner

Farzad Shami, Stefano Marchesin et al. · Feb 11 · arXiv

Knowledge graphs are like giant fact maps, and keeping every fact correct is hard and important.

#Knowledge Graph Validation  #Fact Checking  #Large Language Models


EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering for Enhanced Alignment and Reasoning

Intermediate

Mingyang Wei, Dehai Min et al. · Jan 6 · arXiv

EpiQAL is a new benchmark that tests how well AI models answer population-level disease questions using real research papers.

#Epidemiological reasoning  #Question answering  #Benchmarking LLMs
