ViDoRe V3 is a large, carefully built benchmark that tests how well AI systems find and use information from both text and visual elements (like tables and charts) in real-world documents.
MDAgent2 is an assistant built on large language models (LLMs) that can both answer questions about molecular dynamics and write runnable LAMMPS simulation code.
ModelTables is a giant, organized collection of tables that describe AI models, gathered from Hugging Face model cards, GitHub READMEs, and research papers.
HERBench is a new benchmark that tests whether video AI models can combine several clues spread across time, rather than guessing from a single frame or relying on language priors.
This paper argues that the fastest and safest path to super-smart AI is for humans and AIs to improve together, not for AI to improve alone.