This paper builds a safe science “playground” called DeR that fairly tests how AI finds facts (retrieval) and how it thinks with those facts (reasoning), without mixing the two skills up.
This paper builds a big, fair test called Hearing to Translate, which checks how well different speech translation systems work in the real world.