TSRBench is a large benchmark that tests whether AI models can understand and reason about time-series data, meaning data that changes over time, such as heartbeats, stock prices, and weather.
This paper offers the first broad map of how AI can fix real-world software problems, rather than just write short code snippets.