Placing the reading passage (the context) before the question and answer options (CQO) makes language models substantially more accurate than placing it after them (QOC), by about 15 percentage points on average.
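To make the two orderings concrete, here is a minimal sketch of how a multiple-choice prompt could be assembled either way. The function name `build_prompt`, the section labels ("Passage:", "Question:", "Options:"), and the trailing "Answer:" cue are all illustrative assumptions, not the template used in the original evaluation.

```python
def build_prompt(context: str, question: str, options: list[str], order: str = "CQO") -> str:
    """Assemble a multiple-choice prompt with its parts in the requested order.

    `order` is a permutation of the letters C (context), Q (question), O (options).
    The labels and layout are hypothetical, chosen only for illustration.
    """
    if sorted(order) != ["C", "O", "Q"]:
        raise ValueError("order must be a permutation of 'C', 'Q', 'O'")

    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    option_block = "\n".join(f"{letters[i]}. {opt}" for i, opt in enumerate(options))

    parts = {
        "C": f"Passage:\n{context}",
        "Q": f"Question: {question}",
        "O": f"Options:\n{option_block}",
    }
    # Join the three parts in the requested order, then add the answer cue.
    return "\n\n".join(parts[key] for key in order) + "\n\nAnswer:"


passage = "The sky is blue."
cqo = build_prompt(passage, "What color is the sky?", ["Red", "Blue"], order="CQO")
qoc = build_prompt(passage, "What color is the sky?", ["Red", "Blue"], order="QOC")
```

The only difference between `cqo` and `qoc` is where the passage sits relative to the question and options, which is the variable the accuracy comparison isolates.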
Large language models (LLMs) handle many math problems well but often get simple counting wrong once the list grows long.