Large models are often used to grade AI answers, but they are expensive, slow, and overly sensitive to how the grading prompt is worded.
Transformers are powerful but slow on long inputs because standard self-attention compares every token with every other token, so compute and memory grow quadratically with sequence length.
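To make the quadratic cost concrete, here is a minimal NumPy sketch of standard self-attention (an illustrative example, not code from the source): the score matrix has one entry per pair of tokens, so for a sequence of length n it holds n × n values.

```python
import numpy as np

def self_attention(Q, K, V):
    """Plain (unmasked) scaled dot-product self-attention."""
    d = Q.shape[-1]
    # The scores matrix is (n, n): every token attends to every other
    # token, which is why time and memory grow quadratically with n.
    scores = Q @ K.T / np.sqrt(d)
    # Softmax over each row (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # (n, d)

n, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = self_attention(Q, K, V)
print(out.shape)  # (512, 64)
```

Doubling the sequence length quadruples the size of the intermediate `scores` matrix, which is the bottleneck that sub-quadratic attention variants aim to remove.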