MMLU-Pro
MMLU-Pro is a 10-choice, graduate-level knowledge benchmark spanning 14 disciplines, including STEM, law, health, and history. It is one of the most widely recognized AI benchmark names, though frontier models now cluster between 83% and 90%, which reduces its ability to distinguish top models.
How MMLU-Pro works
MMLU-Pro presents graduate-level multiple-choice questions across 14 disciplines. The Pro variant uses 10 answer choices instead of the 4 in the original MMLU, which lowers the random-guessing baseline from 25% to 10% and requires stronger reasoning to eliminate incorrect options. Questions are sourced from professional exams, graduate coursework, and academic assessments.
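The effect of the larger answer set on guessing can be sketched in a few lines of Python. This is an illustrative calculation only: the function names are hypothetical, and the chance-corrected rescaling is a common convention for multiple-choice tests, not part of how MMLU-Pro scores are officially reported.

```python
def chance_accuracy(num_choices: int) -> float:
    """Expected accuracy from uniform random guessing over num_choices options."""
    if num_choices < 2:
        raise ValueError("a multiple-choice question needs at least 2 options")
    return 1.0 / num_choices


def chance_corrected(accuracy: float, num_choices: int) -> float:
    """Rescale accuracy so random guessing maps to 0.0 and a perfect score to 1.0.

    Illustrative convention only; benchmark leaderboards report raw accuracy.
    """
    chance = chance_accuracy(num_choices)
    return (accuracy - chance) / (1.0 - chance)


if __name__ == "__main__":
    for choices in (4, 10):
        baseline = chance_accuracy(choices)
        print(f"{choices} choices: guessing baseline = {baseline:.0%}")
```

With 4 options a model that guesses blindly still lands at 25%, while with 10 options the same strategy yields 10%, so a given raw score reflects more real knowledge on the 10-choice format.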
What is a good MMLU-Pro score?
Frontier models score between 83% and 90% on MMLU-Pro. Scores above 85% are considered strong, while a score below 70% indicates a mid-tier or smaller model. Because the top models sit within a narrow 7-point spread, small score differences (1-2 points) are often not meaningful and may reflect evaluation methodology rather than genuine capability differences.
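One way to see why small gaps deserve caution is to estimate the sampling noise in an accuracy score. The sketch below treats each question as an independent pass/fail trial (a binomial approximation) and assumes a test set of roughly 12,000 questions; both the question count and the function names are assumptions made for illustration, and the calculation ignores methodology effects such as prompt format or answer extraction, which add further variation on top.

```python
import math

# Assumed approximate size of the test set; adjust to the subset actually evaluated.
ASSUMED_NUM_QUESTIONS = 12_000


def accuracy_standard_error(accuracy: float, num_questions: int) -> float:
    """Binomial standard error of an accuracy measured on num_questions items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / num_questions)


def difference_is_distinguishable(
    acc_a: float, acc_b: float, num_questions: int, z: float = 1.96
) -> bool:
    """Return True if the gap exceeds z standard errors of the difference.

    Treats the two scores as independent samples, which is a simplification:
    both models usually answer the same questions.
    """
    se_a = accuracy_standard_error(acc_a, num_questions)
    se_b = accuracy_standard_error(acc_b, num_questions)
    se_diff = math.sqrt(se_a**2 + se_b**2)
    return abs(acc_a - acc_b) > z * se_diff


if __name__ == "__main__":
    se = accuracy_standard_error(0.85, ASSUMED_NUM_QUESTIONS)
    print(f"standard error at 85%: {se * 100:.2f} points")
    print(difference_is_distinguishable(0.85, 0.855, ASSUMED_NUM_QUESTIONS))
    print(difference_is_distinguishable(0.85, 0.90, ASSUMED_NUM_QUESTIONS))
```

Under these assumptions a half-point gap falls inside the noise, while a five-point gap does not. Gaps of about one point sit near the threshold from sampling noise alone, before any differences in evaluation setup are considered.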
Why MMLU-Pro matters
MMLU-Pro remains one of the most widely reported benchmarks in AI model announcements. Despite near-saturation at the frontier, it is useful as a baseline measure of general knowledge and for comparing across model tiers (small vs large, general vs reasoning). Most major model launches include an MMLU-Pro score, making it a common denominator for cross-provider comparison.
How does MMLU-Pro compare to other benchmarks?
MMLU-Pro tests broad knowledge recall across 14 disciplines, while GPQA Diamond tests deep reasoning specifically in science. MMLU-Pro is easier and more saturated: top models cluster within 7 points (83-90%) on MMLU-Pro, while GPQA Diamond spreads them across 15 points. For frontier model comparison, GPQA Diamond is more informative. MMLU-Pro remains useful as a baseline and for comparing smaller models.