
AIME 2025

AIME (American Invitational Mathematics Exam) is a prestigious high school mathematics competition that serves as a qualifier for the USA Mathematical Olympiad. AIME problems require creative problem-solving and mathematical insight beyond standard curriculum, making it a strong test of genuine reasoning ability.

Key facts

Category: Math
Scoring: Percentage correct (0-100%)
Frontier range: 7-100%
Status (2026): Core reasoning

How AIME 2025 works

AIME consists of 15 problems, each with an integer answer between 000 and 999. Problems require creative approaches and mathematical insight rather than routine calculation. Models are scored on the percentage of problems solved correctly. The 2025 exam is used as the standard reference, with some models also evaluated on AIME 2026.
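
To make the scoring rule concrete, here is a minimal sketch of percentage-correct scoring over AIME's 15 integer-answer problems. It is illustrative only: the function name, the placeholder answer key, and the failure cases are assumptions, not the tracker's actual evaluation harness.

    # Minimal sketch of AIME percentage-correct scoring (illustrative;
    # not the tracker's real evaluation code). Each answer is an integer
    # from 000 to 999, and a problem counts only on an exact match.
    def aime_score(model_answers: dict, answer_key: dict) -> float:
        """Return the percentage of problems whose final answer matches exactly."""
        correct = sum(
            1 for problem, truth in answer_key.items()
            if model_answers.get(problem) == truth
        )
        return 100.0 * correct / len(answer_key)

    # Hypothetical run: the model solves 13 of the 15 problems.
    answer_key = {i: (3 * i + 41) % 1000 for i in range(1, 16)}  # placeholder answers
    model_answers = dict(answer_key)
    model_answers[12] = 999   # wrong final answer
    del model_answers[15]     # no answer extracted from the model output
    print(f"{aime_score(model_answers, answer_key):.1f}%")  # -> 86.7%

Because answers are exact integers, scoring is unambiguous: there is no partial credit and no grader judgment involved.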

What is a good AIME 2025 score?

Reasoning models score 83-100% on AIME 2025; GPT-5.4 achieved a perfect 100%. General models typically score 7-35%. This is the widest gap of any benchmark between reasoning and general models, a spread of 65 or more points between typical scores. A score above 90% indicates exceptional mathematical reasoning capability.
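
To make those ranges concrete, the sketch below buckets a score into the bands described on this page. The thresholds are taken from the ranges quoted here (7-35% general, 83-100% reasoning, >90% exceptional), not from any official rubric.

    # Bucket an AIME 2025 score using the ranges quoted above.
    # Thresholds reflect this page's observed ranges, not a standard.
    def interpret_aime(score: float) -> str:
        if score > 90:
            return "exceptional mathematical reasoning"
        if score >= 83:
            return "typical of reasoning models"
        if score <= 35:
            return "typical of general models"
        return "between the observed general and reasoning ranges"

    for s in (12.0, 88.0, 100.0):
        print(f"{s:5.1f}% -> {interpret_aime(s)}")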

Why AIME 2025 matters

AIME problems test genuine creative mathematical reasoning, not pattern matching or memorization. The enormous gap between general models (7-35%) and reasoning models (83-100%) makes AIME the starkest discriminator between models with and without reasoning capability. A model that scores highly on AIME demonstrates the ability to approach novel problems creatively - a capability that transfers to other reasoning tasks.

How does AIME 2025 compare to other benchmarks?

AIME is harder than MATH and requires more creative insight. While MATH draws from multiple competition sources at various difficulty levels, AIME uses only the American Invitational Mathematics Exam - one of the hardest standardized math competitions. The score spread on AIME is even wider than MATH (7-100% vs 50-97%), making it an even stronger discriminator between reasoning and general models.
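
The spread comparison is simple arithmetic over the frontier ranges quoted on this page; a small sketch, assuming those ranges:

    # Frontier ranges quoted above, and the resulting end-to-end spreads.
    ranges = {"AIME 2025": (7, 100), "MATH": (50, 97)}
    for name, (low, high) in ranges.items():
        print(f"{name}: {low}-{high}% ({high - low} point spread)")
    # AIME 2025: 7-100% (93 point spread)
    # MATH: 50-97% (47 point spread)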

Which AI model has the highest AIME 2025 score?

As of April 2026, GPT-5.4 leads with a perfect 100.0%, followed by Claude Opus 4.6 at 99.8% and Gemini 3 Flash at 99.7%.

Top 10 models by AIME 2025

Frequently asked questions

What is AIME 2025?
AIME (American Invitational Mathematics Exam) is a prestigious math competition used as a qualifier for the USA Mathematical Olympiad. AI models are evaluated on the percentage of 15 problems solved correctly. It tests creative mathematical reasoning beyond standard curriculum.

What is a good AIME 2025 score?
Reasoning models score 83-100%. General models score 7-35%. GPT-5.4 achieved a perfect 100%. A score above 90% indicates exceptional mathematical reasoning. The gap between reasoning and general models is the widest of any benchmark.

Why does AIME 2025 matter?
AIME problems require creative insight and novel approaches that cannot be pattern-matched from training data. High AIME scores demonstrate genuine reasoning ability that transfers to other tasks. The wide gap between model types makes it a clear discriminator.

Which AI model has the highest AIME 2025 score?
As of April 2026, GPT-5.4 leads with a perfect score of 100.0%, followed by Claude Opus 4.6 at 99.8% and Gemini 3 Flash at 99.7%. These scores demonstrate the power of reasoning/thinking modes.

See all benchmark scores in the AI Frontier Model Tracker. Compare across all 8 benchmarks.
