LiveCodeBench
LiveCodeBench is a coding benchmark built from competitive programming problems sourced from recent competitions, which makes it resistant to data contamination. It is harder and more discriminating than HumanEval and is increasingly used as the primary coding benchmark by evaluation platforms.
Key facts
How LiveCodeBench works
LiveCodeBench presents competitive programming problems drawn from recent coding competitions that postdate model training cutoffs. This prevents models from having memorized solutions during training. Models must solve each problem and pass all test cases on the first attempt (pass@1). Problems range from straightforward algorithmic tasks to complex multi-step reasoning challenges.
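The pass@1 metric described above is usually computed with the standard unbiased estimator introduced alongside HumanEval: sample n completions per problem, count the c that pass all tests, and estimate the probability that at least one of k samples would pass. A minimal sketch (function name is illustrative, not from the LiveCodeBench codebase):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples passes, given c correct out of n total samples per problem.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        # Fewer failures than k samples: a correct sample is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem, pass@1 reduces to the raw pass rate:
print(pass_at_k(1, 1, 1))   # 1.0 (the single sample passed)
print(pass_at_k(10, 5, 1))  # 0.5 (half of 10 samples passed)
```

For k=1 this is simply c/n, the fraction of samples that pass every test case, which is what LiveCodeBench leaderboards report.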
What is a good LiveCodeBench score?
Scores span a wide range, from roughly 17% to 91%. Frontier reasoning models score 80-91%, general frontier models score 45-80%, and smaller models score below 40%. This spread makes LiveCodeBench useful for comparing models at every tier, unlike HumanEval, which only distinguishes models at the lower end.
Why LiveCodeBench matters
LiveCodeBench addresses HumanEval's main weakness: data contamination. Because problems come from competitions held after model training cutoffs, models cannot have memorized solutions during training. The wide score range (17-91%) also discriminates strongly across all model tiers, from small open-weight models to frontier reasoning systems, which is why it is increasingly replacing HumanEval as the primary coding benchmark.
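The contamination defense described above boils down to date filtering: only score a model on problems released after its training cutoff. A minimal sketch of that idea, assuming a simple dict-based problem record (field names and problem IDs here are hypothetical, not LiveCodeBench's actual schema):

```python
from datetime import date

def contamination_free(problems: list[dict], training_cutoff: date) -> list[dict]:
    """Keep only problems released after the model's training cutoff,
    so their solutions cannot appear in the training data."""
    return [p for p in problems if p["release_date"] > training_cutoff]

problems = [
    {"id": "contest-2024-d", "release_date": date(2024, 6, 1)},
    {"id": "contest-2023-b", "release_date": date(2023, 1, 15)},
]

# Evaluating a model trained on data up to the end of 2023:
eligible = contamination_free(problems, training_cutoff=date(2023, 12, 31))
print([p["id"] for p in eligible])  # ['contest-2024-d']
```

Because the benchmark is refreshed with new competition problems over time, each model is effectively evaluated on a rolling window of problems it could not have seen.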
How does LiveCodeBench compare to other benchmarks?
LiveCodeBench is harder and more contamination-resistant than HumanEval. While most frontier models score 90%+ on HumanEval (making it nearly useless for comparison), LiveCodeBench spreads them from 17% to 91%. Compared to SWE-bench, LiveCodeBench tests algorithmic problem-solving rather than real-world software engineering. Both are valuable but measure different skills.
Which AI model has the highest LiveCodeBench score?
Top 10 models by LiveCodeBench score
Frequently asked questions