AI Frontier Model Tracker

AI Frontier Model Tracker - New Releases
New frontier AI model releases tracked by DemandSphere. Benchmarks, pricing, and capabilities from OpenAI, Anthropic, Google, xAI, Meta, DeepSeek, and more.
https://www.demandsphere.com/research/demandsphere-radar/ai-frontier-model-tracker/releases/
en-us
Sun, 19 Apr 2026 02:31:39 +0000
CC BY-NC 4.0 DemandSphere, Inc.

Qwen3.6 35B-A3B (Alibaba/Qwen) - Reasoning
MoE - 3B active / 35B total, 256 experts. Runs on a laptop. 73.4% SWE-bench Verified. Natively multimodal. Extensible to 1M context. Apache 2.0. Context: 262K tokens. Open weights.
https://qwen.ai/blog?id=qwen3.6-35b-a3b
ds-tracker-qw36-35b-2026-04-15
Wed, 15 Apr 2026 00:00:00 +0000
Reasoning Open Alibaba/Qwen

Muse Spark (Meta) - Reasoning
First model from Meta Superintelligence Labs. 89.5% GPQA Diamond, 50.2% HLE (Contemplating mode). Natively multimodal with visual chain of thought and multi-agent orchestration. 262K context. Free at meta.ai. #1 on HealthBench Hard (42.8) and CharXiv Reasoning (86.4). Context: 262K tokens.
https://ai.meta.com/blog/introducing-muse-spark-msl/
ds-tracker-muse-spark-2026-04-08
Wed, 08 Apr 2026 00:00:00 +0000
Reasoning Closed Meta

Qwen3.6-Plus (Alibaba/Qwen) - Reasoning
Proprietary flagship. 1M native context, 65K output tokens. 78.8% SWE-bench Verified. Always-on chain-of-thought. Native function calling and tool use. Context: 1000K tokens. Pricing: $0.29/M input, $1.65/M output.
https://qwen.ai/blog?id=qwen3.6
ds-tracker-qw36-2026-04-02
Thu, 02 Apr 2026 00:00:00 +0000
Reasoning Closed Alibaba/Qwen

Gemma 4 26B-A4B (Google) - General
MoE - 3.8B active / 25.2B total. Near 31B performance at a fraction of the compute. Apache 2.0. Context: 256K tokens. Open weights.
https://ai.google.dev/gemma/docs/core/model_card_4
ds-tracker-ge4-26-2026-04-02
Thu, 02 Apr 2026 00:00:00 +0000
General Open Google

Gemma 4 31B (Google) - General
Strongest open Gemma model. 85.2% MMLU-Pro, 84.3% GPQA Diamond. Hybrid attention with sliding window. Apache 2.0. Context: 256K tokens. Open weights.
https://ai.google.dev/gemma/docs/core/model_card_4
ds-tracker-ge4-31-2026-04-02
Thu, 02 Apr 2026 00:00:00 +0000
General Open Google

MiniMax M2.7 (MiniMax) - Reasoning
MiniMax March 2026 flagship. Self-evolving agentic model. 56.2% SWE-Pro. 205K context with automatic caching. $0.30/$1.20 per 1M tokens. Context: 205K tokens. Pricing: $0.3/M input, $1.2/M output.
https://openrouter.ai/minimax/minimax-m2.7
ds-tracker-mm27-2026-03-18
Wed, 18 Mar 2026 00:00:00 +0000
Reasoning Closed MiniMax

GPT-5.4 Mini (OpenAI) - Reasoning
Cost-efficient reasoning model. 400K context. Near GPT-5.4 Standard performance at 70% lower cost. Multimodal with computer use support. Context: 400K tokens. Pricing: $0.75/M input, $4.5/M output.
https://openai.com/index/introducing-gpt-5-4-mini-and-nano/
ds-tracker-gpt-54m-2026-03-17
Tue, 17 Mar 2026 00:00:00 +0000
Reasoning Closed OpenAI

Grok 4.20 (xAI) - Reasoning
xAI current flagship as of March 31, 2026. 2M token context window. Four-agent collaborative multi-agent variant available. Reasoning on/off toggle. $2/$6 per 1M tokens. Context: 2000K tokens. Pricing: $2/M input, $6/M output.
https://docs.x.ai/developers/models
ds-tracker-grok420-2026-03-10
Tue, 10 Mar 2026 00:00:00 +0000
Reasoning Closed xAI

GPT-5.4 (OpenAI) - Reasoning
OpenAI March 2026 flagship. First model with native Computer Use API (75% OSWorld, above human baseline of 72.4%). 1M context in Codex/API. 33% fewer hallucinations vs GPT-5.2. 83% GDPval. GPT-5.2 retires June 5, 2026. Context: 1050K tokens. Pricing: $2.5/M input, $15/M output.
https://openai.com/index/introducing-gpt-5-4/
ds-tracker-gpt-54-2026-03-05
Thu, 05 Mar 2026 00:00:00 +0000
Reasoning Closed OpenAI

Gemini 3.1 Pro (Google) - Reasoning
First model to break 1500 LMArena Elo. 94.3% GPQA Diamond. 41% HLE - highest published score. Deep Think mode. 1M context. $4/$18 per 1M tokens above 200K. Context: 1000K tokens. Pricing: $2/M input, $12/M output.
https://ai.google.dev/gemini-api/docs/models
ds-tracker-g3p-2026-02-19
Thu, 19 Feb 2026 00:00:00 +0000
Reasoning Closed Google

Claude Sonnet 4.6 (Anthropic) - General
Current production Sonnet. Frontier coding and agent performance. Top Arena-Code Elo for everyday use. Context: 1000K tokens. Pricing: $3/M input, $15/M output.
https://docs.anthropic.com/en/docs/about-claude/models
ds-tracker-cs46-2026-02-17
Tue, 17 Feb 2026 00:00:00 +0000
General Closed Anthropic

Qwen3.5 397B-A17B (Alibaba/Qwen) - Reasoning
MoE - 17B active / 397B total. 88.4% GPQA Diamond. 201 languages. Extensible to 1M context. Apache 2.0. Context: 262K tokens. Pricing: $0.6/M input, $3.6/M output. Open weights.
https://qwen.ai/blog/qwen3.5
ds-tracker-qw35-2026-02-16
Mon, 16 Feb 2026 00:00:00 +0000
Reasoning Open Alibaba/Qwen

MiniMax M2.5 (MiniMax) - Reasoning
229B MoE / 10B active. 80.2% SWE-bench. 0.6pts behind Claude Opus 4.6. Most-used open model on OpenRouter. Context: 205K tokens. Pricing: $0.3/M input, $2.4/M output. Open weights.
https://www.minimax.io/news/minimax-m25
ds-tracker-mm25-2026-02-12
Thu, 12 Feb 2026 00:00:00 +0000
Reasoning Open MiniMax

Claude Opus 4.6 (Anthropic) - Reasoning
Arena Code Elo 1548. 80.8% SWE-bench. 89.11% MMLU-Pro (vals.ai). Leads for coding and nuanced writing. Context: 1000K tokens. Pricing: $5/M input, $25/M output.
https://docs.anthropic.com/en/docs/about-claude/models
ds-tracker-co46-2026-02-05
Thu, 05 Feb 2026 00:00:00 +0000
Reasoning Closed Anthropic

Kimi K2.5 (Moonshot AI) - Reasoning
Multimodal K2 with native vision, video. Agent Swarm: up to 100 parallel sub-agents. Modified MIT. Context: 262K tokens. Pricing: $0.6/M input, $2.5/M output. Open weights.
https://kimi.ai
ds-tracker-kk25-2026-01-27
Tue, 27 Jan 2026 00:00:00 +0000
Reasoning Open Moonshot AI

Gemini 3 Flash (Google) - General
Budget champion - 78% SWE-bench, 90.4% GPQA Diamond, 99.7% AIME 2025. Best value in Dec 2025 frontier. Context: 1000K tokens. Pricing: $0.5/M input, $3/M output.
https://ai.google.dev/gemini-api/docs/models
ds-tracker-g3f-2025-12-17
Wed, 17 Dec 2025 00:00:00 +0000
General Closed Google

DeepSeek V3.2 (DeepSeek) - General
685B MoE / 37B active. DeepSeek Sparse Attention: 70% long-context cost reduction. MIT license. Context: 128K tokens. Pricing: $0.28/M input, $0.42/M output. Open weights.
https://deepseek.com
ds-tracker-dsv32-2025-12-01
Mon, 01 Dec 2025 00:00:00 +0000
General Open DeepSeek

Claude Opus 4.5 (Anthropic) - Reasoning
80.9% SWE-bench (record Nov 2025). 89.5% MMLU-Pro. Long-horizon agentic tasks. Extended thinking mode. Context: 200K tokens. Pricing: $5/M input, $25/M output.
https://docs.anthropic.com/en/docs/about-claude/models
ds-tracker-co45-2025-11-24
Mon, 24 Nov 2025 00:00:00 +0000
Reasoning Closed Anthropic

Grok 4.1 Fast (xAI) - Reasoning
Grok 4 capabilities at dramatically lower cost ($0.20/$0.50). #2 LMArena Elo Dec 2025. Leads EQ-Bench for creative. Real-time X data. Context: 2000K tokens. Pricing: $0.2/M input, $0.5/M output.
https://docs.x.ai/docs/models
ds-tracker-grok41-2025-11-19
Wed, 19 Nov 2025 00:00:00 +0000
Reasoning Closed xAI

Claude Haiku 4.5 (Anthropic) - General
Fastest efficient Anthropic model. First Haiku with extended thinking and computer use. 73.3% SWE-bench. Context: 200K tokens. Pricing: $1/M input, $5/M output.
https://docs.anthropic.com/en/docs/about-claude/models
ds-tracker-ch45-2025-10-15
Wed, 15 Oct 2025 00:00:00 +0000
General Closed Anthropic
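
As a rough illustration of how the per-million-token prices listed above translate into a per-request cost, here is a minimal Python sketch. The names Price, PRICES, and request_cost are hypothetical and not part of the tracker. The prices are copied from four of the entries above. The calculation assumes a flat rate: it ignores tiered pricing (such as the higher Gemini 3.1 Pro rate above 200K tokens), caching discounts, and any other provider billing rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Listed price in USD per 1M tokens (hypothetical helper type)."""
    input_per_m: float
    output_per_m: float


# Prices copied from the tracker entries above (USD per 1M tokens).
PRICES = {
    "Qwen3.6-Plus": Price(0.29, 1.65),
    "GPT-5.4": Price(2.5, 15.0),
    "Claude Opus 4.6": Price(5.0, 25.0),
    "DeepSeek V3.2": Price(0.28, 0.42),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the flat listed rate.

    Assumption: no tiering, caching, or batch discounts are applied.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    price = PRICES[model]
    total = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Compare the same workload (50K input, 2K output tokens) across models.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 50_000, 2_000):.4f}")
```

Keeping the prices in a plain dictionary makes it easy to add further entries from the list, but real invoices may differ from these estimates wherever a provider applies context-length tiers or cache pricing.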