Grok-4-Fast: xAI's Unified 2M-Token Model With Built-In Tool-Use Reinforcement Learning
xAI released Grok-4-Fast, a single prompt-steerable model with a 2M-token window and tool-use RL that reduces token usage by about 40% while maintaining frontier benchmark performance.
What Grok-4-Fast is
xAI has introduced Grok-4-Fast, a cost-optimized successor to Grok-4 that combines 'reasoning' and 'non-reasoning' behaviors in a single model. Instead of switching between distinct models for short answers and long reasoning chains, Grok-4-Fast uses one weight space and lets system prompts steer its behavior. The model offers a 2 million token context window and native tool-use reinforcement learning that lets it choose when to browse the web, run code, or call external tools.
Unified architecture and prompt steering
Earlier Grok variants separated long-chain reasoning and quick, short-form responses into different models. Grok-4-Fast removes that split by unifying both behaviors in one set of weights. The benefit for real-time applications is twofold: it reduces latency caused by model switching and lowers token consumption, which translates directly into cost savings for high-throughput scenarios like search, interactive coding, and assistive agents. System prompts determine whether the model behaves in a 'reasoning' mode or a 'non-reasoning' mode, making behavior control simpler and faster.
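To make the steering idea concrete, here is a minimal sketch of how a client might select between the two behaviors. It assumes an OpenAI-compatible chat-completions payload and uses the two SKU names mentioned later in this article; the system-prompt wording is illustrative, not xAI's actual steering prompt.

```python
# Sketch: steering Grok-4-Fast between reasoning and non-reasoning
# behavior. Only the system prompt and SKU name change between modes;
# the payload shape assumes an OpenAI-compatible chat endpoint.

def build_request(user_msg: str, reasoning: bool) -> dict:
    """Build a chat payload; the system prompt steers the behavior."""
    system = (
        "Think step by step and use tools when helpful."   # long-chain mode
        if reasoning
        else "Answer concisely without extended reasoning."  # quick mode
    )
    model = "grok-4-fast-reasoning" if reasoning else "grok-4-fast-non-reasoning"
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_msg},
        ],
    }

quick = build_request("What is the capital of France?", reasoning=False)
deep = build_request("Prove there are infinitely many primes.", reasoning=True)
print(quick["model"])  # grok-4-fast-non-reasoning
print(deep["model"])   # grok-4-fast-reasoning
```

Because both SKUs share one weight space, switching modes is a payload change rather than a model swap, which is where the latency benefit described above comes from.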
Tool-use reinforcement learning and agent performance
Grok-4-Fast was trained end-to-end with tool-use reinforcement learning, teaching the model when to call tools such as web browsing, code execution, or other API endpoints. That training shows up in agent-oriented benchmarks: BrowseComp 44.9%, SimpleQA 95.0%, and Reka Research 66.0%. xAI also reports gains on Chinese variants, for example BrowseComp-zh 51.2%.
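The "decide when to call a tool" behavior can be pictured as a simple agent loop: each turn, the model either emits a tool call for the harness to execute or returns a final answer. The loop below is a generic sketch with a stubbed model; the tool names and response shape are illustrative, not xAI's actual schema.

```python
# Sketch of an agent loop around a tool-using model: per turn the
# model either requests a tool call or answers, and the harness
# executes requested tools and feeds results back into the history.

def run_agent(model_step, tools, question, max_turns=5):
    """Drive a tool-using model until it returns a final answer."""
    history = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        step = model_step(history)            # model picks a tool or answers
        if step["type"] == "final":
            return step["content"]
        result = tools[step["tool"]](step["arguments"])
        history.append({"role": "tool", "name": step["tool"],
                        "content": result})
    return None

# Stub "model" that searches once, then answers with what it found:
def stub_model(history):
    if history[-1]["role"] == "user":
        return {"type": "tool", "tool": "web_search",
                "arguments": "Grok-4-Fast context window size"}
    return {"type": "final", "content": history[-1]["content"]}

tools = {"web_search": lambda query: "2M tokens"}
print(run_agent(stub_model, tools, "How large is the context window?"))
# → 2M tokens
```

In Grok-4-Fast the tool-or-answer decision is learned end-to-end via reinforcement learning rather than hand-coded, which is what the agent benchmarks above are measuring.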
xAI further cites private battle-testing on LMArena: the search-focused grok-4-fast-search (codename 'menlo') ranked #1 in the Search Arena with 1163 Elo, while the text variant (codename 'tahoe') placed #8 in the Text Arena, roughly comparable to grok-4-0709.
Efficiency, benchmarks, and 'intelligence density'
On both internal and public benchmarks Grok-4-Fast posts frontier-class scores while cutting token usage. Reported pass@1 results include 92.0% on AIME 2025 (no tools), 93.3% on HMMT 2025 (no tools), 85.7% on GPQA Diamond, and 80.0% on LiveCodeBench (Jan–May). xAI claims that Grok-4-Fast uses about 40% fewer 'thinking' tokens on average compared to Grok-4, and frames this improvement as higher 'intelligence density'. When combined with new per-token pricing, xAI estimates roughly a 98% reduction in the price required to reach the same benchmark performance as Grok-4.
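The ~98% figure follows from multiplying the two savings. As a back-of-the-envelope check, assume Grok-4's published rate of $15 per 1M output tokens (an assumption for comparison; only Grok-4-Fast's prices are stated in this article) against Grok-4-Fast's $0.50 sub-128k output rate:

```python
# Illustrative arithmetic behind the "~98% cheaper for the same
# performance" claim: ~40% fewer thinking (output) tokens combined
# with a far lower per-token price. Grok-4's $15/1M output rate is
# an assumed comparison point; $0.50/1M is Grok-4-Fast's stated
# sub-128k output price.

grok4_rate = 15.00 / 1_000_000   # $ per output token (assumed)
fast_rate = 0.50 / 1_000_000     # $ per output token (stated)
token_ratio = 0.60               # ~40% fewer thinking tokens

cost_ratio = token_ratio * (fast_rate / grok4_rate)
reduction = 1 - cost_ratio
print(f"cost ratio: {cost_ratio:.3f}")  # 0.020
print(f"reduction:  {reduction:.0%}")   # 98%
```

Under these assumptions the per-benchmark-point cost lands at about 2% of Grok-4's, matching the roughly 98% reduction xAI cites.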
Deployment options and pricing
Grok-4-Fast is generally available in Grok's Fast and Auto modes across web and mobile. In Auto mode, the system will select Grok-4-Fast for difficult queries to improve latency without sacrificing quality. For the first time, free users gain access to xAI's newest model tier.
Developers can access two SKUs: grok-4-fast-reasoning and grok-4-fast-non-reasoning, both offering the 2M-token context window. xAI's API pricing is tiered by context length and token direction: $0.20 per 1M input tokens for contexts under 128k, $0.40 per 1M input tokens for contexts at or above 128k, $0.50 per 1M output tokens for outputs under 128k, $1.00 per 1M output tokens for outputs at or above 128k, and $0.05 per 1M cached input tokens.
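The tiered schedule above can be sketched as a small cost estimator. This is a simplification, assuming the 128k threshold is applied per direction to the token counts of a single request; consult xAI's developer documentation for how thresholds are actually applied.

```python
# Sketch of the tiered Grok-4-Fast API pricing described above:
# input and output rates step up at the 128k threshold, and cached
# input tokens bill at a flat discounted rate. Rates are the
# published per-1M prices; threshold handling is a simplification.

def request_cost(input_tokens: int, output_tokens: int,
                 cached_input_tokens: int = 0) -> float:
    """Estimated USD cost of one request under the tiered schedule."""
    per_million = 1_000_000
    input_rate = 0.20 if input_tokens < 128_000 else 0.40
    output_rate = 0.50 if output_tokens < 128_000 else 1.00
    cached_rate = 0.05
    return (input_tokens * input_rate
            + output_tokens * output_rate
            + cached_input_tokens * cached_rate) / per_million

# A 100k-token prompt with a 2k-token answer stays in the low tier:
print(f"${request_cost(100_000, 2_000):.4f}")  # $0.0210
```

A 200k-token prompt with the same answer would instead bill input at the $0.40 tier, illustrating why long-context workloads should track the 128k boundary.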
Why this matters for search and agentic workflows
The combination of a very large context window, prompt-steerable unified behavior, and tool-use RL positions Grok-4-Fast for high-throughput search and agent tasks. Reductions in both latency and token usage make it attractive for production deployments where per-query cost and responsiveness are critical. Early public signals such as the LMArena placements, together with the benchmark profile, suggest xAI has matched Grok-4's accuracy with materially lower per-query token consumption.
Technical highlights
- Unified model with a 2M-token context window across both SKUs.
- End-to-end training with tool-use reinforcement learning to decide when to call tools.
- Reported efficiency of roughly 40% fewer thinking tokens versus Grok-4.
- API pricing designed for scale with cached input token discounts.
Grok-4-Fast packages Grok-4-level capabilities into a single, prompt-steerable model optimized for search and agent workloads, aiming to lower latency and unit costs in production environments. For more details consult xAI's technical notes and developer documentation.