Groq Pricing 2026
Complete pricing guide with plans, hidden costs, and cost analysis
Groq uses custom pricing as of May 2026, with three plans: Free, Developer, and Enterprise. The Free plan costs nothing; Developer and Enterprise pricing is available on request from Groq's sales team and depends on your chosen tier, contract length, and negotiated discounts.
- Free tier: Yes
Groq offers 3 pricing tiers: Free, Developer, and Enterprise. The Developer plan is aimed at production API usage.
Compared to other LLM API providers, Groq is positioned at the budget-friendly end of the market.
- 4 documented hidden costs beyond list price
How much does Groq cost?
Groq Pricing Overview
The Free plan costs nothing and is best for prototyping and evaluation. The Developer plan requires contacting sales for a custom quote and is designed for production API usage. The Enterprise plan, also custom-quoted, is designed for high-volume enterprise deployments.
There are at least 4 documented hidden costs beyond Groq's list price, including implementation, training, and add-on fees.
This pricing was last verified on May 6, 2026 from 2 independent sources.
All Groq Plans & Pricing
| Plan | Monthly | Annual | Best For |
|---|---|---|---|
| Free (rate-limited requests per minute) | Free | Free | Prototyping and evaluation |
| Developer | Contact Sales | Contact Sales | Production API usage |
| Enterprise | Contact Sales | Contact Sales | High-volume enterprise deployments |
Features by plan
Free
- Free API key
- Pay-as-you-go access
- All models available
- Rate-limited
Developer
- Llama 3.1 8B Instant at $0.05/$0.08 per M tokens (in/out)
- Llama 4 Scout (17Bx16E) at $0.11/$0.34 per M tokens
- GPT OSS 20B at $0.075/$0.30 per M tokens (cached input $0.0375)
- GPT OSS 120B at $0.15/$0.60 per M tokens (cached input $0.075)
- Qwen3 32B at $0.29/$0.59 per M tokens
- Llama 3.3 70B Versatile at $0.59/$0.79 per M tokens
- Kimi K2 at $1.00/$3.00 per M tokens (cached input $0.50)
- Whisper Large v3 at $0.111/hour, Turbo at $0.04/hour
- TTS: Canopy Labs Orpheus English $22/M chars, Arabic $40/M chars
- Built-in tools: Web search $5-$8/1000 req, code execution $0.18/hour
- Prompt caching: 50% off input tokens on cache hit
- Batch API: 50% lower cost (24h to 7-day window)
- Up to 1,000 tokens/second on LPU hardware
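The pay-as-you-go math above is simple enough to sketch. A minimal monthly-cost estimator, using only the per-token rates quoted in this list (the dict and function names are illustrative, not part of Groq's API):

```python
# Rough monthly-cost sketch for Groq's pay-per-token Developer tier.
# Rates are the figures quoted in this guide; treat them as a snapshot,
# not an authoritative price list.
RATES_PER_M = {  # model -> (input $/1M tokens, output $/1M tokens)
    "llama-3.1-8b-instant": (0.05, 0.08),
    "llama-3.3-70b-versatile": (0.59, 0.79),
    "kimi-k2": (1.00, 3.00),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """USD per month for input_m / output_m million tokens of usage."""
    rate_in, rate_out = RATES_PER_M[model]
    return input_m * rate_in + output_m * rate_out

# 10M input + 5M output tokens/month on the cheapest model:
print(round(monthly_cost("llama-3.1-8b-instant", 10, 5), 2))  # ≈ $0.90
```

At these rates a hobby-scale workload on Llama 3.1 8B stays under a dollar a month, which is why the model choice (see the 12x spread discussed later) matters far more than the tier.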
Enterprise
- Dedicated support
- Custom rate limits
- Large-scale solutions
- SLA guarantees
- On-prem deployment options
- Enterprise-only models: Minimax M2.5, Qwen3-VL 32B
- Fine-tuned models available on request
Usage-Based Rates
Per-unit pricing for Groq API usage.
Developer
| Model | Input | Output | Cached | Per |
|---|---|---|---|---|
| llama-3-1-8b-instant 128K ctx | $0.050 | $0.080 | — | 1M tokens |
| llama-4-scout-17bx16e 128K ctx | $0.110 | $0.340 | — | 1M tokens |
| gpt-oss-20b 128K ctx | $0.075 | $0.300 | $0.0375 | 1M tokens |
| gpt-oss-safeguard-20b | $0.075 | $0.300 | — | 1M tokens |
| gpt-oss-120b 128K ctx | $0.150 | $0.600 | $0.075 | 1M tokens |
| qwen3-32b 131K ctx | $0.290 | $0.590 | — | 1M tokens |
| llama-3-3-70b-versatile 128K ctx | $0.590 | $0.790 | — | 1M tokens |
| kimi-k2-instruct-0905 | $1.00 | $3.00 | $0.500 | 1M tokens |
- Whisper Large v3 at $0.111/hour for audio transcription (billed at minimum 10s/request)
- Whisper Large v3 Turbo at $0.04/hour for audio transcription
- TTS: Canopy Labs Orpheus English $22 per M characters, Arabic Saudi $40 per M characters
- Built-in tools: Basic web search $5/1000 requests, Advanced search $8/1000, Visit website $1/1000, Code execution $0.18/hour, Browser automation $0.08/hour
- Prompt caching: 50% discount on cached input tokens for Kimi K2, GPT OSS 20B, and GPT OSS 120B (no extra fee for caching itself)
- Batch API: 50% lower cost with 24-hour to 7-day processing window
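A sketch of how the two discounts interact, using GPT OSS 120B's rates from the table above. The 40% cache-hit rate is an assumed workload parameter, and this assumes the caching and Batch API discounts stack multiplicatively, which the guide does not state explicitly:

```python
# Effective input cost per 1M tokens with prompt caching, plus the
# further 50% Batch API discount. Rates: GPT OSS 120B per this guide.
RATE_IN = 0.15          # $/1M input tokens, uncached
RATE_IN_CACHED = 0.075  # $/1M input tokens on cache hit (50% off)

def effective_input_rate(cache_hit_rate: float, batch: bool = False) -> float:
    """Blended $/1M input tokens for a given fraction of cached tokens."""
    rate = (1 - cache_hit_rate) * RATE_IN + cache_hit_rate * RATE_IN_CACHED
    return rate / 2 if batch else rate  # Batch API: 50% lower cost

print(round(effective_input_rate(0.4), 4))              # ≈ $0.12 per 1M
print(round(effective_input_rate(0.4, batch=True), 4))  # ≈ $0.06 per 1M
```

For prompt-heavy agentic workloads where most of the context repeats between calls, the cache-hit fraction is the single biggest lever on input cost.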
What Companies Actually Pay for Groq
| Model | Input /1M | Output /1M | Blended /1M |
|---|---|---|---|
| Llama 3.3 70B Instruct | $0.590 | $0.790 | $0.640 |
| Qwen3 32B Instruct (reasoning) | $0.290 | $0.590 | $0.365 |
| Llama 4 Scout Instruct | $0.110 | $0.340 | $0.168 |
| GPT OSS 120B (low) | $0.150 | $0.600 | $0.263 |
| Llama 3.1 8B Instruct | $0.050 | $0.080 | $0.058 |
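The blended column is consistent with a 3:1 input-to-output token weighting, a common assumption for chat workloads; the table itself does not state the mix, so treat this as an inference. A quick check:

```python
# Reverse-engineering the "Blended /1M" column: every published value
# matches blended = (3 * input + output) / 4, i.e. a 3:1 token mix.
rows = [  # (input $/1M, output $/1M, published blended $/1M)
    (0.59, 0.79, 0.640),  # Llama 3.3 70B
    (0.29, 0.59, 0.365),  # Qwen3 32B
    (0.11, 0.34, 0.168),  # Llama 4 Scout
    (0.15, 0.60, 0.263),  # GPT OSS 120B (low)
    (0.05, 0.08, 0.058),  # Llama 3.1 8B
]
for rate_in, rate_out, published in rows:
    blended = (3 * rate_in + rate_out) / 4
    # Published figures are rounded to 3 decimals, so allow half a unit
    # in the last place.
    assert abs(blended - published) < 0.00051, (rate_in, blended, published)
print("all blended rates match a 3:1 input:output mix")
```

If your workload is output-heavy (long generations from short prompts), these blended figures will understate your real cost, since output tokens are priced higher on every model here.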
Groq Year 1 Total Cost by Company Size
Real deployment costs including licenses, implementation, training, and admin — not just the sticker price. Three example workloads:
- Audio transcription: 400 hours of mono audio using Groq's Whisper Large v3 Turbo. Note: stereo or multi-channel audio multiplies cost by the number of channels. Reddit: 'Groq offers Whisper Large v3 Turbo at $0.04 per hour of audio transcribed. $0.04 × 400 = $16.00'
- Prototype: small personal or prototype project using the most affordable model — 10M input tokens and 5M output tokens per month.
- Production: application requiring strong reasoning — 50M input tokens and 20M output tokens per month using Groq's most capable tracked model.
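Under the per-unit rates quoted in this guide, those three scenarios work out roughly as follows (a sketch; the volumes are the scenario assumptions above, and the variable names are illustrative):

```python
# Rough cost sketches for the three example workloads, using rates
# quoted elsewhere in this guide.

# 1) Transcription: 400 hours of mono audio on Whisper Large v3 Turbo.
transcription = 400 * 0.04                   # $0.04 per audio hour
# 2) Prototype: 10M input + 5M output tokens/month on Llama 3.1 8B.
prototype_monthly = 10 * 0.05 + 5 * 0.08     # $/1M rates: 0.05 / 0.08
# 3) Production: 50M input + 20M output tokens/month on Llama 3.3 70B.
production_monthly = 50 * 0.59 + 20 * 0.79   # $/1M rates: 0.59 / 0.79

print(f"${transcription:.2f} one-off")       # $16.00, matching the quote
print(f"${prototype_monthly:.2f}/month")     # ≈ $0.90/month
print(f"${production_monthly:.2f}/month")    # ≈ $45.30/month
```

Even the production scenario lands around $545/year in raw token spend, so for most teams the dominant "Year 1" costs are engineering time and integration, not the API bill.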
How Groq Pricing Compares
| Software | Starting Price (per 1M tokens) | Top Price (per 1M tokens) |
|---|---|---|
| Groq | Custom | Custom |
| Amazon Bedrock | $0.07 | $75 |
| Anyscale | $0.15 | $5 |
| Baidu ERNIE API | $0.10 | $10 |
| Cerebras Inference API | $0.10 | $6 |
| Claude API | $0.03 | $75 |
Groq Contract Terms
Groq contracts do not auto-renew. Changes require advance notice. These terms are sourced from verified buyer experiences.
Groq uses a pay-per-token model with no subscription or minimum commitment; usage billing stops when the API is not in use.
How to Negotiate Groq Pricing
Groq contracts are negotiable. These 5 tactics are sourced from real buyer experiences and procurement specialists.
1. Use the Free tier to validate your specific use case and measure actual token consumption before negotiating Developer or Enterprise pricing. The 30 RPM / 1,000 RPD limits are sufficient for load testing at low scale and for benchmarking latency against your requirements. (Source: Reddit community consensus across multiple threads)
2. Groq's per-token pricing varies up to 12x between models (Llama 3.1 8B at $0.05/1M input vs. Llama 3.3 70B at $0.59/1M input per Artificial Analysis). Systematically benchmark smaller models against your task requirements before defaulting to larger ones — the cost difference is significant at scale. (Source: Artificial Analysis, artificialanalysis.ai)
3. Groq's documented sub-millisecond latency (e.g., Qwen3 32B at 0.14ms, 627 tok/s per Artificial Analysis) is a genuine differentiator for real-time applications like voice agents, live transcription, and interactive coding tools. Quantify the latency value for your use case and use that business impact as justification for enterprise pricing discussions. (Source: HN: 'Groq takes it further with models like Qwen3 32B at 0.14ms for $0.36/1M (627 tok/s)')
4. The Free and Developer tiers have no documented privacy SLA. If your workload involves sensitive data or compliance requirements, make a DPA (Data Processing Agreement) a prerequisite of any Enterprise negotiation. This is standard for HIPAA and GDPR use cases, and Groq will need to address it for enterprise sales. (Source: Reddit: '0 privacy' comment on free tier limitations)
5. The same open-source model (e.g., Llama 3.3 70B) is available across multiple providers at different prices and speeds. Use Artificial Analysis or OpenRouter to document the price-performance tradeoff for your specific model before accepting Groq's Developer pricing. This data gives you credible alternatives to reference in Enterprise negotiations. (Source: HN: 'huge markup on identical models... Groq takes it further with models like Qwen3 32B at 0.14ms for $0.36/1M')
Groq Pricing FAQ
01 How much does Groq API cost?
Groq API pricing is per-token and varies by model. The cheapest option is Llama 3.1 8B at $0.05 per million input tokens and $0.08 per million output tokens. Larger models like Llama 3.3 70B cost $0.59/$0.79 per million tokens. Groq offers a free API key with rate limits for getting started.
02 Does Groq have a free tier?
Yes, Groq offers a free API key with access to all models. The Free tier has rate limits on requests per minute. You can upgrade to the Developer plan for higher limits with pay-as-you-go token pricing.
03 Why is Groq so fast?
Groq uses custom LPU (Language Processing Unit) hardware designed specifically for AI inference, achieving speeds of 500–1,000+ tokens per second. This makes it one of the fastest LLM inference providers, particularly for real-time applications.
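Those throughput figures translate directly into user-facing wait time. A sketch, assuming a 400-token answer and the speeds quoted in this guide (60 tok/s is taken as a representative GPU-provider figure from the 40–80 tok/s range cited elsewhere in the FAQ):

```python
# Time to stream a full answer at various decode speeds.
def generation_time_s(tokens: int, tokens_per_sec: float) -> float:
    """Seconds to generate `tokens` output tokens at a steady decode rate."""
    return tokens / tokens_per_sec

for tps in (60, 500, 1000):  # typical GPU provider vs. Groq's quoted range
    print(f"{tps:>5} tok/s -> {generation_time_s(400, tps):.2f}s")
```

A 400-token reply drops from roughly 6.7 seconds at 60 tok/s to under half a second at 1,000 tok/s, which is the difference between a noticeable pause and a conversational response in a voice agent.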
04 Groq vs OpenAI: which is cheaper for API usage?
Groq is typically cheaper for open-source models — Llama 3.1 8B costs $0.05/$0.08 per million tokens on Groq versus paying OpenAI rates for GPT models. Groq doesn't offer GPT models, so for OpenAI-specific models there's no direct comparison. Groq's main advantage is its speed alongside competitive pricing.
05 What models does Groq support?
Groq supports models including Llama 3.1 8B, Llama 4 Scout, Llama 3.3 70B, Qwen3 32B, and GPT OSS 20B, with pricing ranging from $0.05 to $0.59 per million input tokens. Enterprise plans support custom rate limits and SLA guarantees for high-volume deployments.
06 Is Groq free to use?
Yes. Groq's Free tier is $0/month and includes access to all available models through the API and playground. The free tier has rate limits of 30 requests per minute and 1,000 requests per day, which is sufficient for prototyping and low-volume personal projects. For production use, the Developer and Enterprise tiers offer custom pay-per-token pricing with higher limits.
07 How fast is Groq compared to other inference providers?
Groq is among the fastest inference providers available. According to Artificial Analysis data (April 2026), Groq achieves sub-millisecond latency on select models — for example, Qwen3 32B at 0.14ms with 627 tokens/sec throughput. Standard GPU-based providers typically deliver 40–80 tokens/sec on the same models. The speed advantage is most pronounced for short context, high-frequency requests.
08 What does Groq charge per million tokens?
Groq's per-token pricing varies by model. Based on Artificial Analysis data (April 2026), the provider median is $0.13/1M input tokens and $0.465/1M output tokens across 8 tracked models. The cheapest model is Llama 3.1 8B at $0.05/1M input and $0.08/1M output. The most capable tracked model, Llama 3.3 70B, costs $0.59/1M input and $0.79/1M output.
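The quoted medians can be checked against the eight tracked models' rates from the usage-based table earlier in this guide:

```python
import statistics

# Input/output rates ($/1M tokens) for the 8 Developer models tracked
# in this guide's usage-based pricing table.
inputs  = [0.05, 0.11, 0.075, 0.075, 0.15, 0.29, 0.59, 1.00]
outputs = [0.08, 0.34, 0.30, 0.30, 0.60, 0.59, 0.79, 3.00]

# With an even count, the median averages the two middle values.
print(round(statistics.median(inputs), 3))   # 0.13  = (0.11 + 0.15) / 2
print(round(statistics.median(outputs), 3))  # 0.465 = (0.34 + 0.59) / 2
```

Both results match the Artificial Analysis figures cited above: $0.13/1M input and $0.465/1M output.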
09 Does Groq support fine-tuned or custom models?
Not on the self-serve tiers. Groq's platform is optimized for a curated set of popular open-source models on its custom LPU hardware, and fine-tuned or custom model deployments are not available on the Free or Developer tiers. The Enterprise plan lists fine-tuned models as available on request; teams that need them otherwise must use GPU-based inference providers. Groq's limited model catalog compared to GPU providers is a frequently cited limitation.
10 How does Groq pricing compare to OpenAI for similar tasks?
Groq is significantly cheaper than OpenAI for equivalent open-source model quality. For example, Llama 3.3 70B on Groq costs $0.59/1M input tokens while GPT-4o costs approximately $5/1M — roughly 8x cheaper. For simple tasks, Groq's Llama 3.1 8B at $0.05/1M input is approximately 100x cheaper than GPT-4o.
11 Is Groq suitable for audio transcription?
Yes. Groq offers Whisper Large v3 Turbo for audio transcription at $0.04 per hour of audio, making 400 hours of transcription approximately $16. This is cost-competitive with other managed transcription services. Note that multi-channel audio (e.g., stereo recordings with separate speaker tracks) multiplies the cost by the number of channels.
12 What is Kimi K2 pricing on Groq?
Kimi K2 on Groq is priced at $1.00 per million input tokens and $3.00 per million output tokens, making it Groq's most expensive model. Cached input tokens cost $0.50/million (50% discount). Kimi K2 is a 1-trillion-parameter mixture-of-experts model from Moonshot AI, optimized for agentic and tool-use tasks. At $3.00/M output tokens, a 1M-token output session costs $3.00 — roughly $2.21 more than the same output on Llama 3.3 70B at $0.79/M.