Quick Answer
Last verified: May 6, 2026
Medium confidence

Groq uses usage-based pricing as of May 2026, with three plans: Free (free), Developer (pay-as-you-go per token), and Enterprise (custom quote). Contact Groq directly for a personalized Enterprise quote. Your final cost depends on the chosen tier, contract length, and any negotiated discounts.


  • Free tier: Yes
  • 4 documented hidden costs beyond list price

Groq offers three pricing tiers: Free, Developer, and Enterprise. The Developer plan is aimed at production API usage.

Compared with other LLM API providers, Groq is positioned at the budget-friendly end of the market.

How much does Groq cost?

Groq combines a free tier with pay-as-you-go Developer pricing and custom Enterprise pricing. Plans include Free (free), Developer (pay-as-you-go per token), and Enterprise (custom pricing); contact Groq directly for an Enterprise quote.

Groq Pricing Overview

Groq's paid pricing is usage-based, with custom Enterprise quotes available from sales. The Free plan costs nothing and is best for prototyping and evaluation. The Developer plan is billed pay-as-you-go per token and is designed for production API usage. The Enterprise plan requires contacting sales for a custom quote and is designed for high-volume enterprise deployments.

There are at least 4 documented hidden costs beyond Groq's list price, including implementation, training, and add-on fees.

This pricing was last verified on May 6, 2026 against 2 independent sources.

Groq offers ultra-fast LLM inference powered by custom LPU hardware, with a free tier for getting started and pay-as-you-go Developer pricing starting at $0.05 per million input tokens for Llama 3.1 8B. Larger models like Llama 3.3 70B cost $0.59/$0.79 per million tokens in/out. Groq achieves speeds of 500–1,000+ tokens per second, making it one of the fastest inference providers available.
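As a quick sanity check on these list rates, here is a minimal sketch of the per-request arithmetic. The helper below is illustrative, not part of any Groq SDK; the two rate entries are copied from the plan tables later in this article.

```python
# Illustrative cost math for Groq's published per-token list rates.
# Rates are USD per 1M tokens as (input, output), quoted in this article.
RATES = {
    "llama-3.1-8b-instant": (0.05, 0.08),
    "llama-3.3-70b-versatile": (0.59, 0.79),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at list rates."""
    rate_in, rate_out = RATES[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# A 2,000-token prompt with a 500-token completion on Llama 3.3 70B:
print(round(request_cost("llama-3.3-70b-versatile", 2_000, 500), 6))  # -> 0.001575
```

At these rates a typical chat turn costs a fraction of a cent, which is why the scenarios later in this article land in the tens of dollars per year rather than thousands.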

How Groq Pricing Compares

Compare Groq pricing against top alternatives in LLM API Providers.

All Groq Plans & Pricing

Plan        Monthly         Annual          Best For
Free        Free            Free            Prototyping and evaluation (limited requests per minute)
Developer   Pay-as-you-go   Pay-as-you-go   Production API usage
Enterprise  Contact sales   Contact sales   High-volume enterprise deployments

Free

  • Free API key
  • Pay-as-you-go access
  • All models available
  • Rate-limited

Developer

  • Llama 3.1 8B Instant at $0.05/$0.08 per M tokens (in/out)
  • Llama 4 Scout (17Bx16E) at $0.11/$0.34 per M tokens
  • GPT OSS 20B at $0.075/$0.30 per M tokens (cached input $0.0375)
  • GPT OSS 120B at $0.15/$0.60 per M tokens (cached input $0.075)
  • Qwen3 32B at $0.29/$0.59 per M tokens
  • Llama 3.3 70B Versatile at $0.59/$0.79 per M tokens
  • Kimi K2 at $1.00/$3.00 per M tokens (cached input $0.50)
  • Whisper Large v3 at $0.111/hour, Turbo at $0.04/hour
  • TTS: Canopy Labs Orpheus English $22/M chars, Arabic $40/M chars
  • Built-in tools: Web search $5-$8/1000 req, code execution $0.18/hour
  • Prompt caching: 50% off input tokens on cache hit
  • Batch API: 50% lower cost (24h to 7-day window)
  • Up to 1,000 tokens/second on LPU hardware
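The prompt-caching and Batch API discounts above can substantially change effective cost. Whether the two discounts stack is not stated in the source, so the sketch below treats stacking as an assumption; `monthly_cost` is an illustrative helper, not a Groq API.

```python
# Sketch of Groq's documented discounts (figures from the list above):
# prompt caching takes 50% off *cached* input tokens on a cache hit, and
# the Batch API halves cost for jobs that can wait 24 hours to 7 days.
# ASSUMPTION: the two discounts stack; the source does not say either way.
def monthly_cost(input_m, output_m, rate_in, rate_out,
                 cached_frac=0.0, batch=False):
    """USD cost for input_m/output_m million tokens at per-1M rates."""
    effective_in = rate_in * (1 - 0.5 * cached_frac)  # 50% off cached share
    cost = input_m * effective_in + output_m * rate_out
    return cost * (0.5 if batch else 1.0)             # Batch API: 50% off

# GPT OSS 120B ($0.15/$0.60 per 1M), 100M in / 20M out, 60% cache hits:
print(round(monthly_cost(100, 20, 0.15, 0.60, cached_frac=0.6), 2))  # -> 22.5
```

Workloads with repetitive system prompts (agents, RAG) benefit most from the cached-input discount, since the shared prefix is the cached share.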

Enterprise

  • Dedicated support
  • Custom rate limits
  • Large-scale solutions
  • SLA guarantees
  • On-prem deployment options
  • Enterprise-only models: Minimax M2.5, Qwen3-VL 32B
  • Fine-tuned models available on request

Usage-Based Rates

Per-unit pricing for Groq API usage.

Developer

Model                     Context   Input    Output   Cached    Per
llama-3-1-8b-instant      128K      $0.050   $0.080   -         1M tokens
llama-4-scout-17bx16e     128K      $0.110   $0.340   -         1M tokens
gpt-oss-20b               128K      $0.075   $0.300   $0.037    1M tokens
gpt-oss-safeguard-20b     -         $0.075   $0.300   -         1M tokens
gpt-oss-120b              128K      $0.150   $0.600   $0.075    1M tokens
qwen3-32b                 131K      $0.290   $0.590   -         1M tokens
llama-3-3-70b-versatile   128K      $0.590   $0.790   -         1M tokens
kimi-k2-instruct-0905     -         $1.00    $3.00    $0.500    1M tokens
  • Up to 1,000 tokens/second on LPU hardware
  • Whisper Large v3 at $0.111/hour for audio transcription (billed at minimum 10s/request)
  • Whisper Large v3 Turbo at $0.04/hour for audio transcription
  • TTS: Canopy Labs Orpheus English $22 per M characters, Arabic Saudi $40 per M characters
  • Built-in tools: Basic web search $5/1000 requests, Advanced search $8/1000, Visit website $1/1000, Code execution $0.18/hour, Browser automation $0.08/hour
  • Prompt caching: 50% discount on cached input tokens for Kimi K2, GPT OSS 20B, and GPT OSS 120B (no extra fee for caching itself)
  • Batch API: 50% lower cost with 24-hour to 7-day processing window
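The 10-second minimum billed per request can dominate cost for workloads with many short clips. Below is a sketch of that billing rule; the minimum is documented above for Whisper Large v3, and applying it via a parameter to other models is an assumption.

```python
# Whisper transcription cost at Groq's per-hour audio rates, with the
# documented 10-second minimum billed per request (Whisper Large v3).
def transcription_cost(clip_seconds, rate_per_hour, min_seconds=10):
    """USD cost for a list of audio clip durations, one request each."""
    billed = sum(max(s, min_seconds) for s in clip_seconds)
    return billed / 3600 * rate_per_hour

# 1,000 three-second voice commands on Whisper Large v3 ($0.111/hour):
# each is billed as 10s, so 10,000s of billed audio, not 3,000s.
print(round(transcription_cost([3] * 1000, 0.111), 4))  # -> 0.3083
```

For long recordings the minimum is irrelevant, but for short-utterance voice interfaces it more than triples the billed duration in this example.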

Compare Groq vs Alternatives

Before committing to Groq, compare its pricing against alternatives in the same category (see the comparison table below).


What Companies Actually Pay for Groq

Median per-1M-token pricing across 8 models
Input $0.130/1M
Output $0.465/1M
Flagship models in this provider's catalog
Model                              Input /1M   Output /1M   Blended /1M
groq_llama-3-3-instruct-70b        $0.590      $0.790       $0.640
groq_qwen3-32b-instruct-reasoning  $0.290      $0.590       $0.365
groq_llama-4-scout-instruct        $0.110      $0.340       $0.168
groq_gpt-oss-120b-low              $0.150      $0.600       $0.263
groq_llama-3-1-instruct-8b         $0.050      $0.080       $0.058
Top pricing complaints
  • Free tier rate limits (30 RPM / 1,000 RPD) make production workloads impractical without upgrading
  • Limited model selection compared to GPU-based inference providers
  • No data privacy guarantees on free tier
  • Speed advantage diminishes for large models or long-context conversations
Source: Artificial Analysis — medians aggregated from 8 models in this provider's catalog. Per-1M-token pricing reflects list rates.
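The blended column in the table above is consistent with a 3-to-1 input-to-output token weighting; that weighting is inferred from the numbers, not stated by the source. A sketch:

```python
# Blended per-1M price assuming a 3:1 input:output token mix. The 3:1
# weighting is inferred from the table's figures, not a stated Groq rule.
def blended(rate_in: float, rate_out: float) -> float:
    return (3 * rate_in + 1 * rate_out) / 4

# Reproduces the table: Llama 3.3 70B at $0.59/$0.79 blends to $0.64.
print(round(blended(0.59, 0.79), 3))  # -> 0.64
```

The same formula reproduces every blended figure in the table (e.g. $0.365 for Qwen3 32B and $0.058 for Llama 3.1 8B), so cost comparisons built on the blended column implicitly assume prompts three times longer than completions.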

Groq Year 1 Total Cost by Usage Scenario

Real deployment costs including licenses, implementation, training, and admin — not just the sticker price.

Audio Transcription: 400 Hours via Whisper $16 Year 1 total
$0.04 × 400 hours
Total $16

Transcribing 400 hours of mono audio using Groq's Whisper Large v3 Turbo. Note: stereo or multi-channel audio multiplies cost by the number of channels.

Light Developer Usage: Llama 3.1 8B ~$11 Year 1 total
$0.50/month input
$0.40/month output
Total $0.90/month, about $11/year

Small personal or prototype project using the most affordable model: 10M input tokens and 5M output tokens per month at $0.05/$0.08 per 1M tokens.

Production App: Moderate Usage with Llama 3.3 70B ~$544 Year 1 total
$29.50/month input
$15.80/month output
Total $45.30/month, about $544/year

Production application requiring strong reasoning: 50M input tokens and 20M output tokens per month using Groq's most capable tracked model.

Reddit: 'Groq offers Whisper Large v3 Turbo at $0.04 per hour of audio transcribed. $0.04 × 400 = $16.00'
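The token-spend scenarios above reduce to simple arithmetic. A sketch at list rates; the helper name is illustrative:

```python
# Year-1 token spend for a steady monthly workload at per-1M list rates.
def year1_cost(input_m_per_month, output_m_per_month, rate_in, rate_out):
    monthly = input_m_per_month * rate_in + output_m_per_month * rate_out
    return monthly * 12

# Llama 3.1 8B ($0.05/$0.08), 10M in / 5M out per month:
print(round(year1_cost(10, 5, 0.05, 0.08), 2))   # -> 10.8
# Llama 3.3 70B ($0.59/$0.79), 50M in / 20M out per month:
print(round(year1_cost(50, 20, 0.59, 0.79), 2))  # -> 543.6
```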

How Groq Pricing Compares

Software                  Starting Price        Top Price
Groq                      $0.05 per 1M tokens   $3.00 per 1M tokens
Amazon Bedrock            $0.07 per 1M tokens   $75 per 1M tokens
Anyscale                  $0.15 per 1M tokens   $5 per 1M tokens
Baidu ERNIE API           $0.10 per 1M tokens   $10 per 1M tokens
Cerebras Inference API    $0.10 per 1M tokens   $6 per 1M tokens
Claude API                $0.03 per 1M tokens   $75 per 1M tokens

4 Groq Hidden Costs Beyond the List Price

Beyond the listed price, Groq has at least 4 documented hidden costs that can significantly increase total cost of ownership.

Watch for 4 hidden costs
  • Free Tier Rate Limits Block Production Use 5-15% of license costs
    medium 2 sources
    Reddit "400 TPS https://groq.com/pricing/ 30 RPM and 1000 RPD https://console.groq.com/docs/rate-limits 0 privacy."
    Reddit "using the LLM with adequately sized context quickly hits the limit. I'm using OpenRouter for running my chatbot, for this reason."
  • Limited Model Selection Requires Multi-Provider Strategy 5-10% of license costs
    low 3 sources
    Reddit "groq doesn't have such a high variety of models and doesn't host them as quickly as Together."
    Reddit "Groq doesnt support [finetunes] and many other providers move from a per-token to a per-hour pricing model"
    Reddit "they have only 4 useless models offered. In the interview he talks a lot about a software-first approach where their compiler runs any model without any hand optimization."
  • No Privacy SLA on Free Tier 10-25% of license costs
    high 1 source
    Reddit "400 TPS https://groq.com/pricing/ 30 RPM and 1000 RPD https://console.groq.com/docs/rate-limits 0 privacy."
  • Speed Advantage Narrows for Large Models and Long Contexts 5-15% of license costs
    medium 2 sources
    Reddit "For real tho, I don't see anyone really use Groq long term. Maybe fall into their fast marketing for few days then quit."
    Reddit "llma on groq only works good if query is simple and small it can't hold long conversations"
Tip

Ask your Groq sales rep about these costs upfront. Getting them in writing before signing can save you from surprise charges later.

Full hidden costs breakdown →

Intelligence sourced from 2 independent sources:
  • Reddit (user discussions)
  • Hacker News (tech community)
Key claims include inline source attribution and were verified against multiple independent sources. 15 source citations total.

Groq Contract Terms

Groq contracts do not auto-renew. Changes require advance notice. These terms are sourced from verified buyer experiences.

Contract Terms
Auto-Renewal No
Mid-Term Downgrade Allowed
Payment Terms Pay-as-you-go per token consumed; no subscription required
Price Escalation No published price escalation schedule; token prices have generally trended downward as model catalog expands
Note

Pay-per-token model with no subscription or minimum commitment; usage billing stops when not in use

Based on 1 verified source

How to Negotiate Groq Pricing

Groq contracts are negotiable. These 5 tactics are sourced from real buyer experiences and procurement specialists.

Negotiation Playbook 5 tactics
Prototype on Free Tier Before Committing high success

Use the Free tier to validate your specific use case and measure actual token consumption before negotiating Developer or Enterprise pricing. The 30 RPM / 1,000 RPD limits are sufficient for load testing at low scale and for benchmarking latency against your requirements.

Reddit community consensus across multiple threads
Select the Smallest Model That Meets Quality Bar high success

Groq's per-token pricing varies up to 12x between models (Llama 3.1 8B at $0.05/1M input vs. Llama 3.3 70B at $0.59/1M input per Artificial Analysis). Systematically benchmark smaller models against your task requirements before defaulting to larger ones — the cost difference is significant at scale.

Artificial Analysis (artificialanalysis.ai)
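To put the roughly 12x spread in concrete terms, here is a small sketch. The 1B/300M monthly token volumes are hypothetical; the rates are the list prices quoted in this article.

```python
# Cost difference at scale when choosing Llama 3.1 8B over Llama 3.3 70B
# (list rates from this article), for a hypothetical workload of
# 1B input / 300M output tokens per month.
def monthly(rate_in, rate_out, in_m=1000, out_m=300):
    return in_m * rate_in + out_m * rate_out

small = monthly(0.05, 0.08)   # Llama 3.1 8B
large = monthly(0.59, 0.79)   # Llama 3.3 70B
print(round(small, 2), round(large, 2), round(large / small, 1))
```

At this volume the smaller model costs about $74/month versus roughly $827/month for the 70B, an 11x gap, which is why benchmarking the smallest model that clears your quality bar pays off quickly.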
Use Speed Benchmarks as Enterprise Negotiation Leverage medium success

Groq's documented sub-millisecond latency (e.g., Qwen3 32B at 0.14ms, 627 tok/s per Artificial Analysis) is a genuine differentiator for real-time applications like voice agents, live transcription, and interactive coding tools. Quantify the latency value for your use case and use that business impact as justification for enterprise pricing discussions.

HN: 'Groq takes it further with models like Qwen3 32B at 0.14ms for $0.36/1M (627 tok/s)'
Negotiate Enterprise for Privacy and SLA Requirements medium success

The Free and Developer tiers have no documented privacy SLA. If your workload involves sensitive data or compliance requirements, make a DPA (Data Processing Agreement) a prerequisite of any Enterprise negotiation. This is standard for HIPAA and GDPR use cases and Groq will need to address it for enterprise sales.

Reddit: '0 privacy' comment on free tier limitations
Compare Against Competitors on Identical Models medium success

The same open-source model (e.g., Llama 3.3 70B) is available across multiple providers at different prices and speeds. Use Artificial Analysis or OpenRouter to document the price-performance tradeoff for your specific model before accepting Groq's Developer pricing. This data gives you credible alternatives to reference in Enterprise negotiations.

HN: 'huge markup on identical models... Groq takes it further with models like Qwen3 32B at 0.14ms for $0.36/1M'

Full negotiation guide →

Groq Pricing FAQ

01 How much does Groq API cost?

Groq API pricing is per-token and varies by model. The cheapest option is Llama 3.1 8B at $0.05 per million input tokens and $0.08 per million output tokens. Larger models like Llama 3.3 70B cost $0.59/$0.79 per million tokens. Groq offers a free API key with rate limits for getting started.

02 Does Groq have a free tier?

Yes, Groq offers a free API key with access to all models. The Free tier has rate limits on requests per minute. You can upgrade to the Developer plan for higher limits with pay-as-you-go token pricing.

03 Why is Groq so fast?

Groq uses custom LPU (Language Processing Unit) hardware designed specifically for AI inference, achieving speeds of 500–1,000+ tokens per second. This makes it one of the fastest LLM inference providers, particularly for real-time applications.

04 Groq vs OpenAI: which is cheaper for API usage?

Groq is typically cheaper for open-source models — Llama 3.1 8B costs $0.05/$0.08 per million tokens on Groq versus paying OpenAI rates for GPT models. Groq doesn't offer GPT models, so for OpenAI-specific models there's no direct comparison. Groq's main advantage is its speed alongside competitive pricing.

05 What models does Groq support?

Groq supports models including Llama 3.1 8B, Llama 4 Scout, Llama 3.3 70B, Qwen3 32B, and GPT OSS 20B, with pricing ranging from $0.05 to $0.59 per million input tokens. Enterprise plans support custom rate limits and SLA guarantees for high-volume deployments.

06 Is Groq free to use?

Yes. Groq's Free tier is $0/month and includes access to all available models through the API and playground. The free tier has rate limits of 30 requests per minute and 1,000 requests per day, which is sufficient for prototyping and low-volume personal projects. For production use, the Developer and Enterprise tiers offer custom pay-per-token pricing with higher limits.

07 How fast is Groq compared to other inference providers?

Groq is among the fastest inference providers available. According to Artificial Analysis data (April 2026), Groq achieves sub-millisecond latency on select models — for example, Qwen3 32B at 0.14ms with 627 tokens/sec throughput. Standard GPU-based providers typically deliver 40–80 tokens/sec on the same models. The speed advantage is most pronounced for short context, high-frequency requests.

08 What does Groq charge per million tokens?

Groq's per-token pricing varies by model. Based on Artificial Analysis data (April 2026), the provider median is $0.13/1M input tokens and $0.465/1M output tokens across 8 tracked models. The cheapest model is Llama 3.1 8B at $0.05/1M input and $0.08/1M output. The most capable tracked model, Llama 3.3 70B, costs $0.59/1M input and $0.79/1M output.
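The medians can be reproduced from the eight per-model list rates quoted earlier in this article, using Python's statistics module:

```python
# Median list rates across the 8 Developer-tier models quoted above
# (USD per 1M tokens, as (input, output) pairs).
from statistics import median

rates = [
    (0.05, 0.08),    # llama-3-1-8b-instant
    (0.11, 0.34),    # llama-4-scout-17bx16e
    (0.075, 0.30),   # gpt-oss-20b
    (0.075, 0.30),   # gpt-oss-safeguard-20b
    (0.15, 0.60),    # gpt-oss-120b
    (0.29, 0.59),    # qwen3-32b
    (0.59, 0.79),    # llama-3-3-70b-versatile
    (1.00, 3.00),    # kimi-k2-instruct-0905
]
print(round(median(r[0] for r in rates), 3),
      round(median(r[1] for r in rates), 3))  # -> 0.13 0.465
```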

09 Does Groq support fine-tuned or custom models?

No. Groq's platform is optimized for a curated set of popular open-source models on its custom LPU hardware and does not support fine-tuned or custom model deployments. Teams requiring fine-tuned models must use GPU-based inference providers. Groq's limited model catalog compared to GPU providers is a frequently cited limitation.

10 How does Groq pricing compare to OpenAI for similar tasks?

Groq is significantly cheaper than OpenAI for equivalent open-source model quality. For example, Llama 3.3 70B on Groq costs $0.59/1M input tokens while GPT-4o costs approximately $5/1M — roughly 8x cheaper. For simple tasks, Groq's Llama 3.1 8B at $0.05/1M input is approximately 100x cheaper than GPT-4o.

11 Is Groq suitable for audio transcription?

Yes. Groq offers Whisper Large v3 Turbo for audio transcription at $0.04 per hour of audio, making 400 hours of transcription approximately $16. This is cost-competitive with other managed transcription services. Note that multi-channel audio (e.g., stereo recordings with separate speaker tracks) multiplies the cost by the number of channels.

12 What is Kimi K2 pricing on Groq?

Kimi K2 on Groq is priced at $1.00 per million input tokens and $3.00 per million output tokens, making it Groq's most expensive model. Cached input tokens cost $0.50/million (a 50% discount). Kimi K2 is a 1-trillion-parameter mixture-of-experts model from Moonshot AI, optimized for agentic and tool-use tasks. At $3.00/M output tokens, a 1M-token output session costs $3.00, about $2.21 more than the same output on Llama 3.3 70B at $0.79/M.

Is this pricing incorrect? Let us know and we'll verify and update it.