Fireworks AI Pricing 2026
Complete pricing guide with plans, hidden costs, and cost analysis
Fireworks AI pricing ranges from $0.10 per million tokens (Serverless) to $11 per GPU-hour (On-Demand) as of May 2026, across 5 plans. Your effective price depends on your chosen tier, contract length, and negotiated discounts.
Use the interactive pricing calculator to estimate your exact cost based on team size and requirements.
- Free tier: none, though new accounts receive $1 in free credits
Fireworks AI offers 5 pricing tiers: Serverless, On-Demand (H100/H200), On-Demand (B200), On-Demand (B300), and Enterprise. The On-Demand (H100/H200) plan is designed for consistent inference workloads.
Compared to other LLM API providers, Fireworks AI is positioned at the budget-friendly end of the market.
- 2 documented hidden costs beyond list price
How much does Fireworks AI cost?
Fireworks AI Pricing Overview
Fireworks AI has 5 pricing plans, ranging from pay-per-token Serverless rates to $11 per GPU-hour. All five require contacting sales for a custom quote: Serverless targets variable-volume API usage, On-Demand (H100/H200) targets consistent inference workloads, On-Demand (B200) targets cutting-edge performance, On-Demand (B300) targets the largest models requiring maximum VRAM, and Enterprise targets large-scale enterprise deployments.
There are at least 2 documented hidden costs beyond Fireworks AI's list price, such as implementation, training, or add-on fees.
This pricing was last verified on May 3, 2026 from 1 independent source.
Fireworks AI is an LLM inference platform providing access to 16+ open-source models through a unified API across five pricing tiers: Serverless (pay-per-token), On-Demand dedicated GPU instances on H100/H200, B200, or B300 hardware, and an Enterprise tier for large-scale deployments, all custom-quoted. According to Artificial Analysis (April 2026), the platform's median blended rate across tracked models is $0.84 per 1M tokens ($0.53 input / $1.68 output). There is no published free tier.
All Fireworks AI Plans & Pricing
| Plan | Monthly | Annual | Best For |
|---|---|---|---|
| Serverless | Contact Sales | Contact Sales | Variable-volume API usage |
| On-Demand (H100/H200) | Contact Sales | Contact Sales | Consistent inference workloads |
| On-Demand (B200) | Contact Sales | Contact Sales | Cutting-edge performance |
| On-Demand (B300) | Contact Sales | Contact Sales | Largest models requiring maximum VRAM |
| Enterprise | Contact Sales | Contact Sales | Large-scale enterprise deployments |
View all features by plan
Serverless
- $1 free credits to start
- Models <4B at $0.10/M tokens
- Models 4B-16B at $0.20/M tokens
- Models >16B at $0.90/M tokens
- MoE 0-56B at $0.50/M tokens
- MoE 56.1-176B at $1.20/M tokens
- DeepSeek V4 Pro at $1.74/M input, $3.48/M output
- Kimi K2.6 at $0.95/M input, $4.00/M output
- GLM-5 at $1.00/M input, $3.20/M output
- Cached input tokens at 50% price
- Batch inference at 50% discount
- Embeddings from $0.008/M tokens
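A rough sketch of how a monthly Serverless bill composes under the rates above. The helper function and the way discounts are applied are our assumptions for illustration, not Fireworks' actual billing logic:

```python
# Sketch: estimate a monthly Fireworks Serverless bill from the list rates above.
# Rates are USD per 1M tokens; discount handling is an assumption, not official billing.

def serverless_cost(input_m, output_m, in_rate, out_rate,
                    cached_frac=0.0, batch_frac=0.0):
    """input_m/output_m: millions of tokens per month.
    cached_frac: share of input tokens served from cache (billed at 50%).
    batch_frac: share of all traffic sent via batch inference (50% discount)."""
    live_in = input_m * (1 - cached_frac) * in_rate
    cached_in = input_m * cached_frac * in_rate * 0.5   # cached input at 50%
    out = output_m * out_rate
    subtotal = live_in + cached_in + out
    # Apply the 50% batch discount to the batched share of the whole bill.
    return subtotal * (1 - 0.5 * batch_frac)

# Example: 500M input / 100M output on a >16B dense model ($0.90/M both ways),
# with 40% of input tokens cache hits and no batch traffic.
cost = serverless_cost(500, 100, 0.90, 0.90, cached_frac=0.4)
print(round(cost, 2))  # 450.0
```

Note how caching alone trims $90 off the $540 undiscounted bill in this example; routing non-urgent traffic through batch inference would halve that share again.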
On-Demand (H100/H200)
- H100 80GB at $6.00/hr (rising to $7.00/hr May 1, 2026)
- H200 141GB at $6.00/hr (rising to $7.00/hr May 1, 2026)
- Dedicated model hosting
- Custom fine-tuned models
- Pay per GPU second
On-Demand (B200)
- B200 180GB at $9.00/hr (rising to $10.00/hr May 1, 2026)
- Latest generation hardware
- Maximum throughput
- Pay per GPU second
On-Demand (B300)
- B300 288GB at $11.00/hr (rising to $12.00/hr May 1, 2026)
- Highest-memory GPU option
- Largest model hosting
- Pay per GPU second
Enterprise
- Volume discounts
- Dedicated support
- Custom SLAs
- Faster speeds and higher rate limits
Usage-Based Rates
Per-unit pricing for Fireworks AI API usage.
Serverless
| Model | Input | Output | Cached | Per |
|---|---|---|---|---|
| models-under-4b | $0.100 | $0.100 | — | 1M tokens |
| models-4b-16b | $0.200 | $0.200 | — | 1M tokens |
| models-over-16b | $0.900 | $0.900 | — | 1M tokens |
| moe-0-56b | $0.500 | $0.500 | — | 1M tokens |
| moe-56-176b | $1.20 | $1.20 | — | 1M tokens |
| deepseek-v4-pro | $1.74 | $3.48 | $0.145 | 1M tokens |
| deepseek-v3 | $0.560 | $1.68 | — | 1M tokens |
| kimi-k2-6 | $0.950 | $4.00 | $0.160 | 1M tokens |
| kimi-k2-6-priority | $1.50 | $6.00 | $0.220 | 1M tokens |
| kimi-k2-6-turbo | $2.00 | $8.00 | $0.300 | 1M tokens |
| kimi-k2-5 | $0.600 | $3.00 | $0.100 | 1M tokens |
| kimi-k2-5-turbo | $0.990 | $4.94 | $0.160 | 1M tokens |
| glm-4-7 | $0.600 | $2.20 | — | 1M tokens |
| glm-5 | $1.00 | $3.20 | $0.200 | 1M tokens |
| glm-5-1 | $1.40 | $4.40 | $0.260 | 1M tokens |
| qwen3-vl-30b-a3b | $0.150 | $0.600 | — | 1M tokens |
| gpt-oss-120b | $0.150 | $0.600 | — | 1M tokens |
| gpt-oss-20b | $0.070 | $0.300 | — | 1M tokens |
| minimax-2-5 | $0.300 | $1.20 | $0.030 | 1M tokens |
| minimax-2-7 | $0.300 | $1.20 | $0.060 | 1M tokens |

| Item | Dimension | Unit | Rate |
|---|---|---|---|
| embeddings-up-to-150m | embedding | 1M tokens | $0.00800 |
| embeddings-150m-350m | embedding | 1M tokens | $0.016 |
| qwen3-8b-embeddings | embedding | 1M tokens | $0.100 |
- Pricing by model parameter size tier for general open models
- Specific pricing for major models (DeepSeek, Kimi, GLM, MiniMax, GPT-OSS)
- Cached input tokens at 50% of input price unless specified
- Batch inference at 50% discount
- $1 in free credits on signup
On-Demand (H100/H200)
| Model | Unit | Rate |
|---|---|---|
| h100-80gb | hour | $6.00 |
| h200-141gb | hour | $6.00 |
- $6.00/hour per H100 80GB or H200 141GB GPU through Apr 30, 2026
- Rising to $7.00/hour from May 1, 2026
On-Demand (B200)
| Model | Unit | Rate |
|---|---|---|
| b200-180gb | hour | $9.00 |
- $9.00/hour per B200 180GB GPU through Apr 30, 2026
- Rising to $10.00/hour from May 1, 2026
On-Demand (B300)
| Model | Unit | Rate |
|---|---|---|
| b300-288gb | hour | $11.00 |
- $11.00/hour per B300 288GB GPU through Apr 30, 2026
- Rising to $12.00/hour from May 1, 2026
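Choosing between Serverless and On-Demand comes down to volume: a dedicated GPU has a fixed hourly cost, so there is a monthly token volume above which it beats per-token billing. A minimal sketch, assuming one GPU running 24/7 and ignoring throughput limits (which you must verify for your model):

```python
# Sketch: break-even point between Serverless per-token billing and a dedicated
# On-Demand GPU running around the clock. Throughput capacity is ignored here.

HOURS_PER_MONTH = 730

def breakeven_tokens_per_month(gpu_rate_per_hr, blended_rate_per_m):
    """Monthly token volume (in millions) above which one dedicated GPU,
    running 24/7, is cheaper than Serverless at the given blended rate."""
    monthly_gpu_cost = gpu_rate_per_hr * HOURS_PER_MONTH
    return monthly_gpu_cost / blended_rate_per_m

# H100/H200 at $6.00/hr vs the $0.84/1M median blended Serverless rate:
m_tokens = breakeven_tokens_per_month(6.00, 0.84)
print(round(m_tokens))  # roughly 5214 million tokens/month
```

In other words, at the median blended rate you need on the order of five billion tokens per month before a single $6/hr GPU pays for itself, and the bar rises with the B200 and B300 rates.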
Compare Fireworks AI vs Alternatives
Before committing to Fireworks AI, compare pricing with these 3 alternatives in the same category.
What Companies Actually Pay for Fireworks AI
| Model | Input /1M | Output /1M | Blended /1M |
|---|---|---|---|
| deepseek-v3-2 | $0.560 | $1.68 | $0.840 |
| kimi-k2-6 | $0.950 | $4.00 | $1.71 |
| llama-3-3-instruct-70b | $0.900 | $0.900 | $0.900 |
| minimax-m2-7 | $0.300 | $1.20 | $0.525 |
| qwen3-8b-instruct | $0.200 | $0.200 | $0.200 |
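The blended figures above appear to follow a 3:1 input-to-output token weighting. That weighting is inferred from the numbers themselves, not a documented formula, but a quick check reproduces every row:

```python
# Check: "Blended /1M" appears to equal a 3:1 input:output weighted average.
# The 3:1 weighting is inferred from the table, not an official Fireworks formula.

def blended(input_rate: float, output_rate: float) -> float:
    """Blended $/1M tokens assuming 3 input tokens per output token."""
    return (3 * input_rate + output_rate) / 4

for name, rates in {
    "deepseek-v3-2": (0.56, 1.68),   # table lists $0.840 blended
    "kimi-k2-6": (0.95, 4.00),       # table lists $1.71 blended
    "minimax-m2-7": (0.30, 1.20),    # table lists $0.525 blended
}.items():
    print(name, round(blended(*rates), 3))
```

If your traffic is output-heavy (e.g., long generations from short prompts), your effective blended rate will sit closer to the output price than these figures suggest.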
How Fireworks AI Pricing Compares
| Software | Starting Price | Top Price |
|---|---|---|
| Fireworks AI | $0.10/1M tokens | $11/GPU-hour |
| Amazon Bedrock | $0.07/1M tokens | $75/1M tokens |
| Anyscale | $0.15/1M tokens | $5/1M tokens |
| Baidu ERNIE API | $0.10/1M tokens | $10/1M tokens |
| Cerebras Inference API | $0.10/1M tokens | $6/1M tokens |
| Claude API | $0.03/1M tokens | $75/1M tokens |
How to Negotiate Fireworks AI Pricing
Fireworks AI contracts are negotiable. These 4 tactics are sourced from real buyer experiences and procurement specialists.
For flagship models available directly from their creators (e.g., DeepSeek, Mistral, Meta), compare Fireworks AI Serverless rates against the direct provider API. Community reports from early 2025 showed Fireworks pricing 2–4x higher than direct for certain models. If your workload uses predominantly one model and volume is high, the cost delta may outweigh the convenience of Fireworks' unified API. Source: Reddit community (r/startups 2025-03-07, r/OpenAI 2025-01-28).
Fireworks AI's Serverless tier charges per token, which can be costly at scale. For predictable, sustained inference workloads, On-Demand dedicated GPU instances (H100/H200, B200, or B300) may offer lower effective per-token costs. Contact Fireworks AI sales with your monthly token estimates to get a GPU-hour comparison. Source: current tier data.
Fireworks AI's Enterprise tier is custom-quoted. Teams with large, predictable monthly token volumes should negotiate annual volume commitments in exchange for rate discounts and dedicated SLAs. Engage Fireworks sales with 3–6 months of usage data to support the negotiation. Source: current tier data.
Fireworks AI offers three On-Demand GPU grades: H100/H200, B200, and B300. H100/H200 instances are the lowest cost. Unless your workload requires B200 throughput or B300 memory, default to H100/H200 to minimize GPU-hour spend and negotiate upgrades only when latency SLAs demand it. Source: current tier data.
Fireworks AI Pricing FAQ
01 How much does Fireworks AI cost?
Fireworks AI serverless pricing starts at $0.10 per million tokens for small models (<4B parameters) and goes up to $0.90/M for dense models over 16B. On-demand GPU deployments range from $6.00/hr (H100/H200) to $11.00/hr (B300). New accounts get $1 in free credits.
02 Does Fireworks AI have a free tier?
Fireworks AI offers $1 in free credits for new accounts. After that, pricing is pay-as-you-go with no minimum commitment. Batch inference and cached input tokens each offer 50% discounts, reducing ongoing costs.
03 How does Fireworks AI fine-tuning work?
Fireworks AI supports fine-tuning with SFT and DPO methods. Pricing ranges from $0.50/M training tokens for models under 16B to $10–20/M tokens for models over 300B. Fine-tuned models can be deployed on Serverless or dedicated infrastructure.
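At those rates, fine-tuning cost is simple arithmetic: training tokens times the per-1M-token rate times the number of epochs. A minimal sketch with made-up dataset numbers:

```python
# Sketch: fine-tuning cost = training tokens x per-1M-token rate x epochs.
# Rates come from the FAQ above; dataset size and epoch count are illustrative.

def finetune_cost(dataset_m_tokens, rate_per_m, epochs=1):
    """Total USD for fine-tuning: millions of training tokens * $/1M * epochs."""
    return dataset_m_tokens * rate_per_m * epochs

# 50M-token dataset, 3 epochs, on a <16B model at $0.50/M training tokens:
print(finetune_cost(50, 0.50, epochs=3))  # 75.0
```

The same 50M-token dataset at the $10–20/M rate quoted for models over 300B would run $1,500–3,000 per three-epoch job, so model size dominates the budget.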
04 Fireworks AI vs Together AI: which should I choose?
Both offer serverless inference starting at $0.10/M tokens. Fireworks AI provides $1 in free credits upfront and a 50% batch-inference discount; per current tier data its dedicated H100/H200 runs $6.00/hr, while Together AI lists comparable H100 dedicated at $3.99/hr. Compare at your expected utilization: dedicated hosting tends to favor Together AI on list price, while Fireworks AI's batch and caching discounts can favor it for serverless traffic.
05 What is Fireworks AI On-Demand pricing?
Fireworks AI On-Demand GPU deployments are priced at $6.00/hr for H100/H200, $9.00/hr for B200, and $11.00/hr for B300 (each rising by $1.00/hr from May 1, 2026). These are dedicated single-tenant deployments ideal for hosting custom fine-tuned models or maintaining consistent inference capacity.
06 Is Fireworks AI cheaper than going directly to model providers like DeepSeek?
Not always. Community comparisons from early 2025 noted DeepSeek R1 costing $8/1M output tokens on Fireworks Serverless versus $2.19/1M output tokens directly from DeepSeek. However, Fireworks pricing evolves frequently — the Artificial Analysis April 2026 benchmark shows the provider median at $1.68/1M output tokens across 16 tracked models. For high-volume single-model workloads, always compare current rates against direct provider APIs before committing.
07 What GPU options are available on Fireworks AI's On-Demand tier?
Fireworks AI offers three On-Demand GPU tiers: H100/H200 ($6.00/hr), B200 ($9.00/hr), and B300 ($11.00/hr). The Enterprise tier adds dedicated infrastructure, SLA guarantees, and additional support. Contact Fireworks AI sales for custom pricing at your expected usage level.
08 Can I fine-tune models on Fireworks AI?
Fine-tuning has limitations. Community users have noted that MoE (Mixture of Experts) models over 176B parameters cannot be fine-tuned on the Serverless tier. Teams requiring fine-tuning of large MoE models need to use On-Demand or Enterprise tiers, which carry custom pricing.
09 What is the median cost per million tokens on Fireworks AI?
According to Artificial Analysis data from April 2026, Fireworks AI's median blended rate across 16 tracked models is $0.84 per 1M tokens, with a median input rate of $0.53/1M and median output rate of $1.68/1M. Individual model prices range from $0.20/1M blended (Qwen3-8B) to $2.15/1M blended (GLM-5-1).
Is this pricing incorrect? Let us know and we'll verify and update it.