Quick Answer
Last verified: May 3, 2026 (estimate)

Fireworks AI costs from $0.10 per million tokens (Serverless) to $11 per GPU hour (On-Demand B300) as of May 2026, with 5 plans available. Pricing depends on your chosen tier, contract length, and negotiated discounts.


  • Free tier: none, but new accounts receive $1 in free credits

Fireworks AI offers 5 pricing tiers: Serverless, On-Demand (H100/H200), On-Demand (B200), On-Demand (B300), and Enterprise. The On-Demand (H100/H200) plan is designed for consistent inference workloads.

Compared to other LLM API providers, Fireworks AI is positioned at the budget-friendly end of the market.

  • 2 documented hidden costs beyond list price

How much does Fireworks AI cost?

Fireworks AI pricing starts at $0.10 per million tokens across 5 plans, with enterprise pricing available on request. Plans include Serverless, On-Demand (H100/H200), On-Demand (B200), On-Demand (B300), and Enterprise, all available with custom pricing.

Fireworks AI Pricing Overview

Fireworks AI has 5 pricing plans ranging from $0.10 per million tokens to $11 per GPU hour, and all five require contacting sales for a custom quote. Serverless is designed for variable-volume API usage; On-Demand (H100/H200) for consistent inference workloads; On-Demand (B200) for cutting-edge performance; On-Demand (B300) for the largest models requiring maximum VRAM; and Enterprise for large-scale enterprise deployments.

There are at least 2 documented hidden costs beyond Fireworks AI's list price: markup over direct provider APIs and fine-tuning limitations on the Serverless tier.

This pricing was last verified on May 3, 2026 from 1 independent source.

Fireworks AI is an LLM inference platform providing access to 16+ open-source models through a unified API across five pricing tiers: Serverless (pay-per-token), On-Demand dedicated GPU instances on H100/H200, B200, or B300 hardware, and an Enterprise tier for large-scale deployments — all custom-quoted. According to Artificial Analysis (April 2026), the platform's median blended rate across tracked models is $0.84 per 1M tokens ($0.53 input / $1.68 output). There is no free tier beyond $1 in signup credits.

All Fireworks AI Plans & Pricing

Plan Monthly Annual Best For
Serverless Contact Sales Contact Sales Variable-volume API usage
On-Demand (H100/H200) Contact Sales Contact Sales Consistent inference workloads
On-Demand (B200) Contact Sales Contact Sales Cutting-edge performance
On-Demand (B300) Contact Sales Contact Sales Largest models requiring maximum VRAM
Enterprise Contact Sales Contact Sales Large-scale enterprise deployments

Serverless

  • $1 free credits to start
  • Models <4B at $0.10/M tokens
  • Models 4B-16B at $0.20/M tokens
  • Models >16B at $0.90/M tokens
  • MoE 0-56B at $0.50/M tokens
  • MoE 56.1-176B at $1.20/M tokens
  • DeepSeek V4 Pro at $1.74 input / $3.48 output
  • Kimi K2.6 at $0.95 input / $4.00 output
  • GLM-5 at $1.00 input / $3.20 output
  • Cached input tokens at 50% price
  • Batch inference at 50% discount
  • Embeddings from $0.008/M tokens
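Putting these Serverless rates together, here is a minimal cost sketch. The helper and workload numbers are our own illustration; cached input is assumed to bill at the default 50% of the input rate, and batch jobs at a 50% discount, per the plan notes above.

```python
# Illustrative Serverless cost estimate built from the list rates above.
def serverless_cost(input_m, output_m, in_rate, out_rate,
                    cached_frac=0.0, batch=False):
    """Token counts in millions of tokens; rates in USD per 1M tokens."""
    fresh = input_m * (1 - cached_frac) * in_rate
    cached = input_m * cached_frac * in_rate * 0.5  # 50% cached price
    cost = fresh + cached + output_m * out_rate
    return cost * (0.5 if batch else 1.0)           # 50% batch discount

# 100M input tokens (40% cache hits) + 20M output on a >16B model
# at $0.90/M in and out:
print(round(serverless_cost(100, 20, 0.90, 0.90, cached_frac=0.4), 2))  # → 90.0
```

Note that models with an explicitly listed cached rate (e.g. GLM-5 at $0.200) should use that rate instead of the 50% default.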

On-Demand (H100/H200)

  • H100 80GB at $6.00/hr (rising to $7.00/hr May 1, 2026)
  • H200 141GB at $6.00/hr (rising to $7.00/hr May 1, 2026)
  • Dedicated model hosting
  • Custom fine-tuned models
  • Pay per GPU second

On-Demand (B200)

  • B200 180GB at $9.00/hr (rising to $10.00/hr May 1, 2026)
  • Latest generation hardware
  • Maximum throughput
  • Pay per GPU second

On-Demand (B300)

  • B300 288GB at $11.00/hr (rising to $12.00/hr May 1, 2026)
  • Highest-memory GPU option
  • Largest model hosting
  • Pay per GPU second

Enterprise

  • Volume discounts
  • Dedicated support
  • Custom SLAs
  • Faster speeds and higher rate limits

Usage-Based Rates

Per-unit pricing for Fireworks AI API usage.

Serverless

Model Input Output Cached Per
models-under-4b $0.100 $0.100 — 1M tokens
models-4b-16b $0.200 $0.200 — 1M tokens
models-over-16b $0.900 $0.900 — 1M tokens
moe-0-56b $0.500 $0.500 — 1M tokens
moe-56-176b $1.20 $1.20 — 1M tokens
deepseek-v4-pro $1.74 $3.48 $0.145 1M tokens
deepseek-v3 $0.560 $1.68 — 1M tokens
kimi-k2-6 $0.950 $4.00 $0.160 1M tokens
kimi-k2-6-priority $1.50 $6.00 $0.220 1M tokens
kimi-k2-6-turbo $2.00 $8.00 $0.300 1M tokens
kimi-k2-5 $0.600 $3.00 $0.100 1M tokens
kimi-k2-5-turbo $0.990 $4.94 $0.160 1M tokens
glm-4-7 $0.600 $2.20 — 1M tokens
glm-5 $1.00 $3.20 $0.200 1M tokens
glm-5-1 $1.40 $4.40 $0.260 1M tokens
qwen3-vl-30b-a3b $0.150 $0.600 — 1M tokens
gpt-oss-120b $0.150 $0.600 — 1M tokens
gpt-oss-20b $0.070 $0.300 — 1M tokens
minimax-2-5 $0.300 $1.20 $0.030 1M tokens
minimax-2-7 $0.300 $1.20 $0.060 1M tokens
Item Dimension Unit Rate
embeddings-up-to-150m embedding 1M tokens $0.00800
embeddings-150m-350m embedding 1M tokens $0.016
qwen3-8b-embeddings embedding 1M tokens $0.100
  • Pricing by model parameter size tier for general open models
  • Specific pricing for major models (DeepSeek, Kimi, GLM, MiniMax, GPT-OSS)
  • Cached input tokens at 50% of input price unless specified
  • Batch inference at 50% discount
  • $1 in free credits on signup

On-Demand (H100/H200)

Model Unit Rate
h100-80gb hour $6.00
h200-141gb hour $6.00
  • $6.00/hour per H100 80GB or H200 141GB GPU through Apr 30, 2026
  • Rising to $7.00/hour from May 1, 2026

On-Demand (B200)

Model Unit Rate
b200-180gb hour $9.00
  • $9.00/hour per B200 180GB GPU through Apr 30, 2026
  • Rising to $10.00/hour from May 1, 2026

On-Demand (B300)

Model Unit Rate
b300-288gb hour $11.00
  • $11.00/hour per B300 288GB GPU through Apr 30, 2026
  • Rising to $12.00/hour from May 1, 2026
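As a rough sketch of what these hourly rates mean at sustained utilization (the helper is our own illustration; billing is actually per GPU second, so partial hours prorate):

```python
# Monthly cost for a dedicated deployment at the hourly rates listed
# above (rates through Apr 30, 2026).
GPU_HOURLY_RATES = {  # USD per GPU hour
    "h100-80gb": 6.00,
    "h200-141gb": 6.00,
    "b200-180gb": 9.00,
    "b300-288gb": 11.00,
}

def monthly_gpu_cost(gpu, gpus=1, hours_per_day=24.0, days=30):
    return GPU_HOURLY_RATES[gpu] * gpus * hours_per_day * days

# One B200 running around the clock:
print(monthly_gpu_cost("b200-180gb"))  # → 6480.0
```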

What Companies Actually Pay for Fireworks AI

Median per-1M-token pricing across 16 models
Input $0.530/1M
Output $1.68/1M
Flagship models in this provider's catalog
Model Input /1M Output /1M Blended /1M
fireworks_deepseek-v3-2 $0.560 $1.68 $0.840
fireworks_kimi-k2-6 $0.950 $4.00 $1.71
fireworks_llama-3-3-instruct-70b $0.900 $0.900 $0.900
fireworks_minimax-m2-7 $0.300 $1.20 $0.525
fireworks_qwen3-8b-instruct $0.200 $0.200 $0.200
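The blended column above is consistent with a 3:1 input-to-output token weighting, a common benchmarking mix; checking our reconstruction against the table:

```python
# Blended per-1M-token rate under a 3:1 input:output token mix
# (our reconstruction of how the table's blended column is derived).
def blended_rate(input_rate, output_rate):
    return (3 * input_rate + output_rate) / 4

print(round(blended_rate(0.56, 1.68), 2))  # deepseek-v3-2 → 0.84
print(round(blended_rate(0.95, 4.00), 2))  # kimi-k2-6 → 1.71
```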
Top pricing complaints
  • Serverless pricing has historically been higher than going directly to underlying model providers for single-model workloads
  • Cannot fine-tune large MoE models (over 176B parameters) on the Serverless tier
Source: Artificial Analysis — medians aggregated from 16 models in this provider's catalog. Per-1M-token pricing reflects list rates.

How Fireworks AI Pricing Compares

Software Starting Price Top Price
Fireworks AI $0.10/per million tokens $11/hour
Amazon Bedrock $0.07/per million tokens $75/per million tokens
Anyscale $0.15/per million tokens $5/per million tokens
Baidu ERNIE API $0.10/per million tokens $10/per million tokens
Cerebras Inference API $0.10/per million tokens $6/per million tokens
Claude API $0.03/per million tokens $75/per million tokens

2 Fireworks AI Hidden Costs Beyond the List Price

Beyond the listed price, Fireworks AI has at least 2 documented hidden costs that can significantly increase total cost of ownership.

Watch for 2 hidden costs
  • Markup Over Direct Provider APIs 100-300% of license costs
    medium 2 sources
    Reddit "I just checked the pricing, it's much more expensive then just using R1's API ($2.19/million tokens output directly from deepseek vs."
    Reddit "fireworks API Seems even cheaper if you go directly from DeepSeek https://api-docs.deepseek.com/quick_start/pricing deepseek-reasoner 1M TOKENS OUTPUT PRICE $2.19"
  • Fine-Tuning Unavailable for Large MoE Models on Serverless 5-15% of license costs
    medium 1 source
    Reddit "I just checked the pricing, it's much more expensive then just using R1's API ($2.19/million tokens output directly from deepseek vs."
Tip

Ask your Fireworks AI sales rep about these costs upfront. Getting them in writing before signing can save you from surprise charges later.


Intelligence sourced from 1 independent source
Reddit User discussions
Key claims include inline source attribution. 8 source citations total.

How to Negotiate Fireworks AI Pricing

Fireworks AI contracts are negotiable. These 4 tactics are sourced from real buyer experiences and procurement specialists.

Negotiation Playbook 4 tactics
Benchmark Against Direct Provider APIs Before Committing high success

For flagship models available directly from their creators (e.g., DeepSeek, Mistral, Meta), compare Fireworks AI Serverless rates against the direct provider API. Community reports from early 2025 showed Fireworks pricing 2–4x higher than direct for certain models. If your workload uses predominantly one model and volume is high, the cost delta may outweigh the convenience of Fireworks' unified API.

Reddit community (r/startups 2025-03-07, r/OpenAI 2025-01-28)
Move High-Volume Workloads to On-Demand GPU Tiers medium success

Fireworks AI's Serverless tier charges per token, which can be costly at scale. For predictable, sustained inference workloads, On-Demand dedicated GPU instances (H100/H200, B200, or B300) may offer lower effective per-token costs. Contact Fireworks AI sales with your monthly token estimates to get a GPU-hour comparison.

Current tier data
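To make the serverless-vs-dedicated comparison concrete, here is a rough breakeven sketch. The rates come from this page; the conclusion depends entirely on your sustained throughput, which is your own input.

```python
# Breakeven between Serverless per-token billing and a dedicated GPU
# hour: above this token throughput, the GPU hour is cheaper.
def breakeven_tokens_per_hour(gpu_rate_hr, blended_rate_per_m):
    """Tokens/hour above which the dedicated GPU beats serverless."""
    return gpu_rate_hr / blended_rate_per_m * 1_000_000

# H100 at $6.00/hr vs a $0.90/M blended serverless rate:
print(breakeven_tokens_per_hour(6.00, 0.90))  # ≈ 6.67M tokens/hour
```

Real throughput per GPU varies by model size and batch configuration, so treat this as a first-pass filter before requesting a sales comparison.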
Negotiate Enterprise Tier for Volume Commitments medium success

Fireworks AI's Enterprise tier is custom-quoted. Teams with large, predictable monthly token volumes should negotiate annual volume commitments in exchange for rate discounts and dedicated SLAs. Engage Fireworks sales with 3–6 months of usage data to support the negotiation.

Current tier data
Select the Lowest-Cost GPU Tier That Meets Latency Requirements medium success

Fireworks AI offers three On-Demand GPU grades: H100/H200, B200, and B300. H100/H200 instances are the lowest cost at $6.00/hr. Unless your workload requires B200 or B300 throughput and memory, default to H100/H200 to minimize GPU-hour spend and negotiate upgrades only when latency SLAs demand it.

Current tier data


Fireworks AI Pricing FAQ

01 How much does Fireworks AI cost?

Fireworks AI serverless pricing starts at $0.10 per million tokens for small models (<4B parameters) and goes up to $0.90/M for models over 16B. On-demand GPU deployments range from $6.00/hr (H100/H200) to $11.00/hr (B300). New accounts get $1 in free credits.

02 Does Fireworks AI have a free tier?

Fireworks AI offers $1 in free credits for new accounts. After that, pricing is pay-as-you-go with no minimum commitment. Batch inference and cached input tokens each offer 50% discounts, reducing ongoing costs.

03 How does Fireworks AI fine-tuning work?

Fireworks AI supports fine-tuning with SFT and DPO methods. Pricing ranges from $0.50/M training tokens for models under 16B to $10–20/M tokens for models over 300B. Fine-tuned models can be deployed on Serverless or dedicated infrastructure.
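As a quick arithmetic sketch using the FAQ's fine-tuning rates (the helper name is ours; billed training tokens scale with epochs):

```python
# Fine-tuning cost = training tokens (millions) x epochs x per-tier rate.
def finetune_cost(training_tokens_m, rate_per_m, epochs=1):
    return training_tokens_m * epochs * rate_per_m

# 500M training tokens, 2 epochs, sub-16B model at $0.50/M:
print(finetune_cost(500, 0.50, epochs=2))  # → 500.0
```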

04 Fireworks AI vs Together AI: which should I choose?

Both offer serverless inference starting at $0.10/M tokens. Fireworks AI provides $1 in free credits upfront plus 50% batch and cached-input discounts, with dedicated H100/H200 hosting at $6.00/hr. Dedicated-GPU rates change frequently on both platforms, so compare current GPU-hour pricing for your target hardware before committing.

05 What is Fireworks AI On-Demand pricing?

Fireworks AI On-Demand GPU deployments are priced at $6.00/hr for H100 80GB or H200 141GB, $9.00/hr for B200 180GB, and $11.00/hr for B300 288GB. These are dedicated single-tenant deployments ideal for hosting custom fine-tuned models or maintaining consistent inference capacity.

06 Is Fireworks AI cheaper than going directly to model providers like DeepSeek?

Not always. Community comparisons from early 2025 noted DeepSeek R1 costing $8/1M output tokens on Fireworks Serverless versus $2.19/1M output tokens directly from DeepSeek. However, Fireworks pricing evolves frequently — the Artificial Analysis April 2026 benchmark shows the provider median at $1.68/1M output tokens across 16 tracked models. For high-volume single-model workloads, always compare current rates against direct provider APIs before committing.

07 What GPU options are available on Fireworks AI's On-Demand tier?

Fireworks AI offers three On-Demand GPU tiers: H100/H200 ($6.00/hr), B200 ($9.00/hr), and B300 ($11.00/hr), billed per GPU second. The Enterprise tier adds dedicated infrastructure, SLA guarantees, and additional support. Contact Fireworks AI sales for volume pricing at your expected usage level.

08 Can I fine-tune models on Fireworks AI?

Fine-tuning has limitations. Community users have noted that MoE (Mixture of Experts) models over 176B parameters cannot be fine-tuned on the Serverless tier. Teams requiring fine-tuning of large MoE models need to use On-Demand or Enterprise tiers, which carry custom pricing.

09 What is the median cost per million tokens on Fireworks AI?

According to Artificial Analysis data from April 2026, Fireworks AI's median blended rate across 16 tracked models is $0.84 per 1M tokens, with a median input rate of $0.53/1M and median output rate of $1.68/1M. Individual model prices range from $0.20/1M blended (Qwen3-8B) to $2.15/1M blended (GLM-5-1).
