Quick Answer
Last verified: May 6, 2026
High confidence

Together AI costs from $0.03 per million tokens (Serverless) up to $9.95 per GPU-hour (Dedicated) as of May 2026, with 5 plans available. Pricing depends on your chosen tier, contract length, and negotiated discounts.


  • Free tier: None available

Together AI offers 5 pricing tiers: Serverless, Dedicated (1x H100), Dedicated (1x H200), Dedicated (1x B200), and Enterprise. The Dedicated (1x H100) plan is designed for consistent high-volume inference.

Compared to other LLM API providers, Together AI is positioned at the budget-friendly end of the market.


How much does Together AI cost?

Together AI pricing starts at $0.03 per million tokens on the Serverless tier, with dedicated GPU instances from $3.99/hour and enterprise pricing available on request. Plans include Serverless (usage-based), Dedicated 1x H100 ($3.99/hr), Dedicated 1x H200 ($5.49/hr), Dedicated 1x B200 ($9.95/hr), and Enterprise (custom pricing).

Together AI Pricing Overview

Together AI has 5 pricing plans, ranging from $0.03 per million tokens (Serverless) to $9.95/hour (Dedicated 1x B200). The Serverless plan is usage-based and designed for variable-volume API usage. The dedicated plans run on single-tenant GPUs: 1x H100 ($3.99/hr) for consistent high-volume inference, 1x H200 ($5.49/hr) for high-throughput dedicated inference, and 1x B200 ($9.95/hr) for high-performance inference on the latest hardware. The Enterprise plan requires contacting sales for a custom quote and is designed for large-scale enterprise deployments.

This pricing was last verified on May 6, 2026 from 3 independent sources.

Together AI offers usage-based inference pricing through its Serverless tier, with dedicated GPU options (1x H100, 1x H200, 1x B200) and an Enterprise plan. Serverless pricing varies by model: the provider median is $0.50/1M input tokens and $1.20/1M output tokens across 23 tracked models, with individual models ranging from $0.03/1M input tokens for lightweight options up to $2.00/1M for large-scale coding models. Dedicated instances carry published hourly rates but are provisioned through sales, and Enterprise pricing requires direct contact with Together AI's sales team.
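As a rough sketch of how per-token billing works, here is the arithmetic for a single request at the median rates above (actual rates vary by model; the function and numbers are illustrative, not an official SDK):

```python
def serverless_cost(input_tokens: int, output_tokens: int,
                    input_rate: float, output_rate: float) -> float:
    """Cost in USD for one request, given per-1M-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 2,000-token prompt with a 500-token completion at the
# provider-median rates ($0.50 in / $1.20 out):
cost = serverless_cost(2_000, 500, 0.50, 1.20)
print(f"${cost:.4f}")  # → $0.0016
```

Because output tokens are billed at more than double the input rate here, completion length usually dominates the bill for generation-heavy workloads.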

How Together AI Pricing Compares

Compare Together AI pricing against top alternatives in LLM API Providers.

All Together AI Plans & Pricing

Plan Monthly Annual Best For
Serverless Contact Sales Contact Sales Variable-volume API usage
Dedicated (1x H100) Contact Sales Contact Sales Consistent high-volume inference
Dedicated (1x H200) Contact Sales Contact Sales High-throughput dedicated inference
Dedicated (1x B200) Contact Sales Contact Sales High-performance dedicated inference
Enterprise Contact Sales Contact Sales Large-scale enterprise deployments

Serverless

  • Pay-as-you-go per-token pricing
  • Budget models from $0.03/M tokens
  • Mid-range models from $0.50/M tokens
  • Large models from $1.00/M tokens
  • Batch API with 50% discount for most models
  • Cached input pricing for select models
  • Vision, image, audio, video, and transcription models available
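The ~50% batch discount compounds quickly at volume. A minimal sketch of a monthly estimate, assuming the discount applies equally to input and output rates (the function name and workload numbers are hypothetical):

```python
def monthly_cost(tokens_in: float, tokens_out: float,
                 rate_in: float, rate_out: float,
                 batch_fraction: float = 0.0,
                 batch_discount: float = 0.50) -> float:
    """Monthly USD cost; batch_fraction of traffic gets the batch discount."""
    base = (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000
    return base * (1 - batch_fraction * batch_discount)

# 100M input + 20M output tokens/month at the $0.50/$1.20 median rates,
# with 60% of traffic routed through the Batch API:
print(round(monthly_cost(100e6, 20e6, 0.50, 1.20, batch_fraction=0.6), 2))  # → 51.8
```

Routing non-latency-sensitive traffic (evals, backfills, summarization jobs) through batch is typically the cheapest lever before negotiating volume discounts.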

Dedicated (1x H100)

  • Single-tenant GPU deployment
  • 1x H100 80GB at $3.99/hr
  • Custom model hosting
  • Autoscaling and traffic spike handling
  • Guaranteed performance
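As a back-of-envelope comparison (a sketch, not billing advice): a dedicated GPU beats serverless per-token billing once sustained throughput exceeds the ratio of the hourly rate to the blended per-token rate, and only if you keep the GPU busy. Using the $3.99/hr H100 rate and the $0.875/1M provider-median blended serverless rate:

```python
def breakeven_tokens_per_hour(gpu_rate_per_hr: float,
                              blended_rate_per_1m: float) -> float:
    """Tokens/hour above which a dedicated GPU is cheaper than serverless."""
    return gpu_rate_per_hr / blended_rate_per_1m * 1_000_000

# 1x H100 at $3.99/hr vs the $0.875/1M median blended serverless rate:
print(f"{breakeven_tokens_per_hour(3.99, 0.875) / 1e6:.2f}M tokens/hr")  # → 4.56M tokens/hr
```

In practice the breakeven shifts with the specific model's serverless rate and the utilization you can actually sustain on the dedicated endpoint.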

Dedicated (1x H200)

  • Single-tenant GPU deployment
  • 1x H200 141GB at $5.49/hr
  • Custom model hosting
  • Autoscaling and traffic spike handling
  • Guaranteed performance

Dedicated (1x B200)

  • Single-tenant GPU deployment
  • 1x B200 180GB at $9.95/hr
  • Latest generation hardware
  • Autoscaling and traffic spike handling
  • Guaranteed performance

Enterprise

  • Volume discounts
  • Dedicated support
  • Custom SLAs
  • Private deployments

Usage-Based Rates

Per-unit pricing for Together AI API usage.

Serverless

Model Input Output Cached Per
glm-5-1 $1.40 $4.40 n/a 1M tokens
minimax-m2-7 $0.300 $1.20 $0.060 1M tokens
kimi-k2-6 $1.20 $4.50 $0.200 1M tokens
deepseek-v4-pro $2.10 $4.40 $0.200 1M tokens
qwen3-6-plus $0.500 $3.00 n/a 1M tokens
gpt-oss-120b $0.150 $0.600 n/a 1M tokens
lfm2-24b-a2b $0.030 $0.120 n/a 1M tokens
qwen3-5-397b-a17b $0.600 $3.60 n/a 1M tokens
minimax-m2-5 $0.300 $1.20 $0.060 1M tokens
glm-5 $1.00 $3.20 n/a 1M tokens
qwen3-coder-next $0.500 $1.20 n/a 1M tokens
kimi-k2-5 $0.500 $2.80 n/a 1M tokens
qwen3-5-9b $0.100 $0.150 n/a 1M tokens
gemma-4-31b $0.200 $0.500 n/a 1M tokens
deepseek-v3-1 $0.600 $1.70 n/a 1M tokens
cogito-v2-1-671b $1.25 $1.25 n/a 1M tokens
qwen3-coder-480b-a35b-instruct $2.00 $2.00 n/a 1M tokens
rnj-1-instruct $0.150 $0.150 n/a 1M tokens
deepseek-r1-0528 $3.00 $7.00 n/a 1M tokens
llama-3-3-70b $0.880 $0.880 n/a 1M tokens
gemma-3n-e4b-instruct $0.060 $0.120 n/a 1M tokens
gpt-oss-20b $0.050 $0.200 n/a 1M tokens
qwen3-235b-a22b-fp8-throughput $0.200 $0.600 n/a 1M tokens
qwen2-5-7b-instruct-turbo $0.300 $0.300 n/a 1M tokens
llama-3-8b-instruct-lite $0.100 $0.100 n/a 1M tokens
  • Top models listed; many more available on platform
  • Cached input pricing available for select models
  • Batch inference available at ~50% discount for most models
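For models with cached input pricing, a cache hit reprices part of the prompt at the cached rate. A worked sketch using the minimax-m2-5 rates from the table above ($0.30 in, $0.06 cached, $1.20 out; the function is illustrative, not part of any SDK):

```python
def request_cost(fresh_in: int, cached_in: int, out: int,
                 rate_in: float, rate_cached: float, rate_out: float) -> float:
    """USD cost when part of the prompt hits the cache (rates per 1M tokens)."""
    return (fresh_in * rate_in + cached_in * rate_cached + out * rate_out) / 1_000_000

# 8,000-token prompt with a 1,000-token completion, where 6,000 tokens
# are a cached system prefix shared across requests:
full = request_cost(8_000, 0, 1_000, 0.30, 0.06, 1.20)
hit  = request_cost(2_000, 6_000, 1_000, 0.30, 0.06, 1.20)
print(f"${full:.4f} vs ${hit:.4f}")  # → $0.0036 vs $0.0022
```

The saving scales with how much of the prompt is a stable, reusable prefix, which is why long system prompts and few-shot examples benefit most.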

Dedicated (1x H100)

Model Unit Rate
h100-80gb hour $3.99
  • $3.99/hour per H100 GPU

Dedicated (1x H200)

Model Unit Rate
h200-141gb hour $5.49
  • $5.49/hour per H200 GPU

Dedicated (1x B200)

Model Unit Rate
b200-180gb hour $9.95
  • $9.95/hour per B200 GPU
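To translate these hourly rates into a monthly budget, assume an always-on endpoint at roughly 730 hours per month (365 × 24 / 12; an approximation, since actual billing follows metered hours):

```python
HOURS_PER_MONTH = 730  # ≈ 365 * 24 / 12

rates = {"1x H100": 3.99, "1x H200": 5.49, "1x B200": 9.95}
for tier, rate in rates.items():
    print(f"{tier}: ${rate * HOURS_PER_MONTH:,.2f}/month")
# 1x H100: $2,912.70/month
# 1x H200: $4,007.70/month
# 1x B200: $7,263.50/month
```

Autoscaled or intermittently used endpoints will come in under these figures, since you only pay for provisioned hours.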

Compare Together AI vs Alternatives

Before committing to Together AI, compare pricing with leading alternatives in the same category.


What Companies Actually Pay for Together AI

Median per-1M-token pricing across 23 models
Input $0.500/1M
Output $1.20/1M
Flagship models in this provider's catalog
Model Input /1M Output /1M Blended /1M
togetherai_glm-5-1 $1.40 $4.40 $2.15
togetherai_qwen3-coder-480b-a35b-instruct_fp8 $2.00 $2.00 $2.00
togetherai_glm-5_fp4 $1.00 $3.20 $1.55
togetherai_deepseek-v3-0324 $1.25 $1.25 $1.25
togetherai_cogito-v2-1-reasoning $1.25 $1.25 $1.25
Source: Artificial Analysis — medians aggregated from 23 models in this provider's catalog. Per-1M-token pricing reflects list rates.
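The blended figures in the table above are consistent with a 3:1 input:output token weighting, a common convention for blended LLM rates (this weighting is inferred from the numbers, not stated by the source):

```python
def blended(input_rate: float, output_rate: float) -> float:
    """Blended $/1M tokens assuming a 3:1 input:output token mix."""
    return (3 * input_rate + output_rate) / 4

print(round(blended(1.40, 4.40), 2))  # glm-5-1 → 2.15
print(round(blended(1.00, 3.20), 2))  # glm-5_fp4 → 1.55
```

Both values match the table's Blended /1M column, which is why the formula is a reasonable basis for comparing models with asymmetric input/output pricing.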

How Together AI Pricing Compares

Software Starting Price Top Price
Together AI $0.03 per million tokens $9.95 per GPU-hour
Amazon Bedrock $0.07 per million tokens $75 per million tokens
Anyscale $0.15 per million tokens $5 per million tokens
Baidu ERNIE API $0.10 per million tokens $10 per million tokens
Cerebras Inference API $0.10 per million tokens $6 per million tokens
Claude API $0.03 per million tokens $75 per million tokens

Together AI Pricing FAQ

01 How much does Together AI cost?

Together AI offers serverless inference starting at $0.03 per million tokens for small models. Mid-range models cost $0.50–1.00/M tokens, and large models like DeepSeek-R1 cost $3.00/M input tokens ($7.00/M output). Dedicated GPU deployments start at $3.99/hr (1x H100) and go up to $9.95/hr (1x B200). Batch processing saves roughly 40–50%.

02 Does Together AI have a free tier?

Together AI does not advertise a permanent free tier or free credits on their pricing page. They offer pay-as-you-go Serverless pricing with no minimum commitment, so you only pay for what you use.

03 What models does Together AI support?

Together AI supports a wide range of open-source models including Llama, DeepSeek, Qwen, Mistral, and Kimi. They also offer image generation (FLUX, Stable Diffusion), video (Google Veo 2.0), audio transcription, text-to-speech, and embedding models.

04 Together AI vs Fireworks AI: which is cheaper?

Both offer similar serverless per-token pricing starting around $0.10/M tokens for small models. Fireworks AI gives new users $1 in free credits. For dedicated GPU hosting, Together AI's H100 is $3.99/hr versus Fireworks AI's A100 at $2.90/hr, making Fireworks slightly cheaper for dedicated compute at equivalent GPU tiers.

05 What is Together AI's Dedicated GPU pricing?

Together AI's Dedicated GPU hosting starts at $3.99/hr for a 1x H100 (single-tenant) and $9.95/hr for a 1x B200 (latest generation). Dedicated deployments are best for consistent high-volume inference where you need guaranteed resources and custom model hosting.

06 What are the cheapest models available on Together AI?

Based on Artificial Analysis data, the most affordable models on Together AI's Serverless tier start at $0.03/1M input tokens (LFM2 24B A2B) and $0.05/1M input tokens (GPT-OSS 20B). The provider median blended rate across all 23 tracked models is $0.875/1M tokens.

07 Does Together AI offer dedicated GPU instances?

Yes. Together AI offers dedicated GPU instances on three hardware tiers: 1x H100, 1x H200, and 1x B200. All dedicated instance pricing is custom-quoted. An Enterprise plan is also available for larger-scale deployments requiring custom SLAs or support.
