Together AI Pricing 2026
Complete pricing guide with plans and cost analysis
Together AI costs between $0.03 per million tokens (serverless, budget models) and $9.95 per GPU-hour (dedicated 1x B200) as of May 2026, with 5 plans available. Pricing depends on your chosen tier, contract length, and negotiated discounts.
- Free tier: No free tier available
Together AI offers 5 pricing tiers: Serverless, Dedicated (1x H100), Dedicated (1x H200), Dedicated (1x B200), and Enterprise. The Dedicated (1x H100) plan is designed for consistent high-volume inference.
Compared with other LLM API providers, Together AI sits at the budget-friendly end of the market.
Together AI Pricing Overview
Together AI has 5 pricing plans. The Serverless plan uses pay-as-you-go per-token pricing (from $0.03 per million tokens for budget models) and is designed for variable-volume API usage. The three Dedicated plans target single-tenant inference: 1x H100 for consistent high-volume workloads, 1x H200 for high-throughput workloads, and 1x B200 for the highest-performance deployments. The Enterprise plan is designed for large-scale enterprise deployments; Dedicated and Enterprise pricing is quoted through sales.
This pricing was last verified on May 6, 2026 from 3 independent sources.
Together AI offers usage-based inference pricing through its Serverless tier, with dedicated GPU options (1x H100, 1x H200, 1x B200) and an Enterprise plan available at custom-quoted rates. Serverless pricing varies by model — the provider median is $0.50/1M input tokens and $1.20/1M output tokens across 23 tracked models, with individual models ranging from $0.02/1M input tokens for lightweight options up to $2.00/1M for large-scale coding models. Dedicated instances and Enterprise pricing require direct contact with Together AI's sales team.
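Serverless costs scale linearly with token volume, so a bill is easy to estimate from the per-token rates. The sketch below uses the provider-median rates quoted above ($0.50/1M input, $1.20/1M output); the monthly token volumes in the example are hypothetical placeholders, not figures from Together AI.

```python
# Estimate a monthly Together AI serverless bill from per-token rates.
# Rates are the provider medians quoted in this article; the workload
# volumes in the example are hypothetical.
MEDIAN_INPUT_RATE = 0.50   # USD per 1M input tokens
MEDIAN_OUTPUT_RATE = 1.20  # USD per 1M output tokens

def monthly_cost(input_tokens_m: float, output_tokens_m: float,
                 input_rate: float = MEDIAN_INPUT_RATE,
                 output_rate: float = MEDIAN_OUTPUT_RATE) -> float:
    """Cost in USD for a month of usage; volumes are in millions of tokens."""
    return input_tokens_m * input_rate + output_tokens_m * output_rate

# Example: 10M input tokens + 2M output tokens in a month
print(round(monthly_cost(10, 2), 2))  # 7.4
```

Swap in the per-model rates from the tables below to price a specific model rather than the median.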
All Together AI Plans & Pricing
| Plan | Monthly | Annual | Best For |
|---|---|---|---|
| Serverless | Contact Sales | Contact Sales | Variable-volume API usage |
| Dedicated (1x H100) | Contact Sales | Contact Sales | Consistent high-volume inference |
| Dedicated (1x H200) | Contact Sales | Contact Sales | High-throughput dedicated inference |
| Dedicated (1x B200) | Contact Sales | Contact Sales | High-performance dedicated inference |
| Enterprise | Contact Sales | Contact Sales | Large-scale enterprise deployments |
Serverless
- Pay-as-you-go per-token pricing
- Budget models from $0.03/M tokens
- Mid-range models from $0.50/M tokens
- Large models from $1.00/M tokens
- Batch API with 50% discount for most models
- Cached input pricing for select models
- Vision, image, audio, video, and transcription models available
Dedicated (1x H100)
- Single-tenant GPU deployment
- 1x H100 80GB at $3.99/hr
- Custom model hosting
- Autoscaling and traffic spike handling
- Guaranteed performance
Dedicated (1x H200)
- Single-tenant GPU deployment
- 1x H200 141GB at $5.49/hr
- Custom model hosting
- Autoscaling and traffic spike handling
- Guaranteed performance
Dedicated (1x B200)
- Single-tenant GPU deployment
- 1x B200 180GB at $9.95/hr
- Latest generation hardware
- Autoscaling and traffic spike handling
- Guaranteed performance
Enterprise
- Volume discounts
- Dedicated support
- Custom SLAs
- Private deployments
Usage-Based Rates
Per-unit pricing for Together AI API usage.
Serverless
| Model | Input | Output | Cached | Per |
|---|---|---|---|---|
| glm-5-1 | $1.40 | $4.40 | — | 1M tokens |
| minimax-m2-7 | $0.30 | $1.20 | $0.06 | 1M tokens |
| kimi-k2-6 | $1.20 | $4.50 | $0.20 | 1M tokens |
| deepseek-v4-pro | $2.10 | $4.40 | $0.20 | 1M tokens |
| qwen3-6-plus | $0.50 | $3.00 | — | 1M tokens |
| gpt-oss-120b | $0.15 | $0.60 | — | 1M tokens |
| lfm2-24b-a2b | $0.03 | $0.12 | — | 1M tokens |
| qwen3-5-397b-a17b | $0.60 | $3.60 | — | 1M tokens |
| minimax-m2-5 | $0.30 | $1.20 | $0.06 | 1M tokens |
| glm-5 | $1.00 | $3.20 | — | 1M tokens |
| qwen3-coder-next | $0.50 | $1.20 | — | 1M tokens |
| kimi-k2-5 | $0.50 | $2.80 | — | 1M tokens |
| qwen3-5-9b | $0.10 | $0.15 | — | 1M tokens |
| gemma-4-31b | $0.20 | $0.50 | — | 1M tokens |
| deepseek-v3-1 | $0.60 | $1.70 | — | 1M tokens |
| cogito-v2-1-671b | $1.25 | $1.25 | — | 1M tokens |
| qwen3-coder-480b-a35b-instruct | $2.00 | $2.00 | — | 1M tokens |
| rnj-1-instruct | $0.15 | $0.15 | — | 1M tokens |
| deepseek-r1-0528 | $3.00 | $7.00 | — | 1M tokens |
| llama-3-3-70b | $0.88 | $0.88 | — | 1M tokens |
| gemma-3n-e4b-instruct | $0.06 | $0.12 | — | 1M tokens |
| gpt-oss-20b | $0.05 | $0.20 | — | 1M tokens |
| qwen3-235b-a22b-fp8-throughput | $0.20 | $0.60 | — | 1M tokens |
| qwen2-5-7b-instruct-turbo | $0.30 | $0.30 | — | 1M tokens |
| llama-3-8b-instruct-lite | $0.10 | $0.10 | — | 1M tokens |
- Top models listed; many more available on platform
- Cached input pricing available for select models
- Batch inference available at ~50% discount for most models
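The ~50% batch discount applies directly to the per-token rates above. A minimal sketch using gpt-oss-120b's listed rates ($0.15 input / $0.60 output per 1M tokens); the job volumes in the example are hypothetical.

```python
# Compare standard vs batch pricing for gpt-oss-120b at the rates
# listed in the table above; batch jobs run at ~50% of standard rates.
INPUT_RATE, OUTPUT_RATE = 0.15, 0.60  # USD per 1M tokens
BATCH_DISCOUNT = 0.5                  # ~50% off, per the plan notes

def job_cost(input_m: float, output_m: float, batch: bool = False) -> float:
    """Cost in USD; token volumes are in millions."""
    cost = input_m * INPUT_RATE + output_m * OUTPUT_RATE
    return cost * BATCH_DISCOUNT if batch else cost

# Hypothetical offline job: 100M input tokens, 20M output tokens
print(round(job_cost(100, 20), 2))              # 27.0 (standard)
print(round(job_cost(100, 20, batch=True), 2))  # 13.5 (batch)
```

For latency-insensitive workloads like evaluation runs or bulk document processing, the batch route halves the bill at the cost of delayed results.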
Dedicated (1x H100)
| Hardware | Unit | Rate |
|---|---|---|
| h100-80gb | hour | $3.99 |
- $3.99/hour per H100 GPU
Dedicated (1x H200)
| Hardware | Unit | Rate |
|---|---|---|
| h200-141gb | hour | $5.49 |
- $5.49/hour per H200 GPU
Dedicated (1x B200)
| Hardware | Unit | Rate |
|---|---|---|
| b200-180gb | hour | $9.95 |
- $9.95/hour per B200 GPU
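Whether dedicated hardware beats serverless comes down to sustained throughput. A back-of-the-envelope sketch comparing the $3.99/hr H100 rate against the $0.875/1M median blended serverless rate this article cites; the throughput figure in the example is a hypothetical assumption, since real throughput depends on the model, quantization, and batch size.

```python
# Break-even throughput: at what sustained token rate does a dedicated
# 1x H100 ($3.99/hr) undercut serverless at the median blended rate?
H100_HOURLY = 3.99          # USD/hr, from the table above
SERVERLESS_BLENDED = 0.875  # USD per 1M tokens (provider median)

# Tokens per hour needed for the dedicated GPU to match serverless cost
break_even_tokens_per_hour = H100_HOURLY / SERVERLESS_BLENDED * 1_000_000
print(round(break_even_tokens_per_hour))  # 4560000 (~1,267 tokens/sec)

def effective_rate_per_1m(tokens_per_hour: float) -> float:
    """Effective USD per 1M tokens on a dedicated H100 at a given throughput."""
    return H100_HOURLY / (tokens_per_hour / 1_000_000)

# Hypothetical: sustaining ~2,000 tokens/sec (7.2M tokens/hour)
print(round(effective_rate_per_1m(7_200_000), 3))  # 0.554
```

Below roughly 4.6M tokens/hour of sustained load, serverless at the median rate is cheaper; above it, the fixed hourly instance wins, and the advantage grows with utilization.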
Blended Per-Token Rates for Popular Together AI Models
| Model | Input /1M | Output /1M | Blended /1M |
|---|---|---|---|
| glm-5-1 | $1.40 | $4.40 | $2.15 |
| qwen3-coder-480b-a35b-instruct (FP8) | $2.00 | $2.00 | $2.00 |
| glm-5 (FP4) | $1.00 | $3.20 | $1.55 |
| deepseek-v3-0324 | $1.25 | $1.25 | $1.25 |
| cogito-v2-1-reasoning | $1.25 | $1.25 | $1.25 |
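The blended figures in this table are consistent with the common 3:1 input-to-output token mix used in LLM pricing comparisons, i.e. blended = (3 × input + output) / 4. A quick check against the rows above:

```python
# Blended rate under a 3:1 input:output token mix; this formula
# reproduces every blended figure in the table above.
def blended_3to1(input_rate: float, output_rate: float) -> float:
    """Blended USD per 1M tokens assuming 3 input tokens per output token."""
    return (3 * input_rate + output_rate) / 4

print(round(blended_3to1(1.40, 4.40), 2))  # 2.15 (glm-5-1)
print(round(blended_3to1(1.00, 3.20), 2))  # 1.55 (glm-5 FP4)
```

If your workload's input:output ratio differs (e.g. long-context summarization vs. long-form generation), recompute the blend with your own weights rather than relying on the 3:1 convention.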
How Together AI Pricing Compares
| Software | Starting Price | Top Price |
|---|---|---|
| Together AI | $0.03/1M tokens (serverless) | $9.95/GPU-hour (dedicated) |
| Amazon Bedrock | $0.07/1M tokens | $75/1M tokens |
| Anyscale | $0.15/1M tokens | $5/1M tokens |
| Baidu ERNIE API | $0.10/1M tokens | $10/1M tokens |
| Cerebras Inference API | $0.10/1M tokens | $6/1M tokens |
| Claude API | $0.03/1M tokens | $75/1M tokens |
Together AI Pricing FAQ
01 How much does Together AI cost?
Together AI offers serverless inference starting as low as $0.03 per million tokens for budget models. Mid-range models cost $0.50–$1.00/M tokens, and large models like DeepSeek-R1 cost $3.00/M input tokens. Dedicated GPU deployments start at $3.99/hr (1x H100) and reach $9.95/hr (1x B200). Batch processing saves roughly 40–50%.
02 Does Together AI have a free tier?
Together AI does not advertise a permanent free tier or free credits on their pricing page. They offer pay-as-you-go Serverless pricing with no minimum commitment, so you only pay for what you use.
03 What models does Together AI support?
Together AI supports a wide range of open-source models including Llama, DeepSeek, Qwen, Mistral, and Kimi. They also offer image generation (FLUX, Stable Diffusion), video (Google Veo 2.0), audio transcription, text-to-speech, and embedding models.
04 Together AI vs Fireworks AI: which is cheaper?
Both offer similar serverless per-token pricing starting around $0.10/M tokens for small models. Fireworks AI gives new users $1 in free credits. For dedicated GPU hosting, Together AI's H100 is $3.99/hr versus Fireworks AI's A100 at $2.90/hr, making Fireworks slightly cheaper for dedicated compute at equivalent GPU tiers.
05 What is Together AI's Dedicated GPU pricing?
Together AI's Dedicated GPU hosting starts at $3.99/hr for a 1x H100 (single-tenant) and $9.95/hr for a 1x B200 (latest generation). Dedicated deployments are best for consistent high-volume inference where you need guaranteed resources and custom model hosting.
06 What are the cheapest models available on Together AI?
Based on Artificial Analysis data, the most affordable models on Together AI's Serverless tier start at $0.02/1M input tokens (Gemma 3n E4B) and $0.03/1M input tokens (LFM2 24B A2B). The provider median blended rate across all 23 tracked models is $0.875/1M tokens.
07 Does Together AI offer dedicated GPU instances?
Yes. Together AI offers dedicated GPU instances on three hardware tiers: 1x H100 ($3.99/hr), 1x H200 ($5.49/hr), and 1x B200 ($9.95/hr). Larger or custom configurations are quoted through sales, and an Enterprise plan is available for deployments requiring custom SLAs or support.