Together AI Pricing 2026
Complete pricing guide with plans and cost analysis
Together AI costs between $0.03 per million tokens (serverless, budget models) and $9.95 per GPU-hour (dedicated 1x B200) as of May 2026, with 5 plans available. Pricing depends on your chosen tier, contract length, and negotiated discounts.
- Free tier: No free tier available
Together AI offers 5 pricing tiers: Serverless, Dedicated (1x H100), Dedicated (1x H200), Dedicated (1x B200), and Enterprise. The Dedicated (1x H100) plan is designed for consistent high-volume inference.
Compared with other LLM API providers, Together AI sits at the budget-friendly end of the market.
Together AI Pricing Overview
Together AI has 5 pricing plans. The Serverless plan uses pay-as-you-go per-token pricing (from $0.03 per million tokens for budget models) and is designed for variable-volume API usage. The three Dedicated plans target single-tenant inference: 1x H100 for consistent high-volume workloads, 1x H200 for high-throughput workloads, and 1x B200 for the highest-performance deployments. The Enterprise plan is designed for large-scale enterprise deployments; Dedicated and Enterprise pricing is quoted through sales.
This pricing was last verified on May 6, 2026 from 3 independent sources.
Together AI offers usage-based inference pricing through its Serverless tier, with dedicated GPU options (1x H100, 1x H200, 1x B200) and an Enterprise plan available at custom-quoted rates. Serverless pricing varies by model — the provider median is $0.50/1M input tokens and $1.20/1M output tokens across 23 tracked models, with individual models ranging from $0.02/1M input tokens for lightweight options up to $2.00/1M for large-scale coding models. Dedicated instances and Enterprise pricing require direct contact with Together AI's sales team.
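Serverless costs scale linearly with token volume, so a bill is easy to estimate from the per-token rates. The sketch below uses the provider-median rates quoted above ($0.50/1M input, $1.20/1M output); the monthly token volumes in the example are hypothetical placeholders, not figures from Together AI.

```python
# Estimate a monthly Together AI serverless bill from per-token rates.
# Rates are the provider medians quoted in this article; the workload
# volumes in the example are hypothetical.
MEDIAN_INPUT_RATE = 0.50   # USD per 1M input tokens
MEDIAN_OUTPUT_RATE = 1.20  # USD per 1M output tokens

def monthly_cost(input_tokens_m: float, output_tokens_m: float,
                 input_rate: float = MEDIAN_INPUT_RATE,
                 output_rate: float = MEDIAN_OUTPUT_RATE) -> float:
    """Cost in USD for a month of usage; volumes are in millions of tokens."""
    return input_tokens_m * input_rate + output_tokens_m * output_rate

# Example: 10M input tokens + 2M output tokens in a month
print(round(monthly_cost(10, 2), 2))  # 7.4
```

Swap in the per-model rates from the tables below to price a specific model rather than the median.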
All Together AI Plans & Pricing
| Plan | Monthly | Annual | Best For |
|---|---|---|---|
| Serverless | Contact Sales | Contact Sales | Variable-volume API usage |
| Dedicated (1x H100) | Contact Sales | Contact Sales | Consistent high-volume inference |
| Dedicated (1x H200) | Contact Sales | Contact Sales | High-throughput dedicated inference |
| Dedicated (1x B200) | Contact Sales | Contact Sales | High-performance dedicated inference |
| Enterprise | Contact Sales | Contact Sales | Large-scale enterprise deployments |
Serverless
- Pay-as-you-go per-token pricing
- Budget models from $0.03/M tokens
- Mid-range models from $0.50/M tokens
- Large models from $1.00/M tokens
- Batch API with 50% discount for most models
- Cached input pricing for select models
- Vision, image, audio, video, and transcription models available
Dedicated (1x H100)
- Single-tenant GPU deployment
- 1x H100 80GB at $3.99/hr
- Custom model hosting
- Autoscaling and traffic spike handling
- Guaranteed performance
Dedicated (1x H200)
- Single-tenant GPU deployment
- 1x H200 141GB at $5.49/hr
- Custom model hosting
- Autoscaling and traffic spike handling
- Guaranteed performance
Dedicated (1x B200)
- Single-tenant GPU deployment
- 1x B200 180GB at $9.95/hr
- Latest generation hardware
- Autoscaling and traffic spike handling
- Guaranteed performance
Enterprise
- Volume discounts
- Dedicated support
- Custom SLAs
- Private deployments
Usage-Based Rates
Per-unit pricing for Together AI API usage.
Serverless
| Model | Input | Output | Cached | Per |
|---|---|---|---|---|
| glm-5-1 | $1.40 | $4.40 | — | 1M tokens |
| minimax-m2-7 | $0.30 | $1.20 | $0.06 | 1M tokens |
| kimi-k2-6 | $1.20 | $4.50 | $0.20 | 1M tokens |
| deepseek-v4-pro | $2.10 | $4.40 | $0.20 | 1M tokens |
| qwen3-6-plus | $0.50 | $3.00 | — | 1M tokens |
| gpt-oss-120b | $0.15 | $0.60 | — | 1M tokens |
| lfm2-24b-a2b | $0.03 | $0.12 | — | 1M tokens |
| qwen3-5-397b-a17b | $0.60 | $3.60 | — | 1M tokens |
| minimax-m2-5 | $0.30 | $1.20 | $0.06 | 1M tokens |
| glm-5 | $1.00 | $3.20 | — | 1M tokens |
| qwen3-coder-next | $0.50 | $1.20 | — | 1M tokens |
| kimi-k2-5 | $0.50 | $2.80 | — | 1M tokens |
| qwen3-5-9b | $0.10 | $0.15 | — | 1M tokens |
| gemma-4-31b | $0.20 | $0.50 | — | 1M tokens |
| deepseek-v3-1 | $0.60 | $1.70 | — | 1M tokens |
| cogito-v2-1-671b | $1.25 | $1.25 | — | 1M tokens |
| qwen3-coder-480b-a35b-instruct | $2.00 | $2.00 | — | 1M tokens |
| rnj-1-instruct | $0.15 | $0.15 | — | 1M tokens |
| deepseek-r1-0528 | $3.00 | $7.00 | — | 1M tokens |
| llama-3-3-70b | $0.88 | $0.88 | — | 1M tokens |
| gemma-3n-e4b-instruct | $0.06 | $0.12 | — | 1M tokens |
| gpt-oss-20b | $0.05 | $0.20 | — | 1M tokens |
| qwen3-235b-a22b-fp8-throughput | $0.20 | $0.60 | — | 1M tokens |
| qwen2-5-7b-instruct-turbo | $0.30 | $0.30 | — | 1M tokens |
| llama-3-8b-instruct-lite | $0.10 | $0.10 | — | 1M tokens |
- Top models listed; many more available on platform
- Cached input pricing available for select models
- Batch inference available at ~50% discount for most models
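The ~50% batch discount applies directly to the per-token rates above. A minimal sketch using gpt-oss-120b's listed rates ($0.15 input / $0.60 output per 1M tokens); the job volumes in the example are hypothetical.

```python
# Compare standard vs batch pricing for gpt-oss-120b at the rates
# listed in the table above; batch jobs run at ~50% of standard rates.
INPUT_RATE, OUTPUT_RATE = 0.15, 0.60  # USD per 1M tokens
BATCH_DISCOUNT = 0.5                  # ~50% off, per the plan notes

def job_cost(input_m: float, output_m: float, batch: bool = False) -> float:
    """Cost in USD; token volumes are in millions."""
    cost = input_m * INPUT_RATE + output_m * OUTPUT_RATE
    return cost * BATCH_DISCOUNT if batch else cost

# Hypothetical offline job: 100M input tokens, 20M output tokens
print(round(job_cost(100, 20), 2))              # 27.0 (standard)
print(round(job_cost(100, 20, batch=True), 2))  # 13.5 (batch)
```

For latency-insensitive workloads like evaluation runs or bulk document processing, the batch route halves the bill at the cost of delayed results.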
Dedicated (1x H100)
| Hardware | Unit | Rate |
|---|---|---|
| h100-80gb | hour | $3.99 |
- $3.99/hour per H100 GPU
Dedicated (1x H200)
| Hardware | Unit | Rate |
|---|---|---|
| h200-141gb | hour | $5.49 |
- $5.49/hour per H200 GPU
Dedicated (1x B200)
| Hardware | Unit | Rate |
|---|---|---|
| b200-180gb | hour | $9.95 |
- $9.95/hour per B200 GPU
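Whether dedicated hardware beats serverless comes down to sustained throughput. A back-of-the-envelope sketch comparing the $3.99/hr H100 rate against the $0.875/1M median blended serverless rate this article cites; the throughput figure in the example is a hypothetical assumption, since real throughput depends on the model, quantization, and batch size.

```python
# Break-even throughput: at what sustained token rate does a dedicated
# 1x H100 ($3.99/hr) undercut serverless at the median blended rate?
H100_HOURLY = 3.99          # USD/hr, from the table above
SERVERLESS_BLENDED = 0.875  # USD per 1M tokens (provider median)

# Tokens per hour needed for the dedicated GPU to match serverless cost
break_even_tokens_per_hour = H100_HOURLY / SERVERLESS_BLENDED * 1_000_000
print(round(break_even_tokens_per_hour))  # 4560000 (~1,267 tokens/sec)

def effective_rate_per_1m(tokens_per_hour: float) -> float:
    """Effective USD per 1M tokens on a dedicated H100 at a given throughput."""
    return H100_HOURLY / (tokens_per_hour / 1_000_000)

# Hypothetical: sustaining ~2,000 tokens/sec (7.2M tokens/hour)
print(round(effective_rate_per_1m(7_200_000), 3))  # 0.554
```

Below roughly 4.6M tokens/hour of sustained load, serverless at the median rate is cheaper; above it, the fixed hourly instance wins, and the advantage grows with utilization.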
Blended Per-Token Rates for Popular Together AI Models
| Model | Input /1M | Output /1M | Blended /1M |
|---|---|---|---|
| glm-5-1 | $1.40 | $4.40 | $2.15 |
| qwen3-coder-480b-a35b-instruct (FP8) | $2.00 | $2.00 | $2.00 |
| glm-5 (FP4) | $1.00 | $3.20 | $1.55 |
| deepseek-v3-0324 | $1.25 | $1.25 | $1.25 |
| cogito-v2-1-reasoning | $1.25 | $1.25 | $1.25 |
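The blended figures in this table are consistent with the common 3:1 input-to-output token mix used in LLM pricing comparisons, i.e. blended = (3 × input + output) / 4. A quick check against the rows above:

```python
# Blended rate under a 3:1 input:output token mix; this formula
# reproduces every blended figure in the table above.
def blended_3to1(input_rate: float, output_rate: float) -> float:
    """Blended USD per 1M tokens assuming 3 input tokens per output token."""
    return (3 * input_rate + output_rate) / 4

print(round(blended_3to1(1.40, 4.40), 2))  # 2.15 (glm-5-1)
print(round(blended_3to1(1.00, 3.20), 2))  # 1.55 (glm-5 FP4)
```

If your workload's input:output ratio differs (e.g. long-context summarization vs. long-form generation), recompute the blend with your own weights rather than relying on the 3:1 convention.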
How Together AI Pricing Compares
| Software | Starting Price | Top Price |
|---|---|---|
| Together AI | $0.03/1M tokens (serverless) | $9.95/GPU-hour (dedicated) |
| Amazon Bedrock | $0.07/1M tokens | $75/1M tokens |
| Anyscale | $0.15/1M tokens | $5/1M tokens |
| Baidu ERNIE API | $0.10/1M tokens | $10/1M tokens |
| Cerebras Inference API | $0.10/1M tokens | $6/1M tokens |
| Claude API | $0.03/1M tokens | $75/1M tokens |
Together AI Pricing FAQ
01 How much does Together AI cost?
Together AI offers serverless inference starting as low as $0.03 per million tokens for budget models. Mid-range models cost $0.50–$1.00/M tokens, and large models like DeepSeek-R1 cost $3.00/M input tokens. Dedicated GPU deployments start at $3.99/hr (1x H100) and reach $9.95/hr (1x B200). Batch processing saves roughly 40–50%.
02 Does Together AI have a free tier?
Together AI does not advertise a permanent free tier or free credits on their pricing page. They offer pay-as-you-go Serverless pricing with no minimum commitment, so you only pay for what you use.
03 What models does Together AI support?
Together AI supports a wide range of open-source models including Llama, DeepSeek, Qwen, Mistral, and Kimi. They also offer image generation (FLUX, Stable Diffusion), video (Google Veo 2.0), audio transcription, text-to-speech, and embedding models.
04 Together AI vs Fireworks AI: which is cheaper?
Both offer similar serverless per-token pricing starting around $0.10/M tokens for small models. Fireworks AI gives new users $1 in free credits. For dedicated GPU hosting, Together AI's H100 is $3.99/hr versus Fireworks AI's A100 at $2.90/hr, making Fireworks slightly cheaper for dedicated compute at equivalent GPU tiers.
05 What is Together AI's Dedicated GPU pricing?
Together AI's Dedicated GPU hosting starts at $3.99/hr for a 1x H100 (single-tenant) and $9.95/hr for a 1x B200 (latest generation). Dedicated deployments are best for consistent high-volume inference where you need guaranteed resources and custom model hosting.
06 What are the cheapest models available on Together AI?
Based on Artificial Analysis data, the most affordable models on Together AI's Serverless tier start at $0.02/1M input tokens (Gemma 3n E4B) and $0.03/1M input tokens (LFM2 24B A2B). The provider median blended rate across all 23 tracked models is $0.875/1M tokens.
07 Does Together AI offer dedicated GPU instances?
Yes. Together AI offers dedicated GPU instances on three hardware tiers: 1x H100 ($3.99/hr), 1x H200 ($5.49/hr), and 1x B200 ($9.95/hr). Larger or custom configurations are quoted through sales, and an Enterprise plan is available for deployments requiring custom SLAs or support.