AI Pricing Guide: Subscriptions, API Costs & Free Options (2025–2026)
Last updated: February 17, 2026
All prices in USD unless noted. Sources: official provider websites (OpenAI, Anthropic, Google, xAI, DeepSeek, Alibaba, Groq, etc.), OpenRouter, and reliable industry reports. Prices fluctuate — always verify on official sites before committing.
Note on data jurisdiction: Chinese-hosted models (DeepSeek, Qwen, Kimi, SiliconFlow, etc.) are extremely cheap but store data under PRC jurisdiction. Use only for non-sensitive work.
1. Free Chat Interfaces
Free tiers provide entry-level access — perfect for casual users, testing, or getting started. Usage limits apply.
| Service | Latest Models Available | Daily / Session Limits | Link | Hosting |
|---|---|---|---|---|
| Google Gemini | Gemini 3 Pro / Flash | Truly unlimited basic; 5 prompts/day with 2.5 Pro (then switches to Flash), 100 images/day, 5 Deep Research reports/month | gemini.google.com | US/EU |
| Microsoft Copilot | GPT-5 / Claude 4.6 | Unlimited basic | copilot.microsoft.com | US/EU |
| ChatGPT | GPT-5o / GPT-4o mini | Usage-capped (resets every few hours); GPT-5 limited to ~10 messages/5 hours on free | chatgpt.com | US/EU |
| Claude.ai | Claude Sonnet 4.6 | Message limits resetting ~every 5 hours | claude.ai | US/EU |
| Grok | Grok 4.1 | Free for all X users; ~10 prompts/2 hrs | grok.x.ai | US |
| HuggingChat | Llama 4, Qwen3-Coder, Mistral, GLM-5 | Truly unlimited | huggingface.co/chat | US/EU |
| Perplexity | Standard AI model | Unlimited Quick searches; 5 Pro searches/day; 3 file uploads/day | perplexity.ai | US/EU |
| DeepSeek Chat | DeepSeek V4 / R1 | Very high (fair-use throttling only) | chat.deepseek.com | 🇨🇳 Chinese |
| Kimi | Moonshot K2.5 | High free tier | kimi.moonshot.cn | 🇨🇳 Chinese |
2. Paid Chat Subscriptions (Non-API)
Sorted by price, low to high. For same-price plans, most recent data listed first.
Individual Plans
| Provider | Plan | Cost (USD/mo) | Models & Key Limits | Notes |
|---|---|---|---|---|
| OpenAI | Go (India-only) | ~$4.55 (₹399) | GPT-5 access, expanded messaging vs. free | Regional pricing |
| Poe | Pro | $4.99 | Access to 20+ models (GPT, Claude, Llama, etc.) | Best for model experimentation |
| xAI | X Premium (Grok) | $8 ($7/mo annual) | Higher Grok limits vs. free; blue checkmark | Web US pricing |
| Google | Gemini AI Pro (Advanced) | $19.99 (often $9.99 promo) | Gemini 3 Pro, 2M context; 100 prompts/day, 1,000 images/day, 3 videos/day, 20 Deep Research/day; includes 2TB Google One | Also called "AI Plus" in some regions |
| Anthropic | Claude Pro | **$20** ($17/mo annual) | All models incl. Claude Opus 4.6 + Claude Code; at least 5× more usage than free; priority access. Weekly rate limits (since Aug 28, 2025) | Best reasoning & agents |
| OpenAI | ChatGPT Plus | $20 | GPT-5 + o-series reasoning; up to 160 GPT-5 messages/3 hrs; up to 5× more messages than free; priority access | General + creative |
| Perplexity | Pro | **$20** ($200/year) | 300+ Pro searches/day; unlimited uploads; image generation; access to GPT-4.1, Claude 4.0 Sonnet; $5/mo API credit | Research with sources |
| xAI | SuperGrok | $30 ($300/year) | Standalone advanced features; 128K context | Confirmed in media, not on official docs |
| xAI | X Premium+ | $40 ($229/year) | Ad-free X; highest Grok limits; Radar Search, Articles | Price increased from $22 in prior data |
| Anthropic | Max 5× | $100 | 5× usage vs. Pro; enhanced performance & priority | Heavy individual use |
| Anthropic | Max 20× | $200 | 20× usage vs. Pro; maximum priority | Power users requiring extensive AI |
| OpenAI | ChatGPT Pro | $200 | Unlimited GPT-5 & Thinking mode; Sora video tools; highest usage limits | Unlimited frontier |
| Google | Gemini Ultra | $249.99 | 500 prompts/day, 1,000 images/day, 5 videos/day, 200 Deep Research/month | Heavy creators |
| Perplexity | Max | $200 ($2,000/year) | All Pro features; unlimited Labs; early access to new features (Comet AI browser); priority frontier models | Everything unlocked |
| xAI | SuperGrok Heavy | $300 ($3,000/year) | Multi-agent "Heavy" plan; highest rate limits | Confirmed in X posts |
Team & Enterprise Plans
| Provider | Plan | Cost (USD/user/mo) | Key Features | Notes |
|---|---|---|---|---|
| Google | Workspace Business | $20 ($24 monthly billing) | Gemini in Gmail, Docs, Slides, Meet; enterprise security | Annual commitment |
| OpenAI | ChatGPT Business (Team) | $30 ($25/mo annual) | Min 2 users; collaborative workspace; admin controls; higher limits than Plus. June 2025 update: connectors to internal tools, security controls, record mode | Flexible pricing |
| Anthropic | Claude Team (Standard) | $30 ($25/mo annual) | Min 5 users; higher limits than Pro; team collaboration; centralized billing | Standard seats |
| Google | Workspace Enterprise | $30 ($36 monthly billing) | AI note-taking in Meet, translated captions, advanced features | Annual commitment |
| Perplexity | Enterprise Pro | $40 ($400/year) | All Pro features; Team Spaces; centralized admin; SOC 2 Type II; $5/mo API credit per seat | Compliance-ready |
| Anthropic | Team (Premium Seat) | $150 | Includes Claude Code; increased usage limits; min 5 users | Heavy team usage |
| Anthropic | Enterprise | Custom | All Team features + SSO, SCIM, advanced security, audit logs, dedicated support | Contact sales |
| OpenAI | Enterprise | Custom | ≥150 users; enterprise security; unlimited high-speed access; expanded context; dedicated support | Contact sales |
3. Free & Trial API Tiers
Generous Free APIs for Building & Testing
| Provider | Models | Free Limits | Context | Link | Hosting |
|---|---|---|---|---|---|
| Google AI Studio | Gemini 3 Pro / Flash | 15 RPM, 1M TPM, ~1,000 requests/day | 1M+ | aistudio.google.com | US/EU |
| Groq | Llama 4 Scout, Qwen3, Mixtral | High free tier (rate-limited) | 128K–1M | console.groq.com | US |
| DeepSeek | V4 / R1 / Coder | 5M tokens free credit for new users (~$8 value) | 128K+ | platform.deepseek.com | 🇨🇳 Chinese |
| GitHub Models | GPT-5o, Llama 4, Claude | Free playground for GitHub users | 128K+ | github.com/models | US/EU |
| xAI | Grok models | $25 free + $150/mo with data sharing | 131K+ | console.grok.com | US |
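
Free tiers are rate-limited, so client code usually needs to pace its own requests. The following is a minimal sketch of a client-side throttle, assuming a 15 RPM limit like the Google AI Studio row above. The class name `MinIntervalThrottle` and its parameters are hypothetical and not part of any provider SDK. It spaces calls evenly, which is more conservative than a true per-minute window.

```python
import time


class MinIntervalThrottle:
    """Space calls evenly so that at most `rpm` calls start per minute."""

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm          # seconds between call starts
        self._clock = clock                 # injectable for testing
        self._sleep = sleep                 # injectable for testing
        self._next_allowed = float("-inf")  # first call is never delayed

    def wait(self) -> float:
        """Block until the next call may start; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot relative to when this call starts.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay


# Assumed limit: 15 requests per minute (hypothetical free-tier setting).
throttle = MinIntervalThrottle(rpm=15)
print(f"Minimum spacing: {throttle.interval:.1f} s")
```

Call `throttle.wait()` immediately before each API request. Daily request caps and tokens-per-minute limits would need separate counters.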
Free Stealth / Cloaked Models (OpenRouter)
These are pre-release models from undisclosed providers, available free during testing. Prompts & completions are logged for feedback.
| Model | Context | Notes |
|---|---|---|
| Sonoma Sky Alpha | 2,000,000 | Vision + parallel tool calling; frontier-class |
| Sonoma Dusk Alpha | 2,000,000 | Speed-optimized variant; vision + tools |
| Horizon Beta | 256,000 | Improved successor to Horizon Alpha |
| Cypher Alpha | 1,000,000 | All-purpose cloaked model |
| xAI grok code 1 | — | Was free at launch (Sep 10, 2025); then $0.20/$1.50 |
4. Paid API Pricing (Per 1M Tokens)
Sorted by input cost, low to high. Includes caching/batch discounts where available.
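Because every price in this section is quoted per 1M tokens, the cost of a single request is the token count scaled by that rate. A minimal sketch follows. The function name `request_cost` and the token counts are illustrative assumptions; the rates used are the GPT-5 row from the table in this section.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one request given per-1M-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical request: 12,000 input tokens and 1,500 output tokens
# at $1.25/M input and $10.00/M output.
cost = request_cost(12_000, 1_500, 1.25, 10.00)
print(f"${cost:.4f} per request")
```

Multiply the result by the expected request volume to estimate a monthly bill before any caching or batch discounts.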
Ultra-Budget Tier (< $0.15 input)
| Provider | Model | Input $/M | Output $/M | Context | Notes |
|---|---|---|---|---|---|
| DeepSeek | deepseek-chat (cache hit) | $0.028 | $0.42 | 128K+ | Cache miss: $0.28/M in. V4 pricing similar. |
| DeepSeek | deepseek-reasoner (cache hit) | $0.07 | $1.68 | 128K+ | Cache miss: $0.56/M in |
| Zhipu AI | GLM-4.5-airx | $0.02 | $0.06 | — | Ultra-fast, lowest cost model |
| OpenAI | GPT-5 Nano | $0.05 | $0.40 | 400K in / 128K out | Ultra-cheap routing & classification |
| Meta | Llama 3.2 11b Vision (via Deepinfra) | $0.055 | $0.055 | 128K | Vision capabilities |
| Alibaba | Qwen-Turbo | $0.0525 | $0.21 | 1M | Extremely low-cost |
| SiliconFlow | Qwen3-Coder-Next 7B/32B | $0.07 | $0.07 | 128K | 🇨🇳 Chinese hosting |
| OpenAI | GPT-4.1 nano | $0.10 | $0.40 | 1M in / 32K out | Lightweight variant |
| Google | Gemini 2.5 Flash-Lite / 3 Flash-Lite | $0.10–$0.30 | $0.40 | 1M+ | Caching: $0.025–$0.125/M in |
| Google | Gemini 2.0 Flash | $0.10–$0.70 | $0.40 | 1M+ | Live API: $0.35–$2.10 in / $1.50–$8.50 out. Batch: $0.05/$0.20 |
| Groq | Llama 4 Scout | $0.11 | $0.34 | 1M | Open-source; self-host viable; US hosting |
Budget Tier ($0.15–$0.99 input)
| Provider | Model | Input $/M | Output $/M | Context | Notes |
|---|---|---|---|---|---|
| OpenAI | GPT-4o Mini | $0.15 | $0.60 | 128K | High-volume tasks |
| Zhipu AI | GLM-4.5-air | $0.16 | $1.07 | 131K | Cost-effective lightweight |
| xAI | grok-3-mini | $0.20–$0.30 | $0.50 | 131K | 8 rps. Cached: $0.075/M in. Live Search: $25/1K sources |
| Meta | Llama 3.3 70b (via Deepinfra) | $0.23 | $0.40 | 128K | Open-source |
| OpenAI | GPT-5 mini | $0.25 | $2.00 | 400K in / 128K out | Mid-tier GPT-5 |
| Anthropic | Claude 3 Haiku | $0.25 | $1.25 | 200K | Batch: $0.125/$0.625. Caching: write $0.30/read $0.03 |
| DeepSeek | DeepSeek-V3 (via Deepinfra) | $0.27 | $1.10 | 64K | 8K max output |
| Alibaba | QVQ-72B-Preview | $0.28 | $0.55 | — | Up to 97% price cuts reported |
| Google | Gemini 2.5 Flash / 3 Flash | $0.30–$1.00 | $2.50 | 1M+ | Live API: $0.50–$3.00 in / $2.00–$12.00 out |
| Alibaba | Qwen2.5 72B/7B | $0.30–$0.35 | $0.40 | 128K | Large & small options |
| Zhipu AI | GLM-4.5 | $0.33 | $1.32 | 131K | 20M free tokens promo for new users |
| Meta | Llama 3.2 90b Vision (via Deepinfra) | $0.35 | $0.40 | 128K | Vision capabilities |
| Alibaba | Qwen3.5 Plus | $0.40 | $2.40 | 1M | Price/performance king. Multilingual coding. |
| OpenAI | GPT-4.1 mini | $0.40 | $1.60 | 1M in / 32K out | Smaller GPT-4.1 |
| Alibaba | Qwen-VL-Max | $0.41 | ~$0.41 | — | Vision model |
| Alibaba | Qwen-Plus | $0.42 | $1.26 | 131K | Tiered >128K; non-thinking ≤128K: $0.115/$0.287 |
| OpenAI | GPT-3.5 Turbo | $0.50 | $1.50 | 16K | Legacy, still available |
| Zhipu AI | GLM-4.5v | $0.50 | $1.80 | 65.5K | Vision model |
| DeepSeek | DeepSeek-R1 (via Deepinfra) | $0.55 | $2.19 | 64K | 8K max output |
| Meta | Llama 3 70b (via Deepinfra/Groq) | $0.59 | $0.79 | 8K | Legacy |
| xAI | grok-3-mini-fast | $0.60 | $4.00 | 131K | 3 rps. Cached: $0.15/M in |
| Alibaba | Qwen3 235B Thinking | $0.74 | $8.82 | — | Flagship 235B; non-thinking: $0.70/$2.80 |
| Anthropic | Claude 3.5 Haiku / 4.6 Haiku | $0.80–$1.00 | $4.00–$5.00 | 200K | Near-frontier speed. Batch: $0.40/$2.00 |
| Alibaba | Qwen3-Max-Preview | $0.86+ | $3.44+ | 262K | 1T+ param; tiered by context window |
Mid-Range Tier ($1.00–$4.99 input)
| Provider | Model | Input $/M | Output $/M | Context | Notes |
|---|---|---|---|---|---|
| Perplexity | Sonar | $1.00 | $1.00 | — | Quick facts, news; lightweight |
| Perplexity | Sonar Reasoning | $1.00 | $5.00 | — | Step-by-step logic |
| OpenAI | o1-mini / o3-mini / o4-mini | $1.10 | $4.40 | 200K | Reasoning-optimized |
| OpenAI | GPT-5 | $1.25 | $10.00 | 400K in / 128K out | Latest flagship GPT-5 |
| Google | Gemini 2.5/3 Pro (≤200K) | $1.25 | $10.00 | 1M+ | Caching: $0.31/M. Batch: $0.625/$5.00 |
| Google | Gemini 1.5 Pro | $1.25–$2.50 | $5.00–$10.00 | 1M | Previous gen; price tiered by context |
| Alibaba | Qwen2.5-Max | $1.60 | $6.40 | 32K | 8K max output |
| Alibaba | Qwen-Max | $1.68 | $6.72 | 32K | Free 1M tokens for new users (180 days) |
| OpenAI | GPT-5.3-Codex | $1.75 | $14.00 | 400K | Structured outputs; recursive self-debug |
| Meta | Llama 3.1 405b (via Deepinfra) | $1.79 | $1.79 | 128K | Largest open model |
| OpenAI | GPT-4.1 | $2.00 | $8.00 | 1M in / 32K out | Strong general-purpose |
| OpenAI | o3 | $2.00 | $8.00 | 200K | 100K max output |
| xAI | grok-2 / grok-2-vision | $2.00 | $10.00 | 131K | 10–15 rps; vision variant available |
| Google | Gemini 2.5/3 Pro (>200K) | $2.00–$4.00 | $12.00–$18.00 | 1M+ | Caching: $0.625/M. Batch: $1.25/$7.50. Multimodal native. |
| Perplexity | Sonar Reasoning Pro / Deep Research | $2.00 | $8.00 | — | Multi-step reasoning + web search; citations |
| OpenAI | GPT-4o | $2.50–$5.00 | $10.00–$15.00 | 128K | Supports Realtime API (speech-to-speech, Aug 2025) |
| Anthropic | Claude 3.5/3.7/4/4.6 Sonnet | $3.00 | $15.00 | 200K–1M | Caching: write $3.75/read $0.30. Batch: $1.50/$7.50. Daily driver. |
| xAI | grok-3 / Grok 4 / Grok 4.1 | $3.00 | $15.00 | 131K–2M | Cached: $0.75/M. Live Search: $25/1K sources. Real-time X data. |
| Cohere | Command-R+ | $3.00 | $15.00 | — | Enterprise-grade; RAG-focused |
| Perplexity | Sonar Pro | $3.00 | $15.00 | — | Advanced search; 2× citations vs. base |
| Zhipu AI | GLM-4.5-flash | $3.20 | $12.80 | — | Strong reasoning |
Premium Tier ($5.00+ input)
| Provider | Model | Input $/M | Output $/M | Context | Notes |
|---|---|---|---|---|---|
| Anthropic | Claude Opus 4.6 | $5.00 | $25.00 | 1M (beta) | 81.4% SWE-Bench. Caching: write $6.25/read $0.50. Best coding model. |
| xAI | grok-3-fast | $5.00 | $25.00 | 131K | 10 rps. Cached: $1.25/M in |
| OpenAI | GPT-4 Turbo | $10.00 | $30.00 | 128K | Previous generation |
| OpenAI | o3 (higher tier) | $10.00 | $40.00 | 200K | Professional reasoning |
| Anthropic | Claude 4/4.1 Opus | $15.00 | $75.00 | 200K+ | Caching: write $18.75/read $1.50. Batch: $7.50/$37.50 |
| OpenAI | o1 / o1-preview | $15.00 | $60.00 | 200K | Deep reasoning |
| OpenAI | o3-pro | $20.00 | $80.00 | 200K | Professional tier reasoning |
| OpenAI | GPT-4 | $30.00 | $60.00 | 8K | Legacy |
| OpenAI | GPT-4.5 | $75.00 | $150.00 | 128K | Preview model |
5. Cheapest API Combos (Realistic Blended Cost)
Blended cost formula: Input + (Output × 1.3) per 1M tokens — reflects typical balanced usage.
| Rank | Provider | Model | Blended Total | Context | Hosting | Best For |
|---|---|---|---|---|---|---|
| 1 | SiliconFlow | Qwen3-Coder-Next 7B/32B | $0.16 | 128K | 🇨🇳 Chinese | Experiments only |
| 2 | Groq | Llama 4 Scout | $0.55 | 1M | US | Open-source, safe hosting |
| 3 | DeepSeek | V4 (cache hit) | $0.57 | 128K | 🇨🇳 Chinese | Budget frontier coding |
| 4 | OpenAI | GPT-5 Nano | $0.57 | 400K | US/EU | Routing & classification |
| 5 | Google | Gemini 3 Flash-Lite | $0.62 | 1M+ | US/EU | Multimodal on budget |
| 6 | xAI | Grok 4.1 Fast | $0.85 | 2M | US | Real-time + speed |
| 7 | Alibaba | Qwen3.5 Plus | $3.52 | 1M | 🇨🇳 Chinese | Best price/performance frontier |
Tip for Indian developers: Use Groq + Google Gemini 3 Flash for safe, fast, cheap US/EU hosting. For cheapest experiments → DeepSeek / Qwen3.5 (but never send client code to China-hosted APIs).
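
The blended-cost formula stated at the top of this section, Input + (Output × 1.3), can be checked with a few lines of code. This is a minimal sketch. The function name `blended_cost` and the dictionary layout are illustrative; the per-1M prices are copied from the API tables in section 4.

```python
OUTPUT_WEIGHT = 1.3  # output weighting from the guide's blended-cost formula


def blended_cost(input_price: float, output_price: float) -> float:
    """Blended USD cost per 1M tokens: input + output * 1.3."""
    return input_price + output_price * OUTPUT_WEIGHT


# (input $/M, output $/M) taken from the pricing tables in section 4.
models = {
    "Alibaba Qwen3.5 Plus": (0.40, 2.40),
    "Groq Llama 4 Scout": (0.11, 0.34),
    "OpenAI GPT-5 Nano": (0.05, 0.40),
}

# Rank cheapest first, as in the table above.
ranked = sorted(models, key=lambda name: blended_cost(*models[name]))
for name in ranked:
    print(f"{name}: ${blended_cost(*models[name]):.2f}")
```

The 1.3 weight is the guide's own assumption about a balanced workload. Output-heavy tasks such as long-form generation would call for a larger weight.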
6. API Cost Optimisation Strategies
| Strategy | Savings | How It Works |
|---|---|---|
| Prompt Caching | Up to 90% | Reuse system prompts/context. Supported by OpenAI, Anthropic, Google, DeepSeek, xAI. Cache-hit input rates often 75–95% cheaper. |
| Batch Processing | 50% | Async high-volume jobs with delayed responses. OpenAI, Google, and Anthropic list explicit batch pricing in the tables above. |
| Smart Routing | 70–80% | Route hard tasks to flagships, simple tasks to budget models. Use OpenRouter.ai or LiteLLM for single-key multi-provider access. |
| Free Credits | $8–$150 | New user promos: DeepSeek (~$8), xAI ($25 + $150/mo data share), Alibaba (1M tokens/180 days), Zhipu (20M tokens). Google AI Studio free tier for prototyping. |
| Context Window Management | 20–40% | Many providers charge more for >128K or >200K context. Keep prompts under thresholds when possible. |
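
How much prompt caching actually saves depends on the share of input tokens served from cache. The sketch below averages the cache-hit and cache-miss rates and reports the saving against uncached input. It assumes the deepseek-chat prices from section 4 ($0.028/M hit, $0.28/M miss). The function names and the 80% hit rate are hypothetical, and cache-write surcharges are ignored.

```python
def effective_input_price(hit_price: float, miss_price: float,
                          hit_rate: float) -> float:
    """Average input $/M when `hit_rate` of input tokens are cache hits."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


def caching_savings(hit_price: float, miss_price: float,
                    hit_rate: float) -> float:
    """Fractional input-cost saving compared with no caching at all."""
    return 1.0 - effective_input_price(hit_price, miss_price, hit_rate) / miss_price


# Assumed workload: 80% of input tokens repeat a cached system prompt.
price = effective_input_price(0.028, 0.28, hit_rate=0.8)
saving = caching_savings(0.028, 0.28, hit_rate=0.8)
print(f"Effective input price: ${price:.4f}/M ({saving:.0%} saved)")
```

Only a 100% hit rate reaches the full 90% discount. Output tokens are billed at the normal rate either way.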
Multi-Model Router Strategy
- Tools: OpenRouter.ai or LiteLLM for single-key access to all providers.
- Savings: ~70% by routing hard tasks to flagships, simple tasks to budget models.
- Stack Example: Primary: Claude Sonnet 4.6 → Fallback: DeepSeek V4 → Rerank: Cohere v3.5.
- Monitoring: Track spend via pricepertoken.com or provider dashboards.
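
The routing idea above can be sketched without any provider SDK. The example sends reasoning-heavy or long prompts to a flagship model and everything else to a budget model, then estimates the saving against an all-flagship setup. The model identifiers, the 4,000-character threshold, and the 20% hard-task share are hypothetical. The prices are the Claude Sonnet ($3.00/$15.00) and deepseek-chat cache-miss ($0.28/$0.42) rates from section 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str           # placeholder label, not a verified API model name
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """USD cost of one request on this route."""
        return (input_tokens * self.input_price
                + output_tokens * self.output_price) / 1_000_000


FLAGSHIP = Route("flagship-sonnet", 3.00, 15.00)
BUDGET = Route("budget-deepseek-chat", 0.28, 0.42)


def pick_route(prompt: str, needs_reasoning: bool,
               long_prompt_chars: int = 4000) -> Route:
    """Toy heuristic: hard or long prompts go to the flagship model."""
    if needs_reasoning or len(prompt) > long_prompt_chars:
        return FLAGSHIP
    return BUDGET


def routed_savings(hard_share: float, input_tokens: int,
                   output_tokens: int) -> float:
    """Fractional saving vs. sending every request to the flagship."""
    all_flagship = FLAGSHIP.cost(input_tokens, output_tokens)
    mixed = (hard_share * all_flagship
             + (1.0 - hard_share) * BUDGET.cost(input_tokens, output_tokens))
    return 1.0 - mixed / all_flagship


print(pick_route("Summarise this paragraph.", needs_reasoning=False).model)
print(f"Estimated saving: {routed_savings(0.2, 2_000, 500):.0%}")
```

With these assumed numbers the saving comes out at about 75%, in line with the 70–80% range quoted for smart routing. A production router would classify difficulty with something more robust than prompt length.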
7. Platform Fees — OpenRouter
OpenRouter aggregates models from multiple providers with these fees:
| Service / Feature | Cost / Details | Notes |
|---|---|---|
| Platform Fee on Inference | None (pass-through pricing) | No markup on inference; prices match providers |
| Zero-Completion Insurance (ZCI) | $0 (automatic) | Waives charges for zero-token or errored responses |
| BYOK Fee | 5% of provider cost | Charged against credits when routing with your own keys |
| Credit Purchase Fee | 5.5% (min $0.80) | Via Stripe; crypto: 5% |
| Stealth / Cloaked Models | Free during testing | Logging enabled; prompts/completions stored for feedback |
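
The fee rows above reduce to simple arithmetic. The sketch below assumes the $0.80 minimum applies only to card purchases via Stripe and not to crypto, which is one reading of the table rather than a confirmed rule. The function names are hypothetical.

```python
CARD_FEE_RATE = 0.055    # 5.5% credit purchase fee via Stripe
CARD_FEE_MINIMUM = 0.80  # minimum card fee in USD
CRYPTO_FEE_RATE = 0.05   # 5% fee for crypto purchases
BYOK_FEE_RATE = 0.05     # 5% of provider cost when using your own keys


def credit_purchase_fee(amount_usd: float, crypto: bool = False) -> float:
    """Fee charged when buying `amount_usd` of credits."""
    if crypto:
        # Assumption: no minimum applies to crypto purchases.
        return amount_usd * CRYPTO_FEE_RATE
    return max(amount_usd * CARD_FEE_RATE, CARD_FEE_MINIMUM)


def byok_fee(provider_cost_usd: float) -> float:
    """Fee deducted from credits for a bring-your-own-key request."""
    return provider_cost_usd * BYOK_FEE_RATE


for amount in (10, 100):
    print(f"${amount} top-up: ${credit_purchase_fee(amount):.2f} fee")
```

Under these assumptions the minimum fee dominates small card top-ups: 5.5% only exceeds $0.80 once a purchase passes roughly $14.55.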
8. Quick Recommendation Summary
Best Free Options
| Use Case | Best Choice | Why |
|---|---|---|
| Daily free chat | Google Gemini 3 | Truly unlimited basic; best free model quality |
| Free coding API | Google AI Studio (Gemini 3 Flash) | 1,000 requests/day free; 1M context |
| Free open-source models | HuggingChat or Groq free tier | Llama 4, Qwen3, Mixtral at no cost |
| Free experimentation | DeepSeek (5M free tokens) | Frontier-level for $0 (China-hosted) |
Best Paid Options
| Use Case | Best Choice | Monthly Cost | Why |
|---|---|---|---|
| Best paid chat (serious work) | Claude Pro | $20 | Opus 4.6 + Claude Code access |
| Best general chat | ChatGPT Plus | $20 | GPT-5 + o-series reasoning |
| Best research chat | Perplexity Pro | $20 | Multi-model + citations + search |
| Cheapest high-quality API | Groq (Llama 4 Scout) | Pay-per-use (~$0.55/M) | US-hosted, fast, open-source |
| Best frontier API value | Qwen3.5 Plus or DeepSeek V4 | Pay-per-use | 76–77% SWE-Bench at 95% lower cost |
| Best premium API | Claude Opus 4.6 | Pay-per-use ($5/$25 per M) | 81.4% SWE-Bench; 1M context |
| Unlimited heavy usage | ChatGPT Pro or Claude Max 20× | $200 | No limits on frontier models |
| Enterprise compliance | OpenAI Enterprise or Anthropic Enterprise | Custom | SSO, SCIM, audit logs, SLA |
9. Glossary of Acronyms & Terms
| Term | Full Name | Context |
|---|---|---|
| API | Application Programming Interface | Programmatic access to models |
| ARC-AGI | Abstraction & Reasoning Corpus for AGI | Benchmark measuring general reasoning |
| BYOK | Bring Your Own Key | Use personal API keys on aggregator platforms |
| Cloaked Model | Pre-release model from undisclosed creator | OpenRouter stealth models for feedback testing |
| CoT | Chain of Thought | Step-by-step reasoning (e.g., GPT-5 Thinking) |
| Ctx | Context | Token window size for inputs |
| GDPR | General Data Protection Regulation | EU privacy law compliance |
| HIPAA | Health Insurance Portability and Accountability Act | US healthcare privacy compliance |
| I/O | Input/Output | Token pricing separation |
| IDE | Integrated Development Environment | Code development tools |
| INR | Indian Rupee | Regional pricing currency |
| LLM | Large Language Model | Core AI models trained on text data |
| M Tokens | Million Tokens | Pricing unit (~750 words ≈ 1,000 tokens) |
| MoE | Mixture of Experts | Architecture using specialised sub-networks |
| ROI | Return on Investment | Business value metric |
| RPM / RPS | Requests Per Minute / Second | API rate limits |
| SAML | Security Assertion Markup Language | Enterprise authentication standard |
| SCIM | System for Cross-domain Identity Management | Automates user provisioning |
| SLA | Service Level Agreement | Uptime/performance guarantees |
| SOC 2 | Service Organization Control 2 | Security compliance framework |
| SSO | Single Sign-On | Enterprise unified login |
| SWE-Bench | Software Engineering Benchmark | Real-world coding evaluation |
| TTL | Time To Live | Cache duration |
| VLM | Vision Language Model | Models handling text + images |
| ZCI | Zero Completion Insurance | OpenRouter: no charge for failed/empty responses |
10. Important Notes
- Prices change frequently — always verify on official provider sites before committing budget.
- Overages are real — Credit-based plans (Claude Max, ChatGPT Pro, Perplexity Max) exist because heavy users often exceed base tier limits.
- Caching is your best friend — Prompt caching alone can cut API costs 75–90% for repetitive workloads.
- Data jurisdiction matters — Chinese-hosted models (DeepSeek, Qwen, SiliconFlow, Kimi) are extremely cheap but store data under PRC jurisdiction. Use US/EU-hosted alternatives (Groq, Google, OpenAI, Anthropic) for sensitive or client work.
- OpenAI Realtime API (Aug 2025) — Adds speech-to-speech, MCP server support, image input, and SIP phone calling; billed at model token rates. Relevant for voice agent builders.
- Free tiers for prototyping — Google AI Studio, Groq, HuggingChat, and GitHub Models offer generous free access to start building immediately.
Bottom line: Everything is cheaper and smarter than even three months ago. Start with free tiers to prototype, scale to budget APIs for production, and reserve premium models for hard problems. 🚀