The AI Developer's Handbook 2025
Last updated: 2025
OpenAI APIs, Context Mastery, Tokens & Tooling for Software Engineers
Table of Contents
- The AI Developer's Handbook 2025
- Part 1 – Getting Started & API Landscape
- Part 2 – AI Context: MCP, RAG, Tools & Architecture
- Part 3 – Mastering Tokens & Context for Programming
- 3.1 Minimum Context for Company Programming
- 3.2 Token Density by Language
- 3.3 Tokens per LOC, Words & Config Files
- 3.4 Project Size Classifications
- 3.5 Real-World Projects vs Tokens
- 3.6 Consequences of Insufficient Tokens
- 3.7 Bug vs Feature vs New Project
- 3.8 Input vs Output Tokens & Pricing
- 3.9 Screenshot/Image Impact on Context
- 3.10 Decision Matrix & Pro Tips
Part 1 – Getting Started & API Landscape
All factual • SWE-I friendly • sorted from least to most AI-intensive work
1.1 Docs Topics (Least AI → Most AI)
| Rank | Topic Group | SWE-Simple Meaning | Key Sub-Topics | AI-Level |
|---|---|---|---|---|
| 1 | Getting Started | First API call, set key, pick model, check cost | Overview, Quickstart, Libraries, Pricing, Models | ★ |
| 2 | Core Features | Ask model to write, draw, talk, or return JSON | Text gen, Images & vision, Audio, Structured output | ★★ |
| 3 | Run & Scale | Make it real-time, stream tokens, store chat, handle webhooks | Streaming, Conv. state, Webhooks, Background mode | ★★ |
| 4 | Prompting & Reasoning | Craft better instructions / schemas | Prompting guides, Reasoning tips | ★★★ |
| 5 | Tools & Connectors | Give GPT extra powers (search web, run code, read files) | Code interpreter, Web search, File search, MCP | ★★★ |
| 6 | Realtime / Voice | Low-latency chat & voice agents | Realtime API, Voice agents | ★★★★ |
| 7 | Agents SDK | Multi-step assistants that choose tools | Agents SDK (Py/TS), Assistants API | ★★★★ |
| 8 | Specialised Models | Embeddings for search, Moderation for safety, TTS/STT, Image gen | Embeddings, Moderation, Image gen, Speech APIs | ★★★★ |
| 9 | Optimisation | Fine-tune, run evals & graders, cut cost/lag | Fine-tuning, Evals, Graders, Opt. cycle | ★★★★★ |
| 10 | Coding Agents / Deep | AI that writes & runs code, advanced research | Codex cloud/CLI/IDE, Local shell, Deep research | ultra |
1.2 Roles & Must-Read Mapping
| # | Role | Daily Focus | Must-Read Topics (§1.1) | AI-Stars |
|---|---|---|---|---|
| 1 | Backend / API Integrator | Call API, retry, log cost | 1 + 3 | ★ |
| 2 | Front- / Full-Stack Dev | Add chat / image / speech widgets | 1 + 2 + 3 | ★★ |
| 3 | DevOps / Platform Eng. | Deploy & monitor GPT workloads | 3 + 9 | ★★ |
| 4 | Prompt / Workflow Eng. | Write prompts & JSON schemas | 4 | ★★★ |
| 5 | AI Integration Eng. | Chain tools (file/web/search) | 3 + 5 | ★★★ |
| 6 | Conversational / Voice Dev | Build chat or voice assistants | 3 + 6 + 7 | ★★★★ |
| 7 | RAG / Embedding Eng. | Retrieval-Augmented Generation pipelines | 5 + 8 | ★★★★ |
| 8 | ML / Fine-Tuning Eng. | Fine-tune, eval, cost/latency tune | 8 + 9 | ★★★★★ |
| 9 | Agent / Coding-Agent Eng. | Multi-tool autonomous agents, AI coding tools | 5 + 7 + 10 | ★★★★★ |
| 10 | AI Research Eng. | Novel model/agent research & safety | 8 + 9 + 10 | ultra |
1.3 Quick Start by Role
| Role | Language(s) | 3-Step Quick Start | Pre-reqs | First Mini-Project |
|---|---|---|---|---|
| Front-end Dev | JS/TS (UI) + small Node/Python proxy | 1. npm i openai on server 2. Create /api/chat endpoint 3. Stream to React/Vue | JS fetch, env vars | Live chat widget w/ streaming |
| Back-end Dev | Python or Node (pick one) | 1. pip install openai or npm i openai 2. Choose model, log tokens 3. Build /generate route | REST basics, env vars | "Summarise PDF" API using File inputs |
| Full-stack Dev | Python or JS/TS (both not required) | 1. Finish Quickstart 2. Add Function Calling 3. Add Embeddings + File retrieval | Same as above + small DB | "Chat with Docs" (upload → embed → answer) |
Language FAQ
- Either Python or JS/TS is fine.
- Use what your stack already uses; SDKs exist for both.
- Never expose API keys in browser code – hit a backend route instead.
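To make the backend-route rule concrete, here is a minimal Python sketch of what stays server-side: the request payload and the auth header. The model name `gpt-4o-mini` and the Chat Completions endpoint shape follow OpenAI's current API but should be treated as illustrative; check the Quickstart for the exact call.

```python
import os

# The endpoint your backend (not the browser) talks to.
OPENAI_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_request(user_message: str, model: str = "gpt-4o-mini") -> dict:
    """Build the JSON payload sent from YOUR backend route, never client code."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }

def auth_headers() -> dict:
    # The key lives in a server-side env var; it never reaches the browser.
    return {"Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}"}

payload = build_chat_request("Hello!")
```

Your /api/chat route would accept the user message, call `build_chat_request`, and forward the response (or stream it) back to the UI.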
1.4 Prerequisites & Next Steps
| Area | Must-Know | Nice-to-Have | Next / Latest | Why Care? |
|---|---|---|---|---|
| API Basics | HTTP, JSON, env-vars | Streaming (SSE) | Responses API patterns | Clean, future-proof calls |
| Backend Ops | Error handling, retries | Queues, webhooks | Conv. state, cost logging | Reliability & spend control |
| Data / RAG | Vector concept, a vector DB | Chunking, metadata | File search & connectors | Grounded, factual answers |
| Prompting | System vs. user prompts | Few-shot examples | Reasoning chains, JSON schemas | Accuracy & parseability |
| Quality / Safety | Moderation endpoint | Eval sets, graders | Fine-tuning, optimisation cycle | Safer, measurable outputs |
| Realtime / Voice | WebSocket/WebRTC basics | Audio codecs | Realtime API + Voice agents | Low-latency voice apps |
1.5 Climb the Ladder (Week-by-Week)
- ★ Week 1 – Quickstart call in your main language.
- ★★ Week 2 – Add streaming + basic UI.
- ★★★ Week 3 – Use Function Calling + JSON; start prompt tuning.
- ★★★★ Weeks 4–5 – Add Tools (file/web) or Agents SDK.
- ★★★★★ Beyond – Fine-tune, run evals, or build coding agents.
No ML PhD needed until step 5. Pick one language, follow the docs above, and you're off!
Part 2 – AI Context: MCP, RAG, Tools & Architecture
Understanding the "working memory" of AI and how external data flows into the model.
2.1 Component Definitions
| Component | What It Is | Key Features | Example |
|---|---|---|---|
| Context | All information used by the LLM during generation | • Chat, user input, RAG, tool results • Bounded by token limit • Temporary session memory | "My name is Raj" remembered during session |
| MCP | Model Context Protocol – open-source protocol for LLM ↔ system interaction | • JSON-RPC 2.0 spec • 1,000+ MCP servers by early 2025 • Standardizes tool execution, resource access | Claude or GPT calls company CRM via MCP server |
| RAG | Retrieval-Augmented Generation – combines semantic search with LLM output | • Embeds user query • Searches vector DB • Injects relevant docs into context | LLM retrieves legal cases → summarizes |
| Tools | External APIs or code the LLM can run | • Accessed via MCP or native tool APIs (like OpenAI's function calling) • Enables live queries, code, search | getWeather("Hyderabad") fetches live data |
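As a concrete sketch of the Tools row, here is an OpenAI-style function-calling definition for the getWeather example, plus a stub dispatcher for the tool call the model emits. The schema shape follows OpenAI's documented function-calling format, but the tool itself (and its return value) is invented for illustration.

```python
# OpenAI-style tool definition for the getWeather example above.
# The JSON-schema shape follows OpenAI's function-calling format;
# the tool name and behavior are illustrative, not a real API.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "getWeather",
        "description": "Fetch current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Hyderabad"},
            },
            "required": ["city"],
        },
    },
}

def dispatch_tool_call(name: str, args: dict) -> str:
    """Route a model-issued tool call to local code (stubbed here)."""
    if name == "getWeather":
        # A real implementation would hit a weather API; we return canned data.
        return f"31°C and sunny in {args['city']}"
    raise ValueError(f"unknown tool: {name}")
```

The tool result string is then appended to the conversation as a tool message, which is exactly how it lands in the context window.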
2.2 System Architecture
```
┌───────────────────────────────────────────────────────────┐
│                      CONTEXT WINDOW                        │
│  ┌────────────┐  ┌─────────────┐  ┌──────────────┐        │
│  │ User Input │  │  Retrieved  │  │ Tool Results │        │
│  │ & History  │  │  Documents  │  │ (Live Data)  │        │
│  └────────────┘  └─────────────┘  └──────────────┘        │
└───────────────────────────────────────────────────────────┘
                            │
                            │  Context feeds the model
                            ▼
                  ┌─────────────────────┐
                  │         LLM         │
                  │   (Claude / GPT /   │
                  │  Gemini / Mistral)  │
                  └─────────────────────┘
                            │
           ┌────────────────┼────────────────┐
           ▼                ▼                ▼
   ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
   │      MCP      │ │      RAG      │ │  Native APIs  │
   │  (Protocol)   │ │  (Retrieval)  │ │  / Services   │
   └───────┬───────┘ └───────┬───────┘ └───────┬───────┘
           ▼                 ▼                 ▼
   ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
   │  MCP Servers  │ │  Vector DBs   │ │ External APIs │
   │  • Tools      │ │  • Pinecone   │ │  • Weather    │
   │  • Resources  │ │  • Chroma     │ │  • Search     │
   │  • Prompts    │ │  • FAISS      │ │  • Code Exec  │
   └───────────────┘ └───────────────┘ └───────────────┘
```
2.3 Component Relationships
| Component | Feeds Into | Purpose |
|---|---|---|
| Context | LLM | Holds all runtime inputs |
| MCP | Context via tool results | Standardized tool & data access |
| RAG | Context via retrieved docs | Domain-specific semantic enrichment |
| Tools | Context via live results | Real-time functionality (e.g., code, APIs) |
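The RAG row above can be sketched end to end in a few lines. This is a toy: the "embeddings" are hand-made three-dimensional vectors rather than real model output, and the "vector DB" is a dict, but the retrieve-then-inject flow is the same one Pinecone/Chroma/FAISS pipelines use.

```python
import math

# Toy RAG retrieval: rank documents by cosine similarity of (fake) embeddings,
# then splice the winners into the prompt that enters the context window.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hand-made 3-d "embeddings"; a real system computes these with an embedding model.
DOCS = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "api reference": [0.0, 0.1, 0.9],
}

def retrieve(query_vec, k=1):
    """Return the k document keys most similar to the query vector."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

def build_prompt(question, query_vec):
    # Retrieved docs are injected into the context ahead of the question.
    context = "\n".join(retrieve(query_vec))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Swapping the dict for a real vector DB and `cosine` for the DB's native search is the only structural change needed for production.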
2.4 Industry Adoption
| Provider | Status |
|---|---|
| Anthropic | Creator & lead maintainer of MCP |
| OpenAI | Native function calling API; MCP support via community |
| Google | Function calling capabilities in Gemini API |
| Microsoft | MCP integrated into Azure OpenAI Studio and Foundry (Preview) |
Key Note: While MCP is gaining adoption, each provider also maintains its own tool-calling mechanism (like OpenAI's function calling API) alongside MCP support.
Part 3 – Mastering Tokens & Context for Programming
The "working memory" that powers tools like GPT, Claude, or Gemini for coding tasks. Measured in tokens (β 4 characters or 0.75 words), context includes your prompt, code snippets, configs, and the AI's output. It fits within a model's context window (e.g., 8K for basic models, up to 2M+ for advanced ones like Gemini 1.5).
For "decent" company-level programming (reliable, production-ready code), aim for 16Kβ64K+ tokens minimum, reserving 20β30% for output to prevent truncation. Verbose languages like C++ consume more tokens, while large projects require strategies like RAG or chunking. Use tools like Tiktoken for precise counts. Variations: Β±15β20% based on code style.
3.1 Minimum Context for Company Programming
| Task Type | Min Input Tokens | Typical Files/Scope | Why Needed / If Insufficient |
|---|---|---|---|
| Tiny Bug Fix | 1K–4K | 1–3 files + errors/tests | Local fix; wrong diagnosis, breaks code |
| Small Feature | 4K–12K | 3–8 files + deps/interfaces | Integration; duplicates, integration errors |
| Cross-File Refactor | 12K–32K | 8–20 files + usages/style | Consistency; inconsistencies, broken deps |
| New Module/Service | 16K–64K+ | 10–30 files + arch/templates | Structure; poor architecture, mismatches |
Overall minimum for decent work: 16K–32K tokens. Reserve 20–30% for output; use RAG for overflow.
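The reserve-for-output rule can be sketched as a small budgeting helper. The 20–30% figures are this section's guideline, not an API limit.

```python
# Split a model's context window into an input budget and reserved output
# headroom, per the 20–30% reservation guideline above.
def context_budget(window_tokens: int, output_reserve: float = 0.25) -> dict:
    """Return how many tokens to spend on input vs. reserve for output."""
    if not 0.20 <= output_reserve <= 0.30:
        raise ValueError("guideline suggests reserving 20–30% for output")
    output = int(window_tokens * output_reserve)
    return {"input_budget": window_tokens - output, "output_reserve": output}
```

For a 32K window at the default 25% reserve, that leaves a 24K input budget with 8K held back for the model's answer.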
3.2 Token Density by Language
| Language | Tokens/100 LOC (Approx.) | Rank (Most Tokens) | Why Heavy/Light |
|---|---|---|---|
| C++ | 650–850 | 1 (Most) | Templates, headers, symbols |
| Java/C# | 550–750 | 2 | Boilerplate, OOP patterns |
| Rust | 500–650 | 3 | Lifetimes, macros |
| TypeScript | 480–600 | 4 | Type annotations |
| JavaScript | 420–520 | 5 | Symbols, callbacks |
| Go | 400–480 | 6 | Concise, explicit errors |
| Python | 380–450 | 7 (Least) | Minimal syntax, no braces |
Up to 2× difference (e.g., Python vs. C++). Rule of thumb: LOC × 4–8 tokens (code) or words × 1.33 (docs).
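A rough estimator built from the midpoints of the table above. The figures are heuristics with the stated ±15–20% variance; for exact counts you would tokenize the actual text with a library such as OpenAI's tiktoken.

```python
# Midpoints of the tokens-per-100-LOC ranges in the table above (heuristics).
TOKENS_PER_100_LOC = {
    "cpp": 750, "java": 650, "rust": 575, "typescript": 540,
    "javascript": 470, "go": 440, "python": 415,
}

def estimate_code_tokens(loc: int, language: str) -> int:
    """Ballpark token count for `loc` lines of code in `language`."""
    return loc * TOKENS_PER_100_LOC[language] // 100

def estimate_doc_tokens(words: int) -> int:
    """Prose rule of thumb from above: words × 1.33."""
    return int(words * 1.33)
```

So a 1,000-line Python module lands around 4,150 tokens while the same LOC in C++ is closer to 7,500, which is the 2× spread the table describes.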
3.3 Tokens per LOC, Words & Config Files
| Metric / File Type | Token Range / Formula | Example / Strategy |
|---|---|---|
| Per LOC (Code) | LOC × 4–8 (chars/line ÷ 4) | 100 LOC Python: 380–450; Java: 580–750 |
| Per Words (Docs/Prose) | Words × 1.33 | 1K words ≈ 1.3K tokens; summarize long docs |
| Config (.env, 20–50 lines) | 100–300 | Include fully; low cost, essential for env |
| Constants/Enums (100–500 lines) | 800–4K | Summarize lists; use placeholders |
| package.json/pyproject (50–200 lines) | 500–2K | Trim metadata/scripts; keep deps |
| tsconfig/eslint (100 lines) | 350–700 | Retain rules; critical for style |
| YAML/K8s/OpenAPI (100–300 lines) | 700–1.2K | Extract relevant; remove comments |
Configs add 10–20% to total; trim unused sections to save 20–40% tokens.
3.4 Project Size Classifications
| Size | LOC Range | Tokens (Python) | Tokens (Java/C++) | Strategy |
|---|---|---|---|---|
| Small | <10K | 8K–80K | 10K–120K | Full fit in 32K–128K window |
| Medium | 10K–100K | 80K–800K | 150K–1.5M | Selective files + chunking |
| Large | 100K–1M | 800K–8M | 1.5M–15M | RAG mandatory for modules |
| Mega | >1M | >8M | >15M | Advanced RAG; full fit impossible |
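The classification above reduces to a simple threshold check. A minimal sketch, with thresholds taken directly from the table:

```python
# Map a project's LOC to the context strategy from the table above.
def project_strategy(loc: int) -> str:
    if loc < 10_000:
        return "small: full fit in a 32K–128K window"
    if loc < 100_000:
        return "medium: selective files + chunking"
    if loc < 1_000_000:
        return "large: RAG mandatory for modules"
    return "mega: advanced RAG; full fit impossible"
```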
3.5 Real-World Projects vs Tokens
| Project / Company | Est. LOC | Est. Tokens | AI Strategy |
|---|---|---|---|
| Small Startup / MVP | 5K–50K | 40K–500K | Full context in 128K |
| React / Django App | 200K–500K | 1.5M–3M | Chunk by features/modules |
| Medium SaaS / VS Code | 100K–2M | 700K–15M | Isolate components with RAG |
| Netflix Service | 500K–2M | 7.5M–30M | Microservice focus + summaries |
| Linux Kernel | 30M+ | 200M+ | Subsystem-only; heavy RAG |
| Google Monorepo / Search | 60M–2B+ | 900M–15B+ | Modular RAG on strict subsets |
Mega projects require breaking into micro-tasks; full context is impossible even in 2M windows.
3.6 Consequences of Insufficient Tokens
| Issue | Symptoms | Why / Impact | Mitigation |
|---|---|---|---|
| Wrong Results | Hallucinated APIs/logic/errors | Missing defs/specs; runtime fails | Include interfaces/tests |
| Duplicated Code | Rewrites existing utils | Omitted helpers; increases debt | Provide examples/utils |
| Inconsistent Outputs | Style mismatches/truncation | No configs/headroom; review needed | Add style guides; reserve 20–30% |
| Insecure/Incomplete | Missing validation/security | Absent patterns; vulnerabilities | Include arch/docs |
3.7 Bug vs Feature vs New Project
| Aspect | Bug Fix | Feature Addition | New Project |
|---|---|---|---|
| Input Tokens | 1K–4K | 4K–16K | 16K–64K+ |
| Output Tokens | 200–1K | 1K–6K | 2K–15K |
| File Scope | 1–3 + logs/tests | 3–10 + specs/examples | 10–50 + arch/templates |
| Priority | Errors → Code → Tests | Interfaces → Specs | Arch → Patterns |
| Failure Mode | Wrong diagnosis | Integration errors | Poor structure |
| Output/Input Ratio | 1:3–5 | 1:2–3 | 1:1–2 |
3.8 Input vs Output Tokens & Pricing
| Aspect / Task | Typical Ratio (Input:Output) | Input Example | Output Example |
|---|---|---|---|
| Bug Fix | 3–5:1 | 3K | 1K |
| Feature | 2–3:1 | 6K | 2K |
| New Project | 1–2:1 | 10K | 5K |

| Model (2025 Rates) | Max Window | Input $/1M | Output $/1M | Cost for 10K In + 2K Out |
|---|---|---|---|---|
| GPT-4o Mini | 128K | $0.15–0.50 | $0.60–1.50 | ~$0.002–0.008 |
| Claude 3.5 / GPT-4o | 128K–200K | $3–5 | $15 | ~$0.06–0.08 |
| Gemini 1.5 / GPT-4 | 1M–2M | $5–10 | $15–30 | ~$0.08–0.14 |
Inputs: 70–80% of total, cheaper (1×). Outputs: 20–30%, pricier (2–3×). Formula:
(Input Tokens × Input Rate + Output Tokens × Output Rate) / 1M.
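The formula above as code, applied to the table's 10K-in / 2K-out example. The per-million rates are illustrative 2025 figures; check your provider's current price sheet before budgeting.

```python
# Per-request cost: (input tokens × input rate + output tokens × output rate) / 1M.
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate_per_m: float, out_rate_per_m: float) -> float:
    """Cost in USD, with rates quoted per 1M tokens."""
    return (input_tokens * in_rate_per_m + output_tokens * out_rate_per_m) / 1_000_000

# 10K in + 2K out at $0.15 / $0.60 per 1M (GPT-4o-Mini-class rates, illustrative)
cost = request_cost(10_000, 2_000, 0.15, 0.60)  # ≈ $0.0027
```

That $0.0027 lands inside the ~$0.002–0.008 range the table quotes for GPT-4o Mini.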
3.9 Screenshot/Image Impact on Context
| Image Type | Token Equivalent | Context Change | Guidance |
|---|---|---|---|
| Code/Error Screen | 500–3K | +2–3× vs. text (OCR bloat) | Avoid; copy-paste text for accuracy/savings |
| UI Mockup | 1K–5K | Moderate; adds visual context | Useful for layout; crop tightly |
| Arch Diagram | 2K–10K | High value, but costly | Worth it; add text descriptions |
Images inflate input (2–5× text cost); pair with raw text for bugs to minimize waste.
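For a feel of where those image-token numbers come from, here is one published counting rule: OpenAI's high-detail vision pricing at the time of writing scales the image into 2048×2048, shrinks the shortest side to 768, then charges 85 base tokens plus 170 per 512×512 tile. Treat the constants as provider- and model-specific assumptions; other providers count differently.

```python
import math

# Image-token estimate per OpenAI's published high-detail vision rule
# (85 base + 170 per 512px tile after scaling). Constants are assumptions
# specific to that provider/model generation; verify against current docs.
def image_tokens(width: int, height: int) -> int:
    scale = min(1.0, 2048 / max(width, height))  # fit within 2048×2048
    w, h = width * scale, height * scale
    scale = min(1.0, 768 / min(w, h))            # shortest side down to 768
    w, h = w * scale, h * scale
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles
```

A 1024×1024 screenshot comes out to 765 tokens under this rule, which is why pasting the error text (often well under 100 tokens) is the cheaper, more accurate choice.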
3.10 Decision Matrix & Pro Tips
Pro Strategies:
- Prioritize relevance (e.g., interfaces/tests); trim comments (save 20–40%).
- Use RAG for large repos; measure tokens with OpenAI's tiktoken library.
- When tokens aren't enough: split tasks, summarize, or upgrade models (e.g., from 128K to 2M windows).
| Task | Start Tokens | Upgrade If | Use RAG When |
|---|---|---|---|
| Bug Fix | 4K | Complex logic | >5 files |
| Feature | 16K | Cross-module | >10 files |
| Refactor | 32K | High risk | >20 files |
| New Project | 64K | Enterprise | >100K LOC |
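The decision matrix above can be wired up as a small lookup helper; the thresholds come straight from the table (the New Project row keys off LOC rather than file count, so its RAG trigger is left out here).

```python
# The decision matrix above as data + a helper; values come from the table.
MATRIX = {
    "bug_fix":     {"start_tokens": 4_000,  "rag_when_files_over": 5},
    "feature":     {"start_tokens": 16_000, "rag_when_files_over": 10},
    "refactor":    {"start_tokens": 32_000, "rag_when_files_over": 20},
    "new_project": {"start_tokens": 64_000, "rag_when_files_over": None},  # RAG when >100K LOC
}

def plan(task: str, files_in_scope: int) -> dict:
    """Return a starting token budget and whether to reach for RAG."""
    row = MATRIX[task]
    threshold = row["rag_when_files_over"]
    use_rag = threshold is not None and files_in_scope > threshold
    return {"start_tokens": row["start_tokens"], "use_rag": use_rag}
```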