The AI Developer's Handbook 2025
Last updated: 2025
OpenAI APIs, Context Mastery, Tokens & Tooling for Software Engineers
Table of Contents
- The AI Developer's Handbook 2025
- Part 1 – Getting Started & API Landscape
- Part 2 – AI Context: MCP, RAG, Tools & Architecture
- Part 3 – Mastering Tokens & Context for Programming
- 3.1 Minimum Context for Company Programming
- 3.2 Token Density by Language
- 3.3 Tokens per LOC, Words & Config Files
- 3.4 Project Size Classifications
- 3.5 Real-World Projects vs Tokens
- 3.6 Consequences of Insufficient Tokens
- 3.7 Bug vs Feature vs New Project
- 3.8 Input vs Output Tokens & Pricing
- 3.9 Screenshot/Image Impact on Context
- 3.10 Decision Matrix & Pro Tips
Part 1 – Getting Started & API Landscape
All factual • SWE-I friendly • sorted from least to most AI-intensive work
1.1 Docs Topics (Least AI → Most AI)
| Rank | Topic Group | SWE-Simple Meaning | Key Sub-Topics | AI-Level |
|---|---|---|---|---|
| 1 | Getting Started | First API call, set key, pick model, check cost | Overview, Quickstart, Libraries, Pricing, Models | ★ |
| 2 | Core Features | Ask model to write, draw, talk, or return JSON | Text gen, Images & vision, Audio, Structured output | ★★ |
| 3 | Run & Scale | Make it real-time, stream tokens, store chat, handle webhooks | Streaming, Conv. state, Webhooks, Background mode | ★★ |
| 4 | Prompting & Reasoning | Craft better instructions / schemas | Prompting guides, Reasoning tips | ★★★ |
| 5 | Tools & Connectors | Give GPT extra powers (search web, run code, read files) | Code interpreter, Web search, File search, MCP | ★★★ |
| 6 | Realtime / Voice | Low-latency chat & voice agents | Realtime API, Voice agents | ★★★★ |
| 7 | Agents SDK | Multi-step assistants that choose tools | Agents SDK (Py/TS), Assistants API | ★★★★ |
| 8 | Specialised Models | Embeddings for search, Moderation for safety, TTS/STT, Image gen | Embeddings, Moderation, Image gen, Speech APIs | ★★★★ |
| 9 | Optimisation | Fine-tune, run evals & graders, cut cost/lag | Fine-tuning, Evals, Graders, Opt. cycle | ★★★★★ |
| 10 | Coding Agents / Deep | AI that writes & runs code, advanced research | Codex cloud/CLI/IDE, Local shell, Deep research | ultra |
1.2 Roles & Must-Read Mapping
| # | Role | Daily Focus | Must-Read Topics (§1.1) | AI-Stars |
|---|---|---|---|---|
| 1 | Backend / API Integrator | Call API, retry, log cost | 1 + 3 | ★ |
| 2 | Front- / Full-Stack Dev | Add chat / image / speech widgets | 1 + 2 + 3 | ★★ |
| 3 | DevOps / Platform Eng. | Deploy & monitor GPT workloads | 3 + 9 | ★★ |
| 4 | Prompt / Workflow Eng. | Write prompts & JSON schemas | 4 | ★★★ |
| 5 | AI Integration Eng. | Chain tools (file/web/search) | 3 + 5 | ★★★ |
| 6 | Conversational / Voice Dev | Build chat or voice assistants | 3 + 6 + 7 | ★★★★ |
| 7 | RAG / Embedding Eng. | Retrieval-Augmented Generation pipelines | 5 + 8 | ★★★★ |
| 8 | ML / Fine-Tuning Eng. | Fine-tune, eval, cost/latency tune | 8 + 9 | ★★★★★ |
| 9 | Agent / Coding-Agent Eng. | Multi-tool autonomous agents, AI coding tools | 5 + 7 + 10 | ★★★★★ |
| 10 | AI Research Eng. | Novel model/agent research & safety | 8 + 9 + 10 | ultra |
1.3 Quick Start by Role
| Role | Language(s) | 3-Step Quick Start | Pre-reqs | First Mini-Project |
|---|---|---|---|---|
| Front-end Dev | JS/TS (UI) + small Node/Python proxy | 1. npm i openai on server 2. Create /api/chat endpoint 3. Stream to React/Vue | JS fetch, env vars | Live chat widget w/ streaming |
| Back-end Dev | Python or Node (pick one) | 1. pip install openai or npm i openai 2. Choose model, log tokens 3. Build /generate route | REST basics, env vars | "Summarise PDF" API using File inputs |
| Full-stack Dev | Python or JS/TS (both not required) | 1. Finish Quickstart 2. Add Function Calling 3. Add Embeddings + File retrieval | Same as above + small DB | "Chat with Docs" (upload → embed → answer) |
Language FAQ
- Either Python or JS/TS is fine.
- Use what your stack already uses; SDKs exist for both.
- Never expose API keys in browser code – hit a backend route instead.
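To make the backend-route rule concrete, here is a minimal Python sketch of what stays server-side: the request payload and the auth header. The model name `gpt-4o-mini` and the Chat Completions endpoint shape follow OpenAI's current API but should be treated as illustrative; check the Quickstart for the exact call.

```python
import os

# The endpoint your backend (not the browser) talks to.
OPENAI_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_request(user_message: str, model: str = "gpt-4o-mini") -> dict:
    """Build the JSON payload sent from YOUR backend route, never client code."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }

def auth_headers() -> dict:
    # The key lives in a server-side env var; it never reaches the browser.
    return {"Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}"}

payload = build_chat_request("Hello!")
```

Your /api/chat route would accept the user message, call `build_chat_request`, and forward the response (or stream it) back to the UI.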
1.4 Prerequisites & Next Steps
| Area | Must-Know | Nice-to-Have | Next / Latest | Why Care? |
|---|---|---|---|---|
| API Basics | HTTP, JSON, env-vars | Streaming (SSE) | Responses API patterns | Clean, future-proof calls |
| Backend Ops | Error handling, retries | Queues, webhooks | Conv. state, cost logging | Reliability & spend control |
| Data / RAG | Vector concept, a vector DB | Chunking, metadata | File search & connectors | Grounded, factual answers |
| Prompting | System vs. user prompts | Few-shot examples | Reasoning chains, JSON schemas | Accuracy & parseability |
| Quality / Safety | Moderation endpoint | Eval sets, graders | Fine-tuning, optimisation cycle | Safer, measurable outputs |
| Realtime / Voice | WebSocket/WebRTC basics | Audio codecs | Realtime API + Voice agents | Low-latency voice apps |
1.5 Climb the Ladder (Week-by-Week)
- ★ Week 1 – Quickstart call in your main language.
- ★★ Week 2 – Add streaming + basic UI.
- ★★★ Week 3 – Use Function Calling + JSON; start prompt tuning.
- ★★★★ Weeks 4–5 – Add Tools (file/web) or Agents SDK.
- ★★★★★ Beyond – Fine-tune, run evals, or build coding agents.
No ML PhD needed until step 5. Pick one language, follow the docs above, and you're off!
Part 2 – AI Context: MCP, RAG, Tools & Architecture
Understanding the "working memory" of AI and how external data flows into the model.
2.1 Component Definitions
| Component | What It Is | Key Features | Example |
|---|---|---|---|
| Context | All information used by the LLM during generation | • Chat, user input, RAG, tool results • Bounded by token limit • Temporary session memory | "My name is Raj" remembered during session |
| MCP | Model Context Protocol – open-source protocol for LLM ↔ system interaction | • JSON-RPC 2.0 spec • 1,000+ MCP servers by early 2025 • Standardizes tool execution, resource access | Claude or GPT calls company CRM via MCP server |
| RAG | Retrieval-Augmented Generation – combines semantic search with LLM output | • Embeds user query • Searches vector DB • Injects relevant docs into context | LLM retrieves legal cases → summarizes |
| Tools | External APIs or code the LLM can run | • Accessed via MCP or native tool APIs (like OpenAI's function calling) • Enables live queries, code, search | getWeather("Hyderabad") fetches live data |
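As a concrete sketch of the Tools row, here is an OpenAI-style function-calling definition for the getWeather example, plus a stub dispatcher for the tool call the model emits. The schema shape follows OpenAI's documented function-calling format, but the tool itself (and its return value) is invented for illustration.

```python
# OpenAI-style tool definition for the getWeather example above.
# The JSON-schema shape follows OpenAI's function-calling format;
# the tool name and behavior are illustrative, not a real API.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "getWeather",
        "description": "Fetch current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Hyderabad"},
            },
            "required": ["city"],
        },
    },
}

def dispatch_tool_call(name: str, args: dict) -> str:
    """Route a model-issued tool call to local code (stubbed here)."""
    if name == "getWeather":
        # A real implementation would hit a weather API; we return canned data.
        return f"31°C and sunny in {args['city']}"
    raise ValueError(f"unknown tool: {name}")
```

The tool result string is then appended to the conversation as a tool message, which is exactly how it lands in the context window.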
2.2 System Architecture
```
┌───────────────────────────────────────────────────────────┐
│                      CONTEXT WINDOW                        │
│  ┌────────────┐  ┌─────────────┐  ┌──────────────┐        │
│  │ User Input │  │  Retrieved  │  │ Tool Results │        │
│  │ & History  │  │  Documents  │  │ (Live Data)  │        │
│  └────────────┘  └─────────────┘  └──────────────┘        │
└───────────────────────────────────────────────────────────┘
                            │
                            │  Context feeds the model
                            ▼
                  ┌─────────────────────┐
                  │         LLM         │
                  │   (Claude / GPT /   │
                  │  Gemini / Mistral)  │
                  └─────────────────────┘
                            │
           ┌────────────────┼────────────────┐
           ▼                ▼                ▼
   ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
   │      MCP      │ │      RAG      │ │  Native APIs  │
   │  (Protocol)   │ │  (Retrieval)  │ │  / Services   │
   └───────┬───────┘ └───────┬───────┘ └───────┬───────┘
           ▼                 ▼                 ▼
   ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
   │  MCP Servers  │ │  Vector DBs   │ │ External APIs │
   │  • Tools      │ │  • Pinecone   │ │  • Weather    │
   │  • Resources  │ │  • Chroma     │ │  • Search     │
   │  • Prompts    │ │  • FAISS      │ │  • Code Exec  │
   └───────────────┘ └───────────────┘ └───────────────┘
```
2.3 Component Relationships
| Component | Feeds Into | Purpose |
|---|---|---|
| Context | LLM | Holds all runtime inputs |
| MCP | Context via tool results | Standardized tool & data access |
| RAG | Context via retrieved docs | Domain-specific semantic enrichment |
| Tools | Context via live results | Real-time functionality (e.g., code, APIs) |
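The RAG row above can be sketched end to end in a few lines. This is a toy: the "embeddings" are hand-made three-dimensional vectors rather than real model output, and the "vector DB" is a dict, but the retrieve-then-inject flow is the same one Pinecone/Chroma/FAISS pipelines use.

```python
import math

# Toy RAG retrieval: rank documents by cosine similarity of (fake) embeddings,
# then splice the winners into the prompt that enters the context window.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hand-made 3-d "embeddings"; a real system computes these with an embedding model.
DOCS = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "api reference": [0.0, 0.1, 0.9],
}

def retrieve(query_vec, k=1):
    """Return the k document keys most similar to the query vector."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

def build_prompt(question, query_vec):
    # Retrieved docs are injected into the context ahead of the question.
    context = "\n".join(retrieve(query_vec))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Swapping the dict for a real vector DB and `cosine` for the DB's native search is the only structural change needed for production.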
2.4 Industry Adoption
| Provider | Status |
|---|---|
| Anthropic | Creator & lead maintainer of MCP |
| OpenAI | Native function calling API; MCP support via community |
| Google | Function calling capabilities in Gemini API |
| Microsoft | MCP integrated into Azure OpenAI Studio and Foundry (Preview) |
Key Note: While MCP is gaining adoption, each provider also maintains its own tool-calling mechanism (like OpenAI's function calling API) alongside MCP support.
Part 3 – Mastering Tokens & Context for Programming
The "working memory" that powers tools like GPT, Claude, or Gemini for coding tasks. Measured in tokens (β 4 characters or 0.75 words), context includes your prompt, code snippets, configs, and the AI's output. It fits within a model's context window (e.g., 8K for basic models, up to 2M+ for advanced ones like Gemini 1.5).
For "decent" company-level programming (reliable, production-ready code), aim for 16Kβ64K+ tokens minimum, reserving 20β30% for output to prevent truncation. Verbose languages like C++ consume more tokens, while large projects require strategies like RAG or chunking. Use tools like Tiktoken for precise counts. Variations: Β±15β20% based on code style.
3.1 Minimum Context for Company Programming
| Task Type | Min Input Tokens | Typical Files/Scope | Why Needed / If Insufficient |
|---|---|---|---|
| Tiny Bug Fix | 1K–4K | 1–3 files + errors/tests | Local fix; wrong diagnosis, breaks code |
| Small Feature | 4K–12K | 3–8 files + deps/interfaces | Integration; duplicates, integration errors |
| Cross-File Refactor | 12K–32K | 8–20 files + usages/style | Consistency; inconsistencies, broken deps |
| New Module/Service | 16K–64K+ | 10–30 files + arch/templates | Structure; poor architecture, mismatches |
Overall minimum for decent work: 16K–32K tokens. Reserve 20–30% for output; use RAG for overflow.
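The reserve-for-output rule can be sketched as a small budgeting helper. The 20–30% figures are this section's guideline, not an API limit.

```python
# Split a model's context window into an input budget and reserved output
# headroom, per the 20–30% reservation guideline above.
def context_budget(window_tokens: int, output_reserve: float = 0.25) -> dict:
    """Return how many tokens to spend on input vs. reserve for output."""
    if not 0.20 <= output_reserve <= 0.30:
        raise ValueError("guideline suggests reserving 20–30% for output")
    output = int(window_tokens * output_reserve)
    return {"input_budget": window_tokens - output, "output_reserve": output}
```

For a 32K window at the default 25% reserve, that leaves a 24K input budget with 8K held back for the model's answer.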
3.2 Token Density by Language
| Language | Tokens/100 LOC (Approx.) | Rank (Most Tokens) | Why Heavy/Light |
|---|---|---|---|
| C++ | 650–850 | 1 (Most) | Templates, headers, symbols |
| Java/C# | 550–750 | 2 | Boilerplate, OOP patterns |
| Rust | 500–650 | 3 | Lifetimes, macros |
| TypeScript | 480–600 | 4 | Type annotations |
| JavaScript | 420–520 | 5 | Symbols, callbacks |
| Go | 400–480 | 6 | Concise, explicit errors |
| Python | 380–450 | 7 (Least) | Minimal syntax, no braces |
Up to 2× difference (e.g., Python vs. C++). Rule of thumb: LOC × 4–8 tokens (code) or words × 1.33 (docs).
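A rough estimator built from the midpoints of the table above. The figures are heuristics with the stated ±15–20% variance; for exact counts you would tokenize the actual text with a library such as OpenAI's tiktoken.

```python
# Midpoints of the tokens-per-100-LOC ranges in the table above (heuristics).
TOKENS_PER_100_LOC = {
    "cpp": 750, "java": 650, "rust": 575, "typescript": 540,
    "javascript": 470, "go": 440, "python": 415,
}

def estimate_code_tokens(loc: int, language: str) -> int:
    """Ballpark token count for `loc` lines of code in `language`."""
    return loc * TOKENS_PER_100_LOC[language] // 100

def estimate_doc_tokens(words: int) -> int:
    """Prose rule of thumb from above: words × 1.33."""
    return int(words * 1.33)
```

So a 1,000-line Python module lands around 4,150 tokens while the same LOC in C++ is closer to 7,500, which is the 2× spread the table describes.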
3.3 Tokens per LOC, Words & Config Files
| Metric / File Type | Token Range / Formula | Example / Strategy |
|---|---|---|
| Per LOC (Code) | LOC × 4–8 (chars/line ÷ 4) | 100 LOC Python: 380–450; Java: 580–750 |
| Per Words (Docs/Prose) | Words × 1.33 | 1K words ≈ 1.3K tokens; summarize long docs |
| Config (.env, 20–50 lines) | 100–300 | Include fully; low cost, essential for env |
| Constants/Enums (100–500 lines) | 800–4K | Summarize lists; use placeholders |
| package.json/pyproject (50–200 lines) | 500–2K | Trim metadata/scripts; keep deps |
| tsconfig/eslint (100 lines) | 350–700 | Retain rules; critical for style |
| YAML/K8s/OpenAPI (100–300 lines) | 700–1.2K | Extract relevant; remove comments |
Configs add 10–20% to total; trim unused sections to save 20–40% tokens.
3.4 Project Size Classifications
| Size | LOC Range | Tokens (Python) | Tokens (Java/C++) | Strategy |
|---|---|---|---|---|
| Small | <10K | 8K–80K | 10K–120K | Full fit in 32K–128K window |
| Medium | 10K–100K | 80K–800K | 150K–1.5M | Selective files + chunking |
| Large | 100K–1M | 800K–8M | 1.5M–15M | RAG mandatory for modules |
| Mega | >1M | >8M | >15M | Advanced RAG; full fit impossible |
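The classification above reduces to a simple threshold check. A minimal sketch, with thresholds taken directly from the table:

```python
# Map a project's LOC to the context strategy from the table above.
def project_strategy(loc: int) -> str:
    if loc < 10_000:
        return "small: full fit in a 32K–128K window"
    if loc < 100_000:
        return "medium: selective files + chunking"
    if loc < 1_000_000:
        return "large: RAG mandatory for modules"
    return "mega: advanced RAG; full fit impossible"
```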
3.5 Real-World Projects vs Tokens
| Project / Company | Est. LOC | Est. Tokens | AI Strategy |
|---|---|---|---|
| Small Startup / MVP | 5K–50K | 40K–500K | Full context in 128K |
| React / Django App | 200K–500K | 1.5M–3M | Chunk by features/modules |
| Medium SaaS / VS Code | 100K–2M | 700K–15M | Isolate components with RAG |
| Netflix Service | 500K–2M | 7.5M–30M | Microservice focus + summaries |
| Linux Kernel | 30M+ | 200M+ | Subsystem-only; heavy RAG |
| Google Monorepo / Search | 60M–2B+ | 900M–15B+ | Modular RAG on strict subsets |
Mega projects require breaking into micro-tasks; full context is impossible even in 2M windows.
3.6 Consequences of Insufficient Tokens
| Issue | Symptoms | Why / Impact | Mitigation |
|---|---|---|---|
| Wrong Results | Hallucinated APIs/logic/errors | Missing defs/specs; runtime fails | Include interfaces/tests |
| Duplicated Code | Rewrites existing utils | Omitted helpers; increases debt | Provide examples/utils |
| Inconsistent Outputs | Style mismatches/truncation | No configs/headroom; review needed | Add style guides; reserve 20–30% |
| Insecure/Incomplete | Missing validation/security | Absent patterns; vulnerabilities | Include arch/docs |
3.7 Bug vs Feature vs New Project
| Aspect | Bug Fix | Feature Addition | New Project |
|---|---|---|---|
| Input Tokens | 1K–4K | 4K–16K | 16K–64K+ |
| Output Tokens | 200–1K | 1K–6K | 2K–15K |
| File Scope | 1–3 + logs/tests | 3–10 + specs/examples | 10–50 + arch/templates |
| Priority | Errors → Code → Tests | Interfaces → Specs | Arch → Patterns |
| Failure Mode | Wrong diagnosis | Integration errors | Poor structure |
| Output/Input Ratio | 1:3–5 | 1:2–3 | 1:1–2 |
3.8 Input vs Output Tokens & Pricing
| Aspect / Task | Typical Ratio (Input:Output) | Input Example | Output Example |
|---|---|---|---|
| Bug Fix | 3–5:1 | 3K | 1K |
| Feature | 2–3:1 | 6K | 2K |
| New Project | 1–2:1 | 10K | 5K |

| Model (2025 Rates) | Max Window | Input $/1M | Output $/1M | Cost for 10K In + 2K Out |
|---|---|---|---|---|
| GPT-4o Mini | 128K | $0.15–0.50 | $0.60–1.50 | ~$0.002–0.008 |
| Claude 3.5 / GPT-4o | 128K–200K | $3–5 | $15 | ~$0.06–0.08 |
| Gemini 1.5 / GPT-4 | 1M–2M | $5–10 | $15–30 | ~$0.08–0.14 |
Inputs: 70–80% of total, cheaper (1×). Outputs: 20–30%, pricier (2–3×). Formula:
(Input Tokens × Input Rate + Output Tokens × Output Rate) / 1M.
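The formula above as code, applied to the table's 10K-in / 2K-out example. The per-million rates are illustrative 2025 figures; check your provider's current price sheet before budgeting.

```python
# Per-request cost: (input tokens × input rate + output tokens × output rate) / 1M.
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate_per_m: float, out_rate_per_m: float) -> float:
    """Cost in USD, with rates quoted per 1M tokens."""
    return (input_tokens * in_rate_per_m + output_tokens * out_rate_per_m) / 1_000_000

# 10K in + 2K out at $0.15 / $0.60 per 1M (GPT-4o-Mini-class rates, illustrative)
cost = request_cost(10_000, 2_000, 0.15, 0.60)  # ≈ $0.0027
```

That $0.0027 lands inside the ~$0.002–0.008 range the table quotes for GPT-4o Mini.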
3.9 Screenshot/Image Impact on Context
| Image Type | Token Equivalent | Context Change | Guidance |
|---|---|---|---|
| Code/Error Screen | 500–3K | +2–3× vs. text (OCR bloat) | Avoid; copy-paste text for accuracy/savings |
| UI Mockup | 1K–5K | Moderate; adds visual context | Useful for layout; crop tightly |
| Arch Diagram | 2K–10K | High value, but costly | Worth it; add text descriptions |
Images inflate input (2–5× text cost); pair with raw text for bugs to minimize waste.
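For a feel of where those image-token numbers come from, here is one published counting rule: OpenAI's high-detail vision pricing at the time of writing scales the image into 2048×2048, shrinks the shortest side to 768, then charges 85 base tokens plus 170 per 512×512 tile. Treat the constants as provider- and model-specific assumptions; other providers count differently.

```python
import math

# Image-token estimate per OpenAI's published high-detail vision rule
# (85 base + 170 per 512px tile after scaling). Constants are assumptions
# specific to that provider/model generation; verify against current docs.
def image_tokens(width: int, height: int) -> int:
    scale = min(1.0, 2048 / max(width, height))  # fit within 2048×2048
    w, h = width * scale, height * scale
    scale = min(1.0, 768 / min(w, h))            # shortest side down to 768
    w, h = w * scale, h * scale
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles
```

A 1024×1024 screenshot comes out to 765 tokens under this rule, which is why pasting the error text (often well under 100 tokens) is the cheaper, more accurate choice.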
3.10 Decision Matrix & Pro Tips
Pro Strategies:
- Prioritize relevance (e.g., interfaces/tests); trim comments (save 20–40%).
- Use RAG for large repos; measure tokens with OpenAI's tiktoken library.
- When tokens aren't enough: split tasks, summarize, or upgrade models (e.g., from 128K to 2M windows).
| Task | Start Tokens | Upgrade If | Use RAG When |
|---|---|---|---|
| Bug Fix | 4K | Complex logic | >5 files |
| Feature | 16K | Cross-module | >10 files |
| Refactor | 32K | High risk | >20 files |
| New Project | 64K | Enterprise | >100K LOC |
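The decision matrix above can be wired up as a small lookup helper; the thresholds come straight from the table (the New Project row keys off LOC rather than file count, so its RAG trigger is left out here).

```python
# The decision matrix above as data + a helper; values come from the table.
MATRIX = {
    "bug_fix":     {"start_tokens": 4_000,  "rag_when_files_over": 5},
    "feature":     {"start_tokens": 16_000, "rag_when_files_over": 10},
    "refactor":    {"start_tokens": 32_000, "rag_when_files_over": 20},
    "new_project": {"start_tokens": 64_000, "rag_when_files_over": None},  # RAG when >100K LOC
}

def plan(task: str, files_in_scope: int) -> dict:
    """Return a starting token budget and whether to reach for RAG."""
    row = MATRIX[task]
    threshold = row["rag_when_files_over"]
    use_rag = threshold is not None and files_in_scope > threshold
    return {"start_tokens": row["start_tokens"], "use_rag": use_rag}
```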