The Signal
The AI stories that actually matter — and the tools worth your time
This Week in AI — What Actually Matters
Claude 4 Lands — and Agentic AI Finally Starts to Deliver
Anthropic shipped Claude 4 Sonnet and Opus with dramatically improved agentic architecture. The real story isn’t benchmarks — it’s that multi-step tool use is now reliable enough for production pipelines. Claude Sonnet at $3/M input tokens is the most compelling value play in frontier AI right now. The API is clean, the documentation is honest, and the writing quality is noticeably ahead of GPT-4o for prose-heavy workloads.
GitHub Copilot Goes Multi-Model — OpenAI Codex Returns as an Agent
Microsoft opened Copilot to Claude, Gemini, and o3 alongside GPT-4o. OpenAI’s new Codex runs tasks asynchronously — submit a task, come back to a pull request. Still early, but the right direction.
ByteDance’s Doubao: The Quiet Giant Most Europeans Are Ignoring
Doubao Pro and Lite consistently outperform their weight class on coding and reasoning. At $0.14/M tokens for Lite, cost-per-quality is unmatched — if your data policy allows a Chinese-hosted endpoint.
Gemini 2.5 Pro — the Dark Horse in the Context Window Wars
1M tokens natively, and it actually performs across the full window. Researchers running legal document analysis and codebase ingestion report results that match chunking-based RAG — without the complexity.
Free Tools Worth Having Right Now
Ollama
Run Llama 3, Mistral, Qwen, Gemma, and dozens more entirely on your own machine. Zero API cost, zero data leaving your hardware. If you have a modern GPU (8GB+ VRAM) or Apple Silicon, this is the most important tool you’re not yet using.
OpenRouter
200+ LLMs via a single API key. A rotating list of completely free models currently includes Llama 3.1 405B, Gemma 3 27B, and several Mistral variants. One account, one key, genuinely capable models at no cost.
Google AI Studio
Full access to Gemini 2.5 Pro with the 1M context window — free in the playground. Rate-limited, but sufficient for experimentation and building intuition before committing to the API. No credit card.
Claude.ai (free tier)
Daily access to Claude 3.5 Sonnet with generous limits. Quality-per-message among the highest of any free AI product. Worth having as a second model even if you’re already paying for something else.
The Honest LLM API Guide — May 2026
| Provider / Model | Best For | Input $/M | Output $/M | Verdict |
|---|---|---|---|---|
| Claude 3.5 Sonnet (Anthropic API) | Agentic tasks, writing, long context | $3.00 | $15.00 | Top Pick |
| Claude 3 Haiku (Anthropic API) | High-volume, classification, speed | $0.25 | $1.25 | Best Value |
| GPT-4o + Codex (OpenAI / GitHub Copilot) | Coding, IDE integration, PRs | $5.00 | $15.00 | Dev First |
| Gemini 2.5 Pro (Google AI / Vertex) | Long docs, multimodal, research | $1.25 | $10.00 | Context King |
| Doubao Pro / Lite (ByteDance / Volces) | Bulk workloads, cost-critical | $0.14 | $0.28 | Cheapest |
| Ollama (local, your hardware) | Privacy, offline, zero API cost | $0 | $0 | Free Forever |
| OpenRouter (openrouter.ai) | Model switching, free tier access | Varies | Varies | Most Flexible |
Claude 3.5 Sonnet (Anthropic API)
Our preferred API for anything that requires actual thinking. Writing quality is noticeably ahead of GPT-4o for prose-heavy tasks. The 200K context window is generous. Documentation is the most honest in the industry. Watch for: output pricing at $15/M adds up fast; route anything that doesn’t need Sonnet quality to Haiku ($0.25/M in, $1.25/M out).
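The Sonnet-versus-Haiku routing advice is easy to put in numbers. A minimal sketch using the list prices from the table above (assumed current; check Anthropic’s pricing page before budgeting):

```python
# Rough monthly cost comparison using the per-million-token list prices
# quoted in this guide. Volumes are in millions of tokens per month.

PRICES = {
    "claude-3.5-sonnet": {"in": 3.00, "out": 15.00},
    "claude-3-haiku": {"in": 0.25, "out": 1.25},
}

def monthly_cost(model: str, in_mtok: float, out_mtok: float) -> float:
    """USD cost for a workload of in_mtok input / out_mtok output Mtokens."""
    p = PRICES[model]
    return in_mtok * p["in"] + out_mtok * p["out"]

# Example workload: 50M input / 10M output tokens per month.
sonnet = monthly_cost("claude-3.5-sonnet", 50, 10)  # 50*3 + 10*15 = 300.0
haiku = monthly_cost("claude-3-haiku", 50, 10)      # 50*0.25 + 10*1.25 = 25.0
print(f"Sonnet: ${sonnet:.2f}, Haiku: ${haiku:.2f}")
```

At that volume the gap is 12x, which is why sending classification and other low-stakes traffic to Haiku dominates the bill far more than any prompt tuning will.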
GPT-4o + Codex (OpenAI / GitHub Copilot)
If you write code for a living, GitHub Copilot remains the most integrated option at £10/month. The new Codex cloud agent runs tasks asynchronously: submit a task, come back to a PR. Watch for: GPT-4o output at $15/M is steep. o3 and o4-mini are more capable for reasoning but cost more.
Gemini 2.5 Pro (Google AI / Vertex)
The most capable model for large document or codebase ingestion in one shot. 1M tokens, and it actually performs across the full window, which is more than can be said for earlier long-context attempts. The Google AI Studio free tier lets you test before paying. Watch for: Flash 2.0 is cheap ($0.075/M) but noticeably weaker.
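Whether a corpus genuinely fits in one shot is a quick back-of-the-envelope check. A sketch using the common ~4 characters-per-token heuristic (a rough assumption: real tokenizer counts vary with language and content, so leave headroom):

```python
def approx_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token for English prose/code."""
    return len(text) // 4

def fits_in_window(texts: list[str], window: int = 1_000_000,
                   reserve: int = 8_000) -> bool:
    """True if all documents plus a reply reserve fit in one context window."""
    return sum(approx_tokens(t) for t in texts) + reserve <= window

# A ~3 MB corpus (~750K estimated tokens) fits Gemini's 1M window in one
# call, but would overflow a 200K window and force chunking or RAG.
corpus = ["x" * 3_000_000]
print(fits_in_window(corpus))                  # True
print(fits_in_window(corpus, window=200_000))  # False
```

If the check fails, that is the point where chunking-based RAG stops being optional, whatever the model.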
Doubao Pro / Lite (ByteDance / Volces)
Doubao Lite 32K at $0.14/M input is among the lowest-cost options that produce coherent output. The API is OpenAI-compatible, so migration is straightforward. Watch for: data residency is in China. That is a hard blocker for EU user data or sensitive commercial IP.
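Because the endpoint speaks the OpenAI chat format, migration amounts to swapping a base URL, API key, and model name. A sketch of that config swap; the Doubao base URL and model ID below are illustrative placeholders, not verified values, so confirm them against ByteDance’s Volces documentation:

```python
# Provider configs for any OpenAI-compatible client. Only the connection
# settings change; request and response shapes stay identical.
# The Doubao entries are placeholders -- verify against official docs.

PROVIDERS = {
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-4o",
        "key_env": "OPENAI_API_KEY",
    },
    "doubao": {
        "base_url": "https://example-volces-endpoint/api/v3",  # placeholder
        "model": "doubao-lite-32k",                            # placeholder
        "key_env": "DOUBAO_API_KEY",
    },
}

def chat_config(provider: str) -> dict:
    """Return the settings an OpenAI-style client needs for this provider."""
    return PROVIDERS[provider]
```

Everything downstream of `chat_config` stays untouched, which is the whole appeal of OpenAI-compatible endpoints.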
Ollama (local)
One command to download, one to run. Exposes an OpenAI-compatible REST API locally; your existing tooling connects with a single URL change. Llama 3.1, Mistral Nemo, Qwen 2.5, DeepSeek, Gemma 3, Phi-4 are all available. Watch for: quality is hardware-dependent. An 8B model on a GPU is a different experience from a 7B model on CPU.
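That “single URL change” means pointing your client at Ollama’s local OpenAI-compatible endpoint, which listens on port 11434 by default. A minimal stdlib-only sketch (the `chat` call itself obviously needs `ollama serve` running and the model pulled):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_request(prompt: str, model: str = "llama3.1") -> urllib.request.Request:
    """Build an OpenAI-style chat request aimed at the local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def chat(prompt: str, model: str = "llama3.1") -> str:
    """Send the request; requires a running `ollama serve` with the model pulled."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

If you already use an OpenAI SDK, the same effect is one `base_url` override; no other code changes.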
OpenRouter (openrouter.ai)
One API key, one billing account, access to models from Anthropic, OpenAI, Google, Meta, Mistral, and dozens more. Provider pricing plus a small markup. Free-tier models included. Watch for: a middleman means extra latency and a single point of failure. Go direct at production scale; OpenRouter is for exploration and model agility.
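One pattern the free tier enables: try free models first and fall back to a paid one only when they’re unavailable. A sketch; the IDs follow OpenRouter’s `vendor/model` naming, but the free list rotates, so treat these specific IDs as illustrative:

```python
# Ordered fallback list: free models first, a paid model as last resort.
# Specific IDs are examples only -- the free rotation changes over time.
FALLBACKS = [
    "meta-llama/llama-3.1-405b-instruct:free",
    "google/gemma-3-27b-it:free",
    "anthropic/claude-3.5-sonnet",
]

def complete_with_fallback(prompt: str, send) -> tuple[str, str]:
    """Try each model in order. `send(model, prompt)` is your actual API
    call and should raise on failure. Returns (model_used, response)."""
    last_err = None
    for model in FALLBACKS:
        try:
            return model, send(model, prompt)
        except Exception as err:  # rate limit, model rotated out, etc.
            last_err = err
    raise RuntimeError("all models failed") from last_err
```

Because every model sits behind the same key and endpoint, this kind of agility is a few lines here versus separate accounts and SDKs when going direct.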
Also From AI Maestro — Related Reading
Claude Code vs GitHub Codex vs Cursor: The Honest 2026 Comparison
Three AI coding assistants, one honest verdict. We tested each on real tasks — not synthetic benchmarks.
Read the guide →
GPU Rental vs LLM API vs Cloud Hosting: Which Actually Makes Sense?
The infrastructure decision most developers get wrong. We lay out the real numbers and when each approach breaks down.
Read the guide →
Ollama Cloud Review 2026: Is It Actually Worth It?
We ran Ollama Cloud through its paces. Here is what the marketing doesn’t tell you — and when the local version is still the better call.
Read the review →
