Sakana AI Launches Sakana Fugu: An Orchestration Model That Routes Tasks Across a Swappable Pool of Frontier LLMs

By AI Maestro June 22, 2026 3 min read
Sakana AI has released Sakana Fugu, a system that presents a team of large language models as a single API endpoint. Users send one request to a single address, and the software decides internally whether to handle the task alone or coordinate a group of specialists. The complexity of managing multiple agents does not appear in the user’s code.

The core capabilities

Fugu operates as a language model itself, trained to instruct other models within a pool. This pool includes recursive instances of Fugu. The system manages selection, delegation, verification, and synthesis of results without requiring hard-coded workflows.

Sakana AI describes this as a strategy to reduce reliance on a single vendor. If one provider restricts access or changes terms, Fugu can route tasks to other available models. The team cites recent export controls on Anthropic’s Fable and Mythos models as a practical driver for this approach. Newer models can be added to the pool over time.

Two models, one interface

The release includes two variants behind one OpenAI-compatible API:

  • Fugu prioritises speed alongside performance. It serves as the default for general coding, code review, and chatbots. Teams can opt specific agents out of the pool to meet data privacy or compliance rules.
  • Fugu Ultra targets maximum accuracy on difficult, multi-step problems. It coordinates a deeper set of expert agents. Opt-out is not available for this variant. The current model ID is fugu-ultra-20260615.

Technical background

The system builds on two papers presented at ICLR 2026: Trinity and Conductor. Trinity uses a lightweight coordinator to assign roles like Thinker, Worker, or Verifier across several turns. Conductor is trained with reinforcement learning to discover natural-language coordination strategies for diverse LLM pools.

Together, these methods allow systems to learn how to assemble and route agents for specific tasks, replacing the need for hand-designed workflows.
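To make the role-assignment idea concrete, here is a toy sketch of a coordinator loop. It is not Sakana AI's implementation: in Trinity and Conductor the coordinator is learned, whereas `toy_coordinator` below is a hand-written round-robin stand-in, and the agent names and stub functions are hypothetical. It only shows the shape of the loop: pick an agent and a role each turn, then pass the output forward.

```python
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str], str]
ROLES = ("Thinker", "Worker", "Verifier")  # role names taken from the article


def toy_coordinator(turn: int, pool: List[str]) -> Tuple[str, str]:
    """Pick (agent, role) for a turn.

    Assumption: a fixed round-robin rule stands in for the learned policy.
    """
    return pool[turn % len(pool)], ROLES[turn % len(ROLES)]


def run(task: str, agents: Dict[str, Agent], turns: int = 3) -> List[Tuple[str, str, str]]:
    """Run a few turns, feeding each agent's output to the next turn."""
    pool = sorted(agents)
    context = task
    transcript = []
    for turn in range(turns):
        name, role = toy_coordinator(turn, pool)
        output = agents[name](f"[{role}] {context}")
        transcript.append((name, role, output))
        context = output
    return transcript


# Hypothetical stub agents; real agents would call an LLM here.
agents = {
    "model_a": lambda prompt: f"a({prompt})",
    "model_b": lambda prompt: f"b({prompt})",
}

transcript = run("Sort a list", agents)
for name, role, _ in transcript:
    print(name, role)
```

The point of the learned approaches described above is that the selection rule inside `toy_coordinator` is discovered through training rather than written by hand.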

Performance results

Sakana AI compares Fugu against the foundation models it orchestrates. Baselines use scores reported by providers. SWE Bench Pro uses the mini-swe-agent as scaffolding.

Benchmark               Fugu   Fugu Ultra   Opus 4.8   Gemini 3.1 Pro   GPT 5.5
SWE Bench Pro*          59.0   73.7         69.2       54.2             58.6
TerminalBench 2.1       80.2   82.1         74.6       70.3             78.2
LiveCodeBench           92.9   93.2         87.8       88.5             85.3
LiveCodeBench Pro       87.8   90.8         84.8       82.9             88.4
Humanity’s Last Exam    47.2   50.0         49.8       44.4             41.4
CharXiv Reasoning       85.1   86.6         84.2       83.3             84.1
GPQA-D                  95.5   95.5         92.0       94.3             93.6
SciCode                 60.1   58.7         53.5       58.9             56.1
τ³ Banking              21.7   20.6         20.6       8.4              20.6
Long Context Reasoning  74.7   73.3         67.7       72.7             74.3
MRCRv2                  86.6   93.6         87.9       84.9             94.8

The orchestrator posts the top score on 10 of 11 rows. Fugu Ultra tops the four coding benchmarks, CharXiv Reasoning, and Humanity’s Last Exam. It ties regular Fugu on GPQA-D. Regular Fugu leads SciCode, τ³ Banking, and Long Context Reasoning. GPT 5.5 wins MRCRv2, the only baseline win here.
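The tally can be checked mechanically. The following is a minimal sketch with the scores transcribed by hand from the table; the variable and function names are illustrative only.

```python
# Scores transcribed from the benchmark table in this article.
COLUMNS = ["Fugu", "Fugu Ultra", "Opus 4.8", "Gemini 3.1 Pro", "GPT 5.5"]
SCORES = {
    "SWE Bench Pro": [59.0, 73.7, 69.2, 54.2, 58.6],
    "TerminalBench 2.1": [80.2, 82.1, 74.6, 70.3, 78.2],
    "LiveCodeBench": [92.9, 93.2, 87.8, 88.5, 85.3],
    "LiveCodeBench Pro": [87.8, 90.8, 84.8, 82.9, 88.4],
    "Humanity’s Last Exam": [47.2, 50.0, 49.8, 44.4, 41.4],
    "CharXiv Reasoning": [85.1, 86.6, 84.2, 83.3, 84.1],
    "GPQA-D": [95.5, 95.5, 92.0, 94.3, 93.6],
    "SciCode": [60.1, 58.7, 53.5, 58.9, 56.1],
    "τ³ Banking": [21.7, 20.6, 20.6, 8.4, 20.6],
    "Long Context Reasoning": [74.7, 73.3, 67.7, 72.7, 74.3],
    "MRCRv2": [86.6, 93.6, 87.9, 84.9, 94.8],
}
FUGU_FAMILY = {"Fugu", "Fugu Ultra"}


def leaders(row):
    """Return every column name that holds the row's top score (ties included)."""
    best = max(row)
    return [name for name, score in zip(COLUMNS, row) if score == best]


LEADERS = {bench: leaders(row) for bench, row in SCORES.items()}

# Count rows where at least one Fugu variant holds (or shares) the top score.
FUGU_WINS = sum(1 for names in LEADERS.values() if FUGU_FAMILY & set(names))

for bench, names in LEADERS.items():
    print(f"{bench}: {', '.join(names)}")
print(f"Fugu variants lead {FUGU_WINS} of {len(SCORES)} rows")
```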

The Fugu models stand shoulder-to-shoulder with Anthropic’s Fable 5 and Mythos Preview. Those two are not in Fugu’s pool, since they are not publicly accessible.

Real-world examples

Sakana AI ran a beta with close to 500 early users. The published examples favour long, multi-step tasks.

  • AutoResearch: An agent improved a small GPT’s training recipe autonomously. It ran 123 experiments over roughly 14 hours on one H100 GPU. Fugu Ultra reached the best mean validation BPB of 0.9774, with a best single run of 0.9748.
  • Rubik’s Cube solver: Each model wrote a pure-Python solver, no libraries allowed. Fugu Ultra solved all 300 held-out cubes, averaging 19.72 moves. One baseline matched it closely at 19.76 moves. Two others crashed and solved none.
  • Classical Japanese kana reading order: On a 1610 letter, Fugu Ultra scored NED 0.80. The nearest baseline reached only 0.24.
  • Blindfold chess: Fugu played four games from memory, with no board shown. It beat three frontier models and a 2100-Elo Stockfish engine.
  • Online trading: On one 50-week window, Fugu Ultra returned +19.43% on average across five runs. The other frontier models stayed below +15%. Sakana AI notes past performance does not guarantee future results.

Getting started

Fugu uses an OpenAI-compatible API, so no SDK migration is required. Point an existing client at your console-provided endpoint.

from openai import OpenAI

# Endpoint and key come from your Sakana console (console.sakana.ai).
client = OpenAI(
    base_url="https://<your-fugu-endpoint>/v1",  # from console.sakana.ai
    api_key="YOUR_SAKANA_API_KEY",
)

resp = client.chat.completions.create(
    model="fugu-ultra-20260615",           # or "fugu"
    messages=[
        {"role": "user",
         "content": "Reproduce the method in this paper and report the gap."},
    ],
)

print(resp.choices[0].message.content)

Token usage and cost are reported per request, so you can monitor spend in real time.
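A minimal sketch of reading per-request token counts follows. It assumes Fugu fills in the standard `usage` object that OpenAI-compatible Chat Completions responses carry (`prompt_tokens`, `completion_tokens`, `total_tokens`). The article does not name the field that holds cost, so the sketch leaves cost out. `summarize_usage` is a hypothetical helper, and the stand-in response only lets the example run offline.

```python
from types import SimpleNamespace


def summarize_usage(resp):
    """Return token counts from a Chat Completions-style response, or None."""
    usage = getattr(resp, "usage", None)
    if usage is None:
        return None
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


# Stand-in for a real response; with a live client, pass `resp` instead.
fake_resp = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=120, completion_tokens=480, total_tokens=600)
)
print(summarize_usage(fake_resp))
```

Logging this dictionary after each call gives a simple running record of token spend per request.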

Early reactions

A manual review of public reaction on X and Hacker News, with links to every source. Captured June 22, 2026.

12 posts reviewed

Sentiment split (n = 12)

  • Supportive 3
  • Skeptical 6
  • Critical 3

Early reaction skews skeptical. The “is this just a router or wrapper?” question dominates. The clearest supportive voices are Sakana-affiliated.
