DeepReinforce has released Ornith-1.0, a family of open-source coding models that learn their own reinforcement learning scaffolds. The collection includes four variants ranging from a 9B dense model to a 397B mixture-of-experts flagship, all distributed under the MIT licence on Hugging Face. Every checkpoint is post-trained atop Gemma 4 and Qwen 3.5.
In this article
Most coding agents operate with a fixed, human-designed harness. Ornith-1.0 instead learns to write its own. The DeepReinforce research team reports state-of-the-art results among open models of comparable size.
TL;DR
- Ornith-1.0 ships in 9B, 31B, 35B-MoE, and 397B-MoE sizes under MIT, built on Gemma 4 and Qwen 3.5.
- The model learns its own scaffold during RL, jointly optimizing the harness and the solution.
- Ornith-1.0-397B tops Claude Opus 4.7 on both headline benchmarks, but not Opus 4.8 or the larger GLM-5.2-744B.
- Three layers — fixed trust boundary, deterministic monitor, frozen LLM judge — guard against reward hacking.
What is Ornith-1.0?
Ornith-1.0 is a set of reasoning models tuned for coding agents. The variants are 9B Dense, 31B Dense, 35B MoE, and 397B MoE. The 35B model is mixture-of-experts and activates roughly 3B parameters per token. FP8 and GGUF builds are also published for faster local serving.
Each model is a reasoning model. Replies open with a <think> block before the final answer. The serving recipes enable a reasoning parser, so that trace returns in a separate reasoning_content field. The models also emit well-formed tool calls for agent loops.
Deployment is straightforward. The 9B model is about 19GB in bf16 and serves on a single 80GB GPU. Serving recipes target vLLM, SGLang, and Transformers. Each model exposes an OpenAI-compatible endpoint. Standard agent frameworks therefore work without code changes.




