BeeLlama v0.2.0 – Major DFlash Update

BeeLlama v0.2.0 is here!

Not quite a pegasus, but close enough.

GitHub | Qwen 3.6 27B Quick Start | Gemma 4 31B Quick Start

Full Gemma 4 31B support with efficient DFlash implementation and vision.
Major Qwen 3.6 27B performance update from lower DFlash overhead, cleaner prefill handling, drafter K/V projection caching, and safer CUDA execution.
DFlash GGUFs with upstream architecture are now supported.
Fixes to adaptive profit behavior around baseline probing.
The reduced verifier path is now stricter, with a safer fallback to full logits when grammar, sampler state, or reasoning requires it.
Reasoning and tool-call boundaries were tightened.
Stricter draft/target validation and better draft-model discovery.
And many more improvements!

Benchmarks

Setup: Windows 11, AMD Ryzen 7 5700X3D, 32 GB DDR4 RAM, RTX 3090 24 GB
Config: same as in quick start docs, but with reasoning off for non-chat prompts
Baseline and MTP server used for comparison: llama.cpp b9275, CUDA 13.1 Windows prebuilt
The full text of the benchmark prompts is in README.md on GitHub

Qwen 3.6 27B

Target model: Qwen 3.6 27B Q5_K_S or Qwen 3.6 27B MTP Q5_K_S. DFlash model: Q4_K_M.

Prompt	Server	Output	Median	Best	Speedup	Acceptance
Task store module	Baseline	~1K tok	37.2 tok/s	37.2 tok/s	1.00x	N/A
Task store module	DFlash	~1K tok	163.9 tok/s	181.9 tok/s	4.40x	67.7% / 89.2%
Task store module	MTP	~1K tok	69.3 tok/s	69.6 tok/s	1.86x	92.0% / 73.3%
KV report module	Baseline	~1K tok	34.6 tok/s	36.5 tok/s	1.00x	N/A
KV report module	DFlash	~1K tok	157.7 tok/s	162.5 tok/s	4.56x	58.8% / 88.9%
KV report module	MTP	~1K tok	67.3 tok/s	68.1 tok/s	1.94x	89.3% / 73.0%
Doubly-linked list	Baseline	~4K tok	36.8 tok/s	36.9 tok/s	1.00x	N/A
Doubly-linked list	DFlash	~4K tok	130.8 tok/s	154.1 tok/s	3.56x	50.4% / 86.8%
Doubly-linked list	MTP	~4K tok	66.3 tok/s	68.0 tok/s	1.80x	87.8% / 72.5%
Prompt processing	Baseline	~20K tok	1229.5 tok/s	1229.5 tok/s	1.00x	N/A
Prompt processing	DFlash	~20K tok	1214.4 tok/s	1221.7 tok/s	0.99x	N/A
Prompt processing	MTP	~20K tok	1162.6 tok/s	1164.7 tok/s	0.95x	N/A
Multi-turn coding	Baseline	~28K tok	33.3 tok/s	33.3 tok/s	1.00x	N/A
Multi-turn coding	DFlash	~30K tok	64.6 tok/s	65.4 tok/s	1.94x	24.9% / 72.9%
Multi-turn coding	MTP
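
The Speedup column in the table above appears to be each server's median throughput divided by the baseline median for the same prompt. That is an inference from the published numbers, not something the release notes state, and the helper name `speedup` below is hypothetical. A minimal Python sketch that re-derives the ratio for three DFlash rows:

```python
def speedup(server_median_tok_s: float, baseline_median_tok_s: float) -> float:
    """Return the throughput ratio of a server against the baseline.

    Assumption: speedup = server median / baseline median (tok/s).
    """
    if baseline_median_tok_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return server_median_tok_s / baseline_median_tok_s


# (baseline median, DFlash median, reported speedup), copied from the table.
rows = {
    "Task store module": (37.2, 163.9, 4.40),
    "KV report module": (34.6, 157.7, 4.56),
    "Doubly-linked list": (36.8, 130.8, 3.56),
}

for prompt, (baseline, dflash, reported) in rows.items():
    derived = speedup(dflash, baseline)
    # Small differences are expected because the table shows rounded medians.
    print(f"{prompt}: derived {derived:.3f}x, reported {reported:.2f}x")
```

The derived ratios agree with the reported values to within about 0.01, which is consistent with the table having been computed from unrounded medians.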