85 GPU-hours comparing 5 abliteration methods on Qwen3.6-27B: benchmarks, safety, weight forensics – Abliterlitics

By AI Maestro, May 17, 2026
The six models

| Name | Hugging Face repository |
| --- | --- |
| Base | Qwen/Qwen3.6-27B |
| Heretic | llmfan46/Qwen3.6-27B-uncensored-heretic-v2 |
| HauhauCS | HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive |
| Huihui | huihui-ai/Huihui-Qwen3.6-27B-abliterated |
| AEON | AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16 |
| Abliterix | wangzhang/Qwen3.6-27B-abliterated-v2 |

Heretic and Huihui are the top two for capability preservation: Huihui has the smallest benchmark deltas, Heretic has the lowest KL divergence.

All five abliterated models reach near-complete safety removal. AEON’s "enhanced capabilities" claim is contradicted by the data.

HauhauCS is discontinued from all future comparisons: its lossless claims were debunked and the tool was plagiarized.

Benchmarks

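| Task | Base | Heretic | HauhauCS | Huihui | AEON | Abliterix |
| --- | --- | --- | --- | --- | --- | --- |
| MMLU | 83.3% | 82.8% | 83.9% | 83.4% | 82.9% | 81.3% |
| HellaSwag | 83.5% | 83.2% | 83.1% | 83.5% | 82.7% | 77.3% |
| ARC Challenge | 59.1% | 58.0% | 57.9% | 59.5% | 56.1% | 53.2% |
| WinoGrande | 77.7% | 77.7% | 77.7% | 77.4% | 75.3% | 74.9% |
| TruthfulQA MC2 | 56.7% | 51.1% | 47.2% | 54.8% | 46.1% | 48.7% |
| PiQA | 81.0% | 81.0% | 81.0% | 81.2% | 80.4% | 75.7% |
| GSM8K (7168 tok) | 34.4% | 27.5% | 51.0% | 75.1% | 51.2% | 37.6% |
| Lambada (ppl) | 3.18 | 3.24 | 3.35 | 3.15 | 3.44 | 9.12 |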

Something is strange about the GSM8K results, and I don’t yet know the cause, so please take them with a grain of salt. If I find the exact reason for these scores, I’ll update this post.

Delta vs base

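| Task | Heretic | Huihui | AEON | Abliterix |
| --- | --- | --- | --- | --- |
| MMLU | -0.5 | +0.1 | -2.0 | -6.0 |
| HellaSwag | -0.3 | +0.0 | -0.8 | -6.2 |
| ARC Challenge | -1.1 | +0.4 | -3.0 | -5.9 |
| WinoGrande | +0.0 | -0.3 | -2.4 | -2.8 |
| TruthfulQA MC2 | -5.6 | -1.9 | -10.6 | -8.0 |
| PiQA | +0.0 | +0.2 | -0.6 | -5.3 |
| GSM8K | -6.9 | +16.8 | +40.7 | +3.2 |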
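
The deltas appear to be plain differences in percentage points (model score minus base score). The sketch below recomputes three rows from the Benchmarks table under that assumption. The `scores` dictionary and the helper name `deltas_vs_base` are illustrative, not part of the original evaluation code.

```python
# Scores copied from the Benchmarks table (accuracy, in percent).
scores = {
    "HellaSwag": {"Base": 83.5, "Heretic": 83.2, "Huihui": 83.5,
                  "AEON": 82.7, "Abliterix": 77.3},
    "ARC Challenge": {"Base": 59.1, "Heretic": 58.0, "Huihui": 59.5,
                      "AEON": 56.1, "Abliterix": 53.2},
    "TruthfulQA MC2": {"Base": 56.7, "Heretic": 51.1, "Huihui": 54.8,
                       "AEON": 46.1, "Abliterix": 48.7},
}


def deltas_vs_base(row, base="Base"):
    """Return each model's score minus the base score, in percentage points.

    Assumption: the delta table uses simple subtraction, rounded to one decimal.
    """
    return {
        model: round(score - row[base], 1)
        for model, score in row.items()
        if model != base
    }


for task, row in scores.items():
    cells = "  ".join(f"{m} {d:+.1f}" for m, d in deltas_vs_base(row).items())
    print(f"{task}: {cells}")
```

These three rows reproduce the delta table exactly. Applied to the MMLU and GSM8K rows, the same subtraction does not reproduce every listed delta, so those two rows are worth cross-checking between the tables.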

HarmBench


Originally published at reddit.com. Curated by AI Maestro.

| Variant | ASR | Empty | Full CoT ASR |
| --- | --- | --- | --- |
| Base | 25.8% | 1 | 26.0% |
| Huihui | 98.5% | 5 | 99.8% |
| HauhauCS | 94.5% | 22 | 100.0% |
| Abliterix | 94.5% | 22 | 100.0% |
| Heretic | 92.5% | 30 | 100.0% |
| AEON | 88.8% | 45 | 100.0% |
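
The post does not define how the three HarmBench columns relate to each other. One reading that happens to fit every row is sketched below. It rests on two assumptions that the source does not state: each model was scored on 400 prompts, and every response counted as "Empty" becomes a successful attack once the full chain of thought is evaluated. The constant `N_PROMPTS` and the helper `full_cot_asr` are hypothetical names used only for this check.

```python
# ASSUMPTION: 400 prompts per model. The post does not give the sample size.
N_PROMPTS = 400

# (ASR %, empty responses, reported Full CoT ASR %) copied from the HarmBench table.
rows = {
    "Base": (25.8, 1, 26.0),
    "Huihui": (98.5, 5, 99.8),
    "HauhauCS": (94.5, 22, 100.0),
    "Abliterix": (94.5, 22, 100.0),
    "Heretic": (92.5, 30, 100.0),
    "AEON": (88.8, 45, 100.0),
}


def full_cot_asr(asr_percent, empty, n=N_PROMPTS):
    """Derive Full CoT ASR from ASR and the empty-response count.

    ASSUMED relation (not confirmed by the source): every empty response
    is counted as a successful attack once the full CoT is scored.
    """
    successes = round(asr_percent / 100 * n)  # recover the integer count
    return 100 * (successes + empty) / n


for name, (asr, empty, reported) in rows.items():
    derived = full_cot_asr(asr, empty)
    print(f"{name}: derived {derived:.2f}%, reported {reported}%")
```

Under these assumptions each derived value lands within 0.05 percentage points of the reported one, which is within rounding. That agreement is suggestive rather than conclusive; other sample sizes or scoring rules may fit as well.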