GLiGuard: A 300M Parameter Safety Moderation Model That Matches or Exceeds Accuracy of Models 23–90x Its Size
To understand what makes GLiGuard different, it helps to first consider why existing guardrail models like LlamaGuard4 (12B), WildGuard (7B), ShieldGemma (27B), and NemoGuard (8B) are slow. These models are built on decoder-only transformer architectures that generate safety verdicts one token at a time, similar to how large language models generate responses.
This design was originally appealing for fluid safety requirements, since a generative model can adapt to new policies without retraining. It is a poor fit, however, for what is fundamentally a classification problem: evaluating multiple safety dimensions in real time, across tasks such as identifying harmful content or detecting refusals. Because decoder models generate output token by token, every verdict incurs a sequential, computationally expensive decoding loop.
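The cost gap can be made concrete by counting model invocations. A minimal sketch, with a stand-in `forward` function rather than any real model: a decoder-style guardrail needs one forward pass per generated token, while a classifier needs exactly one.

```python
# Toy comparison of forward-pass counts: decoder-style verdict generation
# vs. single-pass classification. `forward` is a stand-in, not a real model.

def forward(tokens):
    """Pretend model call; emits a dummy 'next token'."""
    return "safe" if len(tokens) >= 6 else "token"

def decoder_verdict(prompt_tokens, max_new_tokens=8):
    """Generate a verdict one token at a time (one forward pass per token)."""
    tokens, passes = list(prompt_tokens), 0
    for _ in range(max_new_tokens):
        passes += 1
        nxt = forward(tokens)
        tokens.append(nxt)
        if nxt == "safe":          # stop once the verdict token appears
            break
    return tokens[-1], passes

def classifier_verdict(prompt_tokens, labels):
    """Score all labels in a single forward pass."""
    _ = forward(prompt_tokens)     # one pass encodes text and labels together
    return {label: 0.0 for label in labels}, 1

_, decoder_passes = decoder_verdict(["how", "do", "I"])
_, classifier_passes = classifier_verdict(["how", "do", "I"], ["safe", "unsafe"])
print(decoder_passes, classifier_passes)  # → 4 1
```

The classifier's cost stays constant no matter how many labels are evaluated, which is the property the next section exploits.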
What GLiGuard Actually Does
GLiGuard is a small encoder-based model that reframes safety moderation as a text classification problem rather than a text generation problem. Instead of generating tokens one by one, it encodes both the input text and task definitions (labels) together in a single forward pass. This allows GLiGuard to evaluate multiple tasks simultaneously without adding latency.
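A rough sketch of the idea, assuming nothing about GLiGuard's internals: the input is embedded once, and every label is scored against that single embedding, so adding labels adds dot products rather than forward passes. The encoder and label names below are illustrative stand-ins.

```python
import math

def encode(text, dim=8):
    """Stand-in encoder: deterministic pseudo-embedding derived from the
    text's bytes. A real system would use a transformer encoder here."""
    return [math.sin((i + 1) * sum(text.encode())) for i in range(dim)]

# Hypothetical label set; GLiGuard encodes its own task definitions.
labels = ["safe", "unsafe", "refusal"]
label_vecs = {l: encode("label: " + l) for l in labels}

def classify(text):
    """One encoding of the input scores every label at once."""
    h = encode(text)                                   # single 'forward pass'
    logits = {l: sum(a * b for a, b in zip(v, h))      # dot product per label
              for l, v in label_vecs.items()}
    mx = max(logits.values())
    exp = {l: math.exp(s - mx) for l, s in logits.items()}
    z = sum(exp.values())
    return {l: e / z for l, e in exp.items()}          # softmax over labels

scores = classify("how do I bake a cake?")
```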
GLiGuard runs four moderation tasks concurrently in this manner: safety classification, jailbreak strategy detection across 11 strategies, harm category detection across 14 categories, and refusal detection. The architecture scores every label across all four tasks at once for each input, making it both efficient and accurate.
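The four tasks can be pictured as one schema scored in a single pass. The task and label names below are illustrative placeholders, not GLiGuard's actual schema; the article only gives the counts (11 jailbreak strategies, 14 harm categories).

```python
# Sketch of running four moderation tasks over one input. Labels are
# placeholders; only the task structure and counts come from the article.

TASKS = {
    "safety": ["safe", "unsafe"],
    "jailbreak_strategy": [f"strategy_{i}" for i in range(11)],
    "harm_category": [f"category_{i}" for i in range(14)],
    "refusal": ["refusal", "compliance"],
}

def score_all(text):
    """Pretend scorer: in the real model, one forward pass would produce
    a score for every label of every task simultaneously."""
    return {task: {label: 1.0 / len(labels) for label in labels}
            for task, labels in TASKS.items()}

def moderate(text):
    scores = score_all(text)               # conceptually one forward pass
    return {task: max(s, key=s.get) for task, s in scores.items()}

verdict = moderate("example input")
print(sorted(verdict))  # → ['harm_category', 'jailbreak_strategy', 'refusal', 'safety']
```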
Training Data and Fine-Tuning
GLiGuard was trained on a mixture of human-annotated and synthetically generated training data. For safety classification, response safety, and refusal detection, the team used WildGuardTrain, which contains 87,000 human-annotated examples. For harm category and jailbreak strategy detection, labels for unsafe samples were generated using GPT-4. To address any gaps in GLiGuard’s performance, the team used supplemental synthetic data with edge cases targeting fine-grained distinctions between similar categories like toxic speech and violence.
The model was trained via full fine-tuning of the GLiNER2-base-v1 checkpoint for 20 epochs using the AdamW optimizer. This approach leverages Fastino’s own architecture, GLiNER2, which is well-suited for multi-task text classification.
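The recipe itself (full fine-tuning with AdamW for 20 epochs) is standard. As a self-contained illustration of the optimizer named here, this applies the AdamW update rule, with its decoupled weight decay, to a toy scalar problem; it is not the GLiGuard training code, and the hyperparameters are illustrative defaults.

```python
import math

# AdamW update (decoupled weight decay) minimizing f(w) = (w - 3)^2.
lr, beta1, beta2, eps, wd = 0.1, 0.9, 0.999, 1e-8, 0.01
w, m, v = 0.0, 0.0, 0.0

for step in range(1, 21):                  # "20 epochs", one step each
    grad = 2 * (w - 3.0)                   # gradient of (w - 3)^2
    m = beta1 * m + (1 - beta1) * grad     # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2  # second-moment estimate
    m_hat = m / (1 - beta1**step)          # bias corrections
    v_hat = v / (1 - beta2**step)
    w -= lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)  # decoupled decay

print(round(w, 3))  # w has moved from 0 toward the minimum at 3
```

Unlike plain Adam, the weight-decay term `wd * w` is applied outside the adaptive gradient scaling, which is the defining feature of AdamW.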
Benchmark Results: Accuracy and Speed
The research team evaluated GLiGuard across nine established safety benchmarks for both accuracy and speed. The results show that GLiGuard matches or exceeds existing guardrail models like LlamaGuard4-12B, ShieldGemma-27B, and NemoGuard-8B despite being significantly smaller.
- GLiGuard scores an average F1 of 87.7 on prompt classification, within 1.7 points of the best model (PolyGuard-Qwen at 89.4).
- It achieves the second-highest average F1 on response classification (82.7), behind only Qwen3Guard-8B (84.1).
- GLiGuard delivers this accuracy while being 23–90× smaller than the models it is compared against.
The benchmark results also highlight GLiGuard’s superior speed and throughput compared to existing guardrails:
- GLiGuard achieves up to 16.2× higher throughput (133 vs. 8.2 samples/s at batch size 4).
- It achieves up to 16.6× lower latency: 26 ms vs. 426 ms at sequence length 64.
These gains matter most in real-time applications, where moderation latency adds directly to end-to-end response time.
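The quoted throughput multiplier follows directly from the figures above, as a quick sanity check using only numbers stated in this article:

```python
# Sanity-check the quoted speedup from the article's own numbers.
gliguard_throughput = 133.0   # samples/s at batch size 4
baseline_throughput = 8.2     # samples/s for the decoder baseline

speedup = gliguard_throughput / baseline_throughput
print(round(speedup, 1))  # → 16.2
```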
Marktechpost’s Visual Explainer
GLiGuard — Fastino Labs
Key Takeaways
- GLiGuard is a small encoder-based model that can evaluate multiple safety dimensions in one forward pass, making it significantly faster than existing decoder-only guardrails.
- On benchmarks covering both prompt and response classification, GLiGuard matches or exceeds models up to 90× its size, such as LlamaGuard4-12B.
- The model runs up to 16.6× faster than state-of-the-art decoder guardrails while maintaining high accuracy across multiple safety criteria.
Originally published at marktechpost.com. Curated by AI Maestro.

