I catalogued every way local models break JSON output and built a repair library: here’s what I found across 288 model calls


By AI Maestro May 11, 2026 2 min read

What Breaks When You Ask an LLM for JSON

I’ve been running structured-output prompts through various models on OpenRouter over the past few months, including Llama 3, Mistral, Command R, DeepSeek, Qwen, and other open-source models. In total, I ran 288 calls.

Finding Failures

The failure categories were nearly identical across all models; only their frequency varied. Here are the most common issues:

  • Markdown fences wrapping JSON (the model thinks it’s being helpful)
  • Trailing commas in arrays (allowed in JavaScript, invalid in JSON)
  • Python True/False/None instead of JSON true/false/null
  • Objects truncated due to running out of tokens mid-response
  • Unescaped quotes inside string values
  • // or # comments inside the JSON
  • Missing parts of the response, where the model got lazy and did not generate all the data

The rate varied significantly: some models wrapped every response in markdown fences, others only when the prompt was phrased a certain way. The types of failure, however, were the same across all models.
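
To make the categories concrete, here is a small sketch that feeds one sample per category to Python's standard `json.loads` and records which ones parse. The sample strings are hand-written illustrations, not captured model output, and the field names are made up. Note that the last category, missing data, still parses, so it would only be caught by checking the result against a schema.

```python
import json

FENCE = "`" * 3  # a markdown code fence, built indirectly to keep this listing readable

# One hand-written sample per failure category (illustrative, not real model output).
SAMPLES = {
    "markdown_fence": FENCE + 'json\n{"name": "Ada"}\n' + FENCE,
    "trailing_comma": '{"tags": ["a", "b",]}',
    "python_literals": '{"active": True, "nickname": None}',
    "truncated": '{"items": [{"id": 1}, {"id": 2',
    "unescaped_quote": '{"quote": "She said "hi" to me"}',
    "comment": '{"id": 1 // primary key\n}',
    # Syntactically valid, but an expected "email" field was never generated.
    "missing_data": '{"name": "Ada"}',
}


def parses(text: str) -> bool:
    """Return True if the standard library parser accepts the text as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


results = {name: parses(text) for name, text in SAMPLES.items()}
for name, ok in results.items():
    print(f"{name:16} {'parses' if ok else 'FAILS'}")
```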

Building outputguard

I built outputguard, a Python library that validates output against a JSON Schema and runs 15 repair strategies in a specific order. The key insight was to fix encoding issues before structural ones, and to re-parse between strategies so that earlier fixes aren’t undone.
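
The ordering idea can be sketched with the standard library alone. This is not outputguard's code or API: the four strategies, their names, and the `repair` function are assumptions made for illustration, and the regex-based fixes are deliberately naive (they would also rewrite matching text inside string values). The sketch reads "re-parsing between strategies" as: try to parse after every change and stop at the first success, so later strategies never touch output that is already valid.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence


def strip_bom(text):
    """Encoding-level fix: drop a leading byte-order mark."""
    return text.lstrip("\ufeff")


def strip_markdown_fence(text):
    """Remove a wrapping markdown fence with an optional language tag."""
    pattern = rf"^{FENCE}[A-Za-z]*\s*\n(.*?)\n?{FENCE}\s*$"
    match = re.match(pattern, text.strip(), flags=re.DOTALL)
    return match.group(1) if match else text


def python_literals_to_json(text):
    """Replace True/False/None with true/false/null (naive: ignores string boundaries)."""
    for py, js in (("True", "true"), ("False", "false"), ("None", "null")):
        text = re.sub(rf"\b{py}\b", js, text)
    return text


def remove_trailing_commas(text):
    """Drop commas that directly precede a closing bracket or brace (also naive)."""
    return re.sub(r",\s*([\]}])", r"\1", text)


# Hypothetical order: encoding first, then wrappers, then token-level and structural fixes.
STRATEGIES = [strip_bom, strip_markdown_fence, python_literals_to_json, remove_trailing_commas]


def try_parse(text):
    """Return (True, value) if the text is valid JSON, else (False, None)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError:
        return False, None


def repair(text):
    """Apply strategies in order, re-parsing after each change.

    Returns (parsed value, names of the strategies that changed the text).
    """
    applied = []
    ok, value = try_parse(text)
    if ok:
        return value, applied
    for strategy in STRATEGIES:
        candidate = strategy(text)
        if candidate == text:
            continue  # this strategy had nothing to fix
        text = candidate
        applied.append(strategy.__name__)
        ok, value = try_parse(text)
        if ok:
            return value, applied
    raise ValueError("could not repair output")


raw = "\ufeff" + FENCE + 'json\n{"active": True, "tags": ["a", "b",], "nickname": None}\n' + FENCE
value, applied = repair(raw)
print(value)
print(applied)
```

Returning the list of applied strategies alongside the value makes it possible to log which failure modes each model actually triggers.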

The library also supports YAML, TOML, and Python literals, which came up more often than expected once I started working with models that don’t have a JSON mode and just output whatever they feel like.
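
For the Python-literal case, one option is to fall back to the standard library's `ast.literal_eval`, which evaluates literals only and does not execute arbitrary code. A minimal sketch follows; `parse_loose` is a hypothetical helper name, not part of outputguard, and the input string is made up.

```python
import ast
import json


def parse_loose(text):
    """Try strict JSON first, then fall back to a Python literal."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    try:
        # literal_eval accepts dicts, lists, tuples, strings, numbers, True/False/None.
        return ast.literal_eval(text.strip())
    except (ValueError, SyntaxError, TypeError) as exc:
        raise ValueError("not JSON and not a Python literal") from exc


# Single quotes, True/None, and a tuple: invalid JSON, but a valid Python literal.
data = parse_loose("{'name': 'Ada', 'active': True, 'tags': ('a', 'b'), 'email': None}")
print(json.dumps(data))  # re-serialise as strict JSON; the tuple becomes a JSON array
```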

Conclusion

I wrote up the full findings in a blog post: What Breaks When You Ask an LLM for JSON. The library is MIT licensed and available via pip. If you’ve seen similar failure patterns, or models that behave differently, let me know!

Key Takeaways

  • Failure categories were consistent across all models; only their frequency varied.
  • outputguard validates against JSON Schema and runs 15 repair strategies in a fixed order.
  • It also supports YAML, TOML, and Python literals.
