I catalogued every way local models break JSON output and built a repair library. Here's what I found across 288 model calls


What Breaks When You Ask an LLM for JSON

Over the past few months I’ve been running structured output prompts through various models on OpenRouter: Llama 3, Mistral, Command R, DeepSeek, Qwen, and other open models. In total, I ran 288 calls.

Finding Failures

The failure categories were almost identical across all models; only their frequency varied. Here are the most common issues:

Markdown fences wrapping the JSON (the model thinks it’s being helpful)
Trailing commas in arrays (valid in JavaScript, invalid in JSON)
Python True/False/None instead of JSON true/false/null
Objects truncated when the model runs out of tokens mid-response
Unescaped quotes inside string values
// or # comments inside the JSON
Missing parts of the response because the model got lazy and did not generate all the data

The rate varied significantly: some models wrapped every response in markdown fences, others only when the prompt was phrased a certain way. The types of failure, however, stayed the same from model to model.
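To make the categories above concrete, here is a small sketch that feeds one hand-written sample per category to Python's standard `json` parser. The samples are illustrative assumptions, not outputs captured from the 288 calls, and the names `SAMPLES` and `parses` are hypothetical.

```python
import json

FENCE = "`" * 3  # a markdown code fence, built here to keep the sample readable

# Hypothetical samples, one per parse-breaking failure category.
SAMPLES = {
    "markdown fence": FENCE + 'json\n{"ok": true}\n' + FENCE,
    "trailing comma": '{"items": [1, 2, 3,]}',
    "python literals": '{"ok": True, "value": None}',
    "truncated object": '{"items": [1, 2',
    "unescaped quote": '{"title": "the "best" one"}',
    "comment": '{"ok": true} // done',
}


def parses(text: str) -> bool:
    """Return True if the standard-library JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


for name, text in SAMPLES.items():
    print(f"{name}: {'accepted' if parses(text) else 'rejected'}")
```

Every sample is rejected by a strict parser. The last category in the list, missing data, is different: the output usually still parses, so it has to be caught by validating against a schema rather than by repair.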

Building outputguard

I built outputguard, a Python library that validates model output against a JSON Schema and runs 15 repair strategies in a fixed order. The key insight was that ordering matters: encoding issues have to be fixed before structural ones, and the text is re-parsed after each strategy so that earlier fixes aren’t undone.
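The ordering idea can be sketched with the standard library alone. This is not outputguard's code or API: the four strategies, their order, and every name below (`repair`, `STRATEGIES`, and the helpers) are assumptions chosen for illustration, and the real library runs 15 strategies.

```python
import json
import re
from typing import Any, Callable, List

# A JSON string literal. Matching it first lets the patterns below skip
# over string contents instead of rewriting them.
_STRING = r'"(?:\\.|[^"\\])*"'
_FENCE = re.compile(r"^\s*`{3}[\w-]*[ \t]*\n(.*?)\n?[ \t]*`{3}\s*$", re.DOTALL)
_LITERAL = re.compile(_STRING + r"|\b(True|False|None)\b")
_TRAILING_COMMA = re.compile(_STRING + r"|,(\s*[\]}])")
_PY_TO_JSON = {"True": "true", "False": "false", "None": "null"}


def strip_bom(text: str) -> str:
    """Encoding fix: drop a leading byte-order mark."""
    return text.lstrip("\ufeff")


def strip_fence(text: str) -> str:
    """Remove a markdown code fence that wraps the whole response."""
    match = _FENCE.match(text)
    return match.group(1) if match else text


def python_literals(text: str) -> str:
    """Replace bare True/False/None with their JSON equivalents."""
    def swap(m: "re.Match[str]") -> str:
        return _PY_TO_JSON[m.group(1)] if m.group(1) else m.group(0)
    return _LITERAL.sub(swap, text)


def trailing_commas(text: str) -> str:
    """Structural fix: drop a comma that directly precedes ] or }."""
    def drop(m: "re.Match[str]") -> str:
        return m.group(1) if m.group(1) is not None else m.group(0)
    return _TRAILING_COMMA.sub(drop, text)


Strategy = Callable[[str], str]

# Assumed order for this sketch: encoding first, structure last.
STRATEGIES: List[Strategy] = [strip_bom, strip_fence, python_literals, trailing_commas]


def repair(text: str) -> Any:
    """Parse text, applying one strategy at a time and re-parsing after each."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    for strategy in STRATEGIES:
        text = strategy(text)
        try:
            return json.loads(text)  # stop as soon as the text parses
        except json.JSONDecodeError:
            continue
    raise ValueError("output could not be repaired")


fence = "`" * 3
raw = fence + 'json\n{"a": [1, 2,], "ok": True}\n' + fence
print(repair(raw))
```

Re-parsing after every step means the pipeline stops at the first version of the text that parses, so later strategies never touch output that is already valid. The string-aware patterns matter for the same reason: a plain search-and-replace for `True` would also rewrite the word inside string values.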

The library supports YAML, TOML, and Python literals as well. These came up more often than expected once I started working with models that don’t have a JSON mode and just output whatever they feel like.
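For the Python-literal case, a fallback parser can be sketched with `ast.literal_eval` from the standard library. The function name `parse_loose` is hypothetical and this is not how outputguard is known to do it; YAML and TOML would need their own parsers and are left out here.

```python
import ast
import json
from typing import Any


def parse_loose(text: str) -> Any:
    """Parse text as JSON, falling back to a Python literal."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    try:
        # literal_eval only evaluates literals (dicts, lists, strings,
        # numbers, True/False/None), never arbitrary expressions.
        value = ast.literal_eval(text)
        # Round-trip through JSON so tuples become lists and anything
        # JSON cannot represent (for example a set) is rejected.
        return json.loads(json.dumps(value))
    except (ValueError, SyntaxError, TypeError) as exc:
        raise ValueError("neither JSON nor a Python literal") from exc


print(parse_loose("{'name': 'Ada', 'active': True, 'score': None}"))
```

Trying JSON first keeps the common case fast and leaves valid output untouched; the literal fallback only runs when the strict parser has already failed.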

Conclusion

I wrote up the full findings in a blog post: What Breaks When You Ask an LLM for JSON. The library is MIT licensed and available via pip. If you’ve seen similar failure patterns, or models that behave differently, let me know!

Key Takeaways

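The same failure categories show up across all models; only the frequency differs.
Repair order matters: fix encoding issues before structural ones, and re-parse after each strategy.
Models without a JSON mode also emit YAML, TOML, and Python literals, so a repair library needs to handle those formats too.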
