Probably raises $9M to build a more reliable kind of AI

For makers and artists: why hallucination-free AI matters for your workflow

In the rush to integrate generative models into creative pipelines, reliability is often the first casualty. When an audio engine or a design assistant hallucinates, it breaks the flow and erodes trust. Probably, a new startup backed by Andreessen Horowitz, is tackling this head-on by building a system designed to eliminate factual errors before they ever reach the user.

A seed round for rigorous accuracy

The company recently secured $9 million in seed funding to pursue this vision. Founder Peter Elias argues that the industry has focused more on raw capability than on correctness. The goal is to achieve the kind of 99.99% accuracy typical of deterministic systems, a standard that has historically been elusive in probabilistic AI.

How the “mech suit” works

Probably’s initial offering is a data science tool designed to extract answers from complex datasets. Unlike standard chatbots, every output is accompanied by a citation and a full audit trail detailing how the result was derived. This transparency is becoming essential for professional use cases, from financial modelling to technical analysis.
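
To make the idea of an answer that carries its own citation and audit trail concrete, here is a minimal Python sketch. It is not Probably's implementation, which the company has not published; `AuditedAnswer`, `mean_with_audit`, and the sample data are hypothetical names and values chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AuditedAnswer:
    """An answer bundled with its provenance (hypothetical structure)."""
    value: float
    citation: str                               # where the input data came from
    steps: list = field(default_factory=list)   # how the value was derived


def mean_with_audit(rows, column, source):
    """Compute a column mean and record each step of the derivation."""
    steps = []
    values = [row[column] for row in rows]
    steps.append(f"selected {len(values)} values from column '{column}'")
    total = sum(values)
    steps.append(f"summed values: {total}")
    result = total / len(values)
    steps.append(f"divided by count {len(values)}: {result}")
    return AuditedAnswer(value=result, citation=source, steps=steps)


# Illustrative data only; not taken from any real dataset.
rows = [{"latency_ms": 10}, {"latency_ms": 20}, {"latency_ms": 30}]
answer = mean_with_audit(rows, "latency_ms", "example.csv, rows 1-3")

print(answer.value)
for step in answer.steps:
    print(" -", step)
```

A reviewer can replay the recorded steps against the cited rows without trusting the component that produced the number, which is the property that matters for professional use.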

To ensure this level of integrity, Elias describes an architecture he calls a “data science mech suit.” The system does not rely on the LLM to work alone. Instead, the model’s first-pass response is immediately cross-referenced against a deterministic validator. If the answer does not align with the source data, it is rejected. Crucially, the LLM has been trained specifically to recognise this validator, creating a feedback loop that optimises the system for speed and precision.
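
The propose-then-validate pattern described here can be sketched in a few lines of Python. This is an assumption-laden illustration, not the company's architecture: the model is replaced by a stub, and `Proposal`, `validate`, `answer_with_validation`, and the `SALES` data are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy dataset standing in for the source data.
SALES = [
    {"region": "north", "amount": 120},
    {"region": "south", "amount": 80},
    {"region": "north", "amount": 50},
]


@dataclass
class Proposal:
    """A model's first-pass answer (illustrative structure)."""
    region: str
    claimed_total: int


def deterministic_total(rows, region):
    """Recompute the answer directly from the data, with no model involved."""
    return sum(r["amount"] for r in rows if r["region"] == region)


def validate(proposal, rows):
    """Accept the proposal only if it matches the recomputed value."""
    return proposal.claimed_total == deterministic_total(rows, proposal.region)


def answer_with_validation(propose, rows, max_attempts=3):
    """Return the first proposal that passes validation, or None if every
    attempt is rejected."""
    for attempt in range(max_attempts):
        proposal = propose(attempt)
        if validate(proposal, rows):
            return proposal
    return None


def stub_model(attempt):
    # Stand-in for an LLM call: wrong on the first attempt, right afterwards.
    guesses = [165, 170, 170]
    return Proposal(region="north", claimed_total=guesses[attempt])


result = answer_with_validation(stub_model, SALES)
print(result)
```

The key design choice in this sketch is that a rejected answer never reaches the caller: the function either returns a proposal that agrees with the data or returns `None`, so a failure is visible rather than silently wrong.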

“What we learned building this was that the better your harness engineering is, the weaker the model can be.” — Peter Elias

By refining the context so thoroughly that the model faces minimal ambiguity, the system can operate effectively even with less powerful AI. Elias notes that the current version runs on a model “four classes weaker than the frontier models.” This allows the tool to execute on local hardware, such as a desktop computer, rather than requiring expensive data centre resources.

Cost efficiency and broader applications

Running locally drastically reduces token costs, a significant advantage as prices for API calls continue to climb. This approach also opens the door to other precision-sensitive sectors beyond data science. The same engine could be adapted for accounting, medical diagnostics, or any field where a single error has real-world consequences.

Elias suggests that major AI labs have largely ignored this approach because their business models rely on volume and correction. “They’re incentivized not to, because they make money the more times you have to correct the model,” he observes. For creators and businesses looking for tools that simply work without constant verification, Probably offers a compelling alternative.

Key takeaways

Probably has raised $9 million to build an AI system that aims to prevent hallucinations by checking every output against a deterministic validator.
The “mech suit” architecture allows the tool to run on significantly smaller models, enabling local execution on desktop hardware and cutting token costs.
The system is designed for precision-sensitive use cases like accounting and medicine, addressing a gap left by large AI labs focused on raw capability rather than accuracy.
