These two founders left Goldman and Meta to build voice AI for markets everyone else overlooked

Disclosure: Some links in this article are affiliate links. AI Maestro may earn a commission if you make a purchase, at no…

By AI Maestro June 3, 2026 4 min read
For makers and artists in the voice AI space, the lesson is clear: a model that sounds perfect in London or New York often fails catastrophically in Lagos or Cairo. The current rush to automate customer support has exposed a brutal reality—plug-and-play orchestration tools like Vapi and LiveKit cannot handle the specific demands of the Global South. Latency, jitter, and poor dialect recognition mean that automated calls in these regions are frequently unusable, forcing businesses to roll back systems that promised efficiency. If you are building for these markets, you cannot rely on off-the-shelf large language models hosted in the West; you must build bespoke, lightweight architectures that respect local speech patterns and infrastructure constraints.

The gap in the market

Global enterprises are racing to deploy AI to automate operations, but the results in emerging markets have been mixed. In Egypt, a major call centre automated a significant volume of traffic only to abandon the system due to poor performance. Across Africa, support centres report that finding engineers capable of building cost-effective automation is a persistent headache. The technical hurdles are severe: automated calls in the region suffer from outrageous latency and jitter. Routing calls through large models hosted outside the region makes this worse, adding delays that destroy user trust. As Ayooluwa Odemuyiwa, CTO of voice AI startup AethexAI, noted, the only viable solution is to use very small models and cut latency at every single step.
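To make "cut latency at every single step" concrete, here is a minimal sketch of a per-turn latency budget: the delay a caller experiences is roughly the sum of each stage in the pipeline. The stage names and every millisecond figure below are hypothetical placeholders chosen to show the arithmetic. They are not measurements from AethexAI or any other vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step in a voice turn, with an assumed latency in milliseconds."""
    name: str
    latency_ms: float


def turn_latency_ms(stages: list[Stage]) -> float:
    """Total delay before the caller hears a reply: the sum of every stage."""
    return sum(stage.latency_ms for stage in stages)


# Hypothetical placeholder numbers, for illustration only (not measurements).
REMOTE_PIPELINE = [
    Stage("network round trip to out-of-region host", 300.0),
    Stage("speech recognition", 200.0),
    Stage("large hosted language model", 500.0),
    Stage("speech synthesis", 200.0),
]
LOCAL_PIPELINE = [
    Stage("network round trip to in-region host", 40.0),
    Stage("speech recognition", 150.0),
    Stage("small language model", 150.0),
    Stage("speech synthesis", 150.0),
]

for label, pipeline in (("remote", REMOTE_PIPELINE), ("local", LOCAL_PIPELINE)):
    print(f"{label}: {turn_latency_ms(pipeline):.0f} ms per turn")
```

Because the total is a plain sum, no single stage can be ignored: trimming the model but keeping a long network round trip, or the reverse, still leaves the turn slow.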

Building from scratch

AethexAI, a startup founded last year by Mariama Diallo and Ayooluwa Odemuyiwa, was created to close this gap. Diallo, formerly of Goldman Sachs and later ModelML, and Odemuyiwa, a Caltech graduate who worked at Meta before attending Stanford Business School, decided that existing orchestration tools were insufficient. Instead of relying on standard infrastructure, they built their own small model and orchestration layer from the ground up.

The company raised $3 million in pre-seed funding led by 4DX Ventures, with participation from Enza Capital, Dorm Room Fund, Mojo Ventures, and Stanford GSB 26 Fund. Individual investors include Stanford faculty, telecom executives, and AI researchers from Anthropic.

To train these models, AethexAI avoided the millions typically spent on massive datasets. Instead, the team used anonymised recordings from a call centre partner and shipped hard drives to radio stations across Africa to collect audio data. To keep costs down, they established a contributor network of university students to annotate data and pronounce local names. This approach allowed them to develop the Kora series, a set of models ranging from 300 million to 1.7 billion parameters. That is a fraction of the size of typical LLMs, but that is precisely the point: smaller models can run locally, reducing latency while maintaining accuracy for local dialects of English, French, and Arabic.
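A back-of-the-envelope calculation shows why that parameter range matters for running models locally. The sketch below estimates weight storage as parameter count times bytes per parameter. The numeric precisions (fp32, fp16, int8) are assumptions for illustration; the article does not say which formats the Kora models use, and the estimate covers weights only, not activations or other runtime memory.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# The precisions listed here are illustrative assumptions, not published specs.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str = "fp16") -> float:
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Parameter counts taken from the range quoted in the article.
KORA_RANGE = {"smallest": 300_000_000, "largest": 1_700_000_000}

for label, params in KORA_RANGE.items():
    for dtype in BYTES_PER_PARAM:
        size_gb = weight_memory_gb(params, dtype)
        print(f"{label} ({params:,} params) at {dtype}: {size_gb:.1f} GB")
```

Under these assumptions, the whole range fits in single-digit gigabytes of weight storage, which is the kind of footprint that makes in-region or on-premises hosting plausible.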

Business strategy for local realities

The startup is currently handling more than 17,000 calls per day. However, they are careful not to promise everything to everyone. CEO Diallo advises clients to pick a single, critical use case to start with, rather than attempting a full-scale overhaul immediately. The company is open to all industries but is currently focused on debt collection, customer activation, and KYC (Know Your Customer) verification.

Success requires more than just software; it requires on-the-ground engineering. AethexAI is hiring forward-deployed engineers on a contract basis to serve local markets and is building channel partnerships with telecoms providers to handle telephony infrastructure. As they see it, generic plug-and-play solutions simply will not work here.

Why the giants haven’t moved in

Walter Baddoo, co-founder and managing partner of 4DX Ventures, argues that the Africa and Middle East market is fundamentally different from the Western markets that most voice AI companies were built to serve. Enterprises in these regions process roughly three times the call volume of their Western counterparts because voice remains the dominant channel for customer interaction. Incumbent systems were designed for high-end GPU infrastructure, standard English and European speech environments, and enterprise workflows common in the U.S. and Europe.

This creates real gaps when enterprises need systems that handle dialects, code-switching, and informal speech patterns, and that work within their existing telephony infrastructure and actual price points. While companies like ElevenLabs, Deepgram, Sierra, and Cognigy are expanding globally, the markets they were built for and the markets they are entering aren’t always the same thing. Startups like AethexAI are betting that these gaps—requiring models specialised in local dialects, on-the-ground partnerships, and infrastructure built for the region—represent a market opening that the giants have neither the incentive nor the architecture to close.

Key takeaways

  • “The latency and jitter that we saw on automated calls in this region were outrageous.” — Ayooluwa Odemuyiwa, CTO of AethexAI.
  • Small, locally hosted models (300 million to 1.7 billion parameters) can cut latency in emerging markets by avoiding round trips to large models hosted outside the region, while maintaining accuracy on local dialects.
  • Successful deployment in Africa and the Middle East requires forward-deployed engineers and partnerships with local telecoms providers, not just software licences.
  • Client adoption follows a “one use case at a time” philosophy, focusing on debt collection, activation, and KYC verification.
