Phonely AI Agents Achieve 99% Accuracy
Source: venturebeat.com
A collaboration between Phonely, Maitai, and Groq has produced a breakthrough on a persistent problem in voice AI: awkward delays that immediately signal to callers that they’re talking to a machine.
The collaboration has enabled Phonely to reduce response times by more than 70% while simultaneously boosting accuracy from 81.5% to 99.2% across four model iterations, surpassing a 94.7% benchmark by 4.5 percentage points.
The improvements stem from Groq’s new capability to instantly switch between multiple specialized AI models without added latency, orchestrated through Maitai’s optimization platform.
The achievement addresses one of the subtle cues that make automated conversations feel distinctly non-human, and the stakes for call centers and customer service operations are high: one of Phonely’s customers is replacing 350 human agents this month alone.
Traditional large language models have struggled to respond quickly enough to maintain natural conversation flow; a few seconds of delay feels interminable during a live phone call.
Will Bodewes, Phonely’s founder and CEO, said major LLM providers have a very high degree of latency variance, and this delay is what makes most voice AI today feel non-human.
The problem occurs roughly once every ten requests, meaning standard conversations inevitably include at least one or two awkward pauses that immediately reveal the artificial nature of the interaction. These delays have created a significant barrier to adoption.
The solution emerged from Groq’s development of “zero-latency LoRA hotswapping,” the ability to instantly switch between multiple specialized AI model variants without any performance penalty. LoRA (low-rank adaptation) allows developers to create lightweight, task-specific modifications to an existing model rather than training an entirely new one from scratch.
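For readers who want a concrete picture of what a LoRA adapter is, the sketch below attaches one to an open-weights model using the open-source Hugging Face PEFT library. It is illustrative only; the base model name and hyperparameters are placeholder assumptions, not details disclosed by Phonely, Maitai, or Groq.

```python
# Illustrative sketch: attach a small LoRA adapter to an existing model
# instead of retraining all of its weights. Model name and LoRA settings
# are placeholders, not anything used by Phonely, Maitai, or Groq.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

lora_cfg = LoraConfig(
    r=16,            # rank of the low-rank update matrices
    lora_alpha=32,   # scaling applied to the adapter's contribution
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Because only the small adapter matrices are trained, a task-specific variant can be produced far more cheaply than a full fine-tune of the base model.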
Chelsey Kantor, Groq’s chief marketing officer, explained that Groq’s architecture and high-speed on-chip memory make it possible to access multiple hot-swapped LoRAs with no latency penalty, because the LoRAs are stored and managed in SRAM alongside the original model weights.
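Groq’s on-chip mechanism is not public, but the software-level idea can be mirrored with the same PEFT library: several adapters stay attached to one resident base model, and switching the active adapter per request is a selection over weights already in memory rather than a model reload. The sketch below is a conceptual illustration under that assumption, with placeholder adapter names.

```python
# Conceptual sketch (not Groq's API): keep several LoRA adapters attached
# to a single resident base model and switch the active one per request.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

scheduling_cfg = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
lead_qual_cfg = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")

model = get_peft_model(base, scheduling_cfg, adapter_name="scheduling")
model.add_adapter("lead_qual", lead_qual_cfg)

# Activating an adapter selects weights that are already loaded;
# nothing is fetched from disk at request time.
model.set_adapter("scheduling")  # route an appointment-booking turn
model.set_adapter("lead_qual")   # route a lead-qualification turn
```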
Maitai's Role
This infrastructure advancement enabled Maitai to create a “proxy-layer orchestration” system that continuously optimizes model performance.
Christian DalSanto, Maitai's founder, said that Maitai acts as a thin proxy layer between customers and their model providers, dynamically selecting and optimizing the best model for every request and automatically applying evaluation, optimizations, and resiliency strategies such as fallbacks.
The system works by collecting performance data from every interaction, identifying weak points, and iteratively improving the models without customer intervention. Maitai collects signals identifying where models underperform, and these ‘soft spots’ are clustered, labeled, and incrementally fine-tuned to address specific weaknesses without causing regressions.
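Maitai has not published its implementation, but the proxy pattern described here can be sketched in a few lines: try the preferred backend for each request and fall back to an alternative if it errors or responds too slowly. The code below is a hypothetical, simplified illustration; the backend names and latency budget are assumptions.

```python
# Hypothetical sketch of a thin proxy layer (not Maitai's code): route each
# request to a preferred backend, falling back on error or slowness.
import time
from typing import Callable

Backend = Callable[[str], str]  # any callable mapping a prompt to a reply

def proxy_completion(prompt: str, backends: list[Backend],
                     max_latency_s: float = 0.5) -> str:
    """Try backends in priority order, falling back on error or slowness."""
    last_error: Exception | None = None
    for backend in backends:
        start = time.monotonic()
        try:
            reply = backend(prompt)
        except Exception as exc:  # provider failure: try the next backend
            last_error = exc
            continue
        # For simplicity the latency check happens after the call returns;
        # a production proxy would enforce the deadline during the call.
        if time.monotonic() - start <= max_latency_s:
            return reply
        last_error = TimeoutError("backend exceeded latency budget")
    raise RuntimeError("all backends failed") from last_error

# Stand-in backends for illustration.
def fine_tuned_model(prompt: str) -> str:
    return "[fine-tuned adapter] " + prompt

def general_purpose_fallback(prompt: str) -> str:
    return "[general-purpose model] " + prompt

print(proxy_completion("Book me for Tuesday at 3 p.m.",
                       [fine_tuned_model, general_purpose_fallback]))
```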
Time to first token dropped 73.4%, from 661 milliseconds to 176 milliseconds at the 90th percentile. Overall completion times fell 74.6%, from 1,446 milliseconds to 339 milliseconds. Accuracy climbed across four model iterations, from 81.5% to 99.2%.
Bodewes stated that more than 70% of people who call into their AI cannot tell they are not speaking with a person, and that with a custom fine-tuned model that talks like a person and super low-latency hardware, there isn’t much stopping Phonely from crossing the uncanny valley of sounding completely human.
One of Phonely’s biggest customers saw a 32% increase in qualified leads compared with an earlier version built on previous state-of-the-art models. One of the call centers Phonely works with is replacing 350 human agents entirely with Phonely this month.
Bodewes explained that Phonely excels in appointment scheduling and lead qualification specifically. The company has partnered with major firms handling insurance, legal, and automotive customer interactions.
Groq’s specialized AI inference chips, called Language Processing Units (LPUs), provide the hardware foundation that makes the multi-model approach viable. Kantor said the LPU architecture is optimized for precisely controlling data movement and computation at a fine-grained level with high speed and predictability.
Kantor explained that the cloud-based infrastructure also addresses scalability concerns: Groq handles orchestration and dynamic scaling for its customers across any AI model it offers, including fine-tuned LoRA models.
DalSanto said that companies already running general-purpose models in production typically transition to Maitai the same day, with zero disruption, and receive a fine-tuned model that is faster and more reliable than their original setup within days to a week. The proxy-layer approach means companies can keep their existing API integrations while gaining access to continuously improving performance.
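The article does not describe Maitai’s integration mechanics, but a common way a thin proxy preserves existing integrations is by exposing an API compatible with the provider’s, so clients only change the endpoint they point at. The sketch below assumes an OpenAI-compatible interface; the URL, key, and model name are placeholders, not Maitai’s real values.

```python
# Hypothetical illustration: if the proxy speaks the same API as the
# original provider, switching to it is a one-line endpoint change.
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.example.com/v1",  # proxy endpoint instead of the provider
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="phone-support-assistant",  # placeholder model identifier
    messages=[{"role": "user", "content": "I'd like to reschedule my appointment."}],
)
print(response.choices[0].message.content)
```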
DalSanto said they’re observing growing demand from teams breaking their applications into smaller, highly specialized workloads. He added that multi-LoRA hotswapping lets companies deploy faster, more accurate models customized precisely for their applications, removing traditional cost and complexity barriers and fundamentally shifting how enterprise AI gets built and deployed.
DalSanto said multi-LoRA hotswapping enables low-latency, high-accuracy inference tailored to specific tasks, and their roadmap prioritizes further investments in infrastructure, tools, and optimization to establish fine-grained, application-specific inference as the new standard.