Modulate, a conversational voice intelligence company, has launched Velma Deepfake Detect, a synthetic voice detection API for continuous, full-call monitoring at scale. The API detects AI-generated audio across entire conversations in both batch and real-time streaming environments.
"Voice is one of the most vulnerable attack surfaces for modern enterprises," said Mike Pappas, CEO and co-founder of Modulate, in a statement. "The problem isn't just that synthetic audio is getting better; it's that it's incredibly cheap to create, while detection has historically been too expensive to deploy at scale. That's left real gaps in how companies defend themselves. Velma Deepfake Detect changes that by creating true cost parity with scammers creating fraudulent voice deepfakes. It's a paradigm shift that gives enterprises and developers a fraud prevention solution at a low cost required to catch the huge proliferation in deepfake fraud."
Built on Modulate's Ensemble Listening Model (ELM) architecture, Velma Deepfake Detect combines insights from short vocal tones with more complex rhythm and pronunciation patterns for end-to-end, real-time detection of deepfake fraud across helpdesks, call centers, content-sharing platforms, and other audio-rich environments.
Now available as an API for developers, Velma Deepfake Detect enables the following:
- Batch and real-time streaming detection endpoints;
- Probability-based scoring for flexible decision thresholds;
- Segment-level analysis for identifying partial manipulation;
- Accurate results with as little as 2 to 3 seconds of audio; and
- Robust performance across noisy, multi-speaker, and compressed audio.
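To illustrate how probability-based scoring and segment-level analysis might fit together on the consuming side, here is a minimal Python sketch. It does not reflect the actual Velma Deepfake Detect API, whose request and response formats are not described here. The `SegmentScore` type, its field names, and the 0.8 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentScore:
    """Hypothetical per-segment result; field names are assumptions, not the real API schema."""
    start_s: float         # segment start time in seconds
    end_s: float           # segment end time in seconds
    synthetic_prob: float  # assumed probability (0.0 to 1.0) that the segment is synthetic


def flag_segments(segments: List[SegmentScore], threshold: float = 0.8) -> List[SegmentScore]:
    """Return the segments whose score meets or exceeds the caller's chosen threshold."""
    return [s for s in segments if s.synthetic_prob >= threshold]


def is_partially_manipulated(segments: List[SegmentScore], threshold: float = 0.8) -> bool:
    """True when some, but not all, segments are flagged at the given threshold."""
    flagged = flag_segments(segments, threshold)
    return 0 < len(flagged) < len(segments)


# Illustrative scores only; these are not real detection results.
call = [
    SegmentScore(0.0, 3.0, 0.05),
    SegmentScore(3.0, 6.0, 0.92),
    SegmentScore(6.0, 9.0, 0.10),
]
suspicious = flag_segments(call)
```

Because the score is a probability rather than a fixed verdict, the threshold stays with the caller. A stricter workflow could lower it to catch more candidates, at the cost of more false alarms.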
The Velma Deepfake Detect API lets companies incorporate detection into fraud prevention, contact center, voice agent, and identity verification workflows. Because alerts and scores can be routed into existing systems, organizations can use Velma Deepfake Detect to support real-time decisions such as escalation, rerouting, secondary verification, or post-call review. As part of the broader Velma platform, detection can be combined with additional capabilities, including transcription, emotion detection, personally identifiable information redaction, and conversational analytics.
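As a rough sketch of how a score might be routed into an existing system to drive those decisions, the Python snippet below maps a probability to one of the actions named above. The function name, the action labels, and the three threshold values are hypothetical and are not taken from Modulate's documentation. A real deployment would tune them to its own risk tolerance.

```python
# Hypothetical thresholds chosen only for illustration.
REVIEW_THRESHOLD = 0.5
VERIFY_THRESHOLD = 0.7
ESCALATE_THRESHOLD = 0.9


def choose_action(synthetic_prob: float, call_is_live: bool) -> str:
    """Map an assumed synthetic-voice probability to a follow-up action."""
    if not 0.0 <= synthetic_prob <= 1.0:
        raise ValueError("synthetic_prob must be between 0.0 and 1.0")
    if synthetic_prob < REVIEW_THRESHOLD:
        return "none"
    if not call_is_live:
        # Batch (post-call) results can only feed a review queue.
        return "post_call_review"
    if synthetic_prob >= ESCALATE_THRESHOLD:
        return "escalate"
    if synthetic_prob >= VERIFY_THRESHOLD:
        return "secondary_verification"
    return "post_call_review"
```

Keeping the thresholds in one place makes it straightforward to adjust how aggressively live calls are interrupted without touching the routing logic itself.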