VoiceX

Enterprise Voice, Amplified by AI

Real-time speech-to-text, intelligent reasoning, and natural voice synthesis — woven into a single pipeline for voice assistants, transcription, and phone integration.

3 AI Layers · Real-time Streaming · Speaker Recognition

The Opportunity

The Rise of Voice-First Enterprise AI

The fastest human interface

Voice Is How Humans Think

Speech is 3x faster than typing and requires zero training. Enterprises are shifting from type-to-query to speak-to-act — unlocking AI for frontline workers, field technicians, and executives who never open a laptop.

Untapped enterprise intelligence

Audio Data Is Invisible Gold

Meetings, customer calls, and field recordings contain critical decisions, commitments, and expertise — trapped in audio files. Organizations that transcribe, diarize, and index voice data gain a competitive edge.

Latency is the enemy

Real-Time Decisions Need Real-Time AI

A support agent can't wait 30 seconds for an answer. A field technician can't type while holding equipment. Voice AI that streams responses in real time transforms workflows that batch processing can't touch.

Text-based interfaces limit AI adoption to desk workers. Voice removes that barrier — making AI accessible to every role, in every environment, at the speed of conversation.

The Architecture

Three Layers, One Real-Time Pipeline

Audio flows through three AI layers — each specialized, all streaming, fully configurable.

Audio Input Stream
EARS

Speech-to-Text

Real-time speech recognition with sub-300ms latency. Supports speaker diarization, custom vocabulary, and 30+ languages.

Providers
Deepgram Nova-3 · AssemblyAI · Azure Speech
Capabilities
Speaker diarization · Custom vocabulary · Live streaming · Multi-language
TRANSCRIPT
BRAIN

LLM + Tools

The reasoning layer. Processes transcribed speech, calls tools, retrieves knowledge, and formulates responses — all within the conversation flow.

Providers
OpenAI GPT-4o · Local via Ollama · Any OpenAI-compatible
Capabilities
Tool orchestration · Knowledge retrieval · Custom tools · System prompts
RESPONSE TEXT
VOICE

Text-to-Speech

Premium voice synthesis with natural prosody. Low-latency streaming output that begins speaking before the full response is generated.

Providers
Cartesia · Gender/language filters · Voice selection
Capabilities
Streaming synthesis · Natural prosody · Voice cloning ready · Low latency
Voice Output Stream
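
One common way to let synthesis start before the full response exists is to cut the LLM's token stream into sentences and hand each sentence to the TTS layer as soon as it completes. The sketch below illustrates that idea only. The function name `sentences_from_tokens` and the boundary rule are assumptions for illustration, not VoiceX's or Cartesia's actual implementation.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: '.', '!' or '?' followed by whitespace ends a sentence.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as it appears in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing; keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    llm_tokens = ["Power ", "down ", "the controller. ", "Check ", "the cable."]
    for sentence in sentences_from_tokens(llm_tokens):
        # A real pipeline would send each sentence to the TTS provider here.
        print(sentence)
```

Because the function is a generator, the first sentence is available to the voice layer while later tokens are still arriving.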

Capabilities

One Platform, Three Modes

Create conversational AI voice agents with configurable speech recognition, language models, and voice synthesis. Real-time bidirectional streaming via WebSocket.

Configurable STT, LLM, and TTS providers
Custom system prompts and personalities
Tool integration (knowledge retrieval, custom tools)
Real-time WebSocket audio streaming
Conversation history and recordings
Guardrails and safety controls
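
As a rough illustration of what "configurable STT, LLM, and TTS providers" might look like in practice, the sketch below models an assistant definition and checks it before use. The class `AssistantConfig`, its field names, and the provider identifier strings are all hypothetical. Only the provider names themselves come from the list of providers on this page.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the providers named on this page.
STT_PROVIDERS = {"deepgram-nova-3", "assemblyai", "azure-speech"}
LLM_PROVIDERS = {"openai", "ollama", "openai-compatible"}
TTS_PROVIDERS = {"cartesia"}


@dataclass
class AssistantConfig:
    """Hypothetical shape of a voice assistant definition."""
    name: str
    stt_provider: str
    llm_provider: str
    tts_provider: str
    system_prompt: str = "You are a helpful voice assistant."
    tools: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the config is usable."""
        problems = []
        if self.stt_provider not in STT_PROVIDERS:
            problems.append(f"unknown STT provider: {self.stt_provider}")
        if self.llm_provider not in LLM_PROVIDERS:
            problems.append(f"unknown LLM provider: {self.llm_provider}")
        if self.tts_provider not in TTS_PROVIDERS:
            problems.append(f"unknown TTS provider: {self.tts_provider}")
        if not self.system_prompt.strip():
            problems.append("system prompt must not be empty")
        return problems


if __name__ == "__main__":
    config = AssistantConfig(
        name="field-service",
        stt_provider="deepgram-nova-3",
        llm_provider="ollama",
        tts_provider="cartesia",
        tools=["knowledge_retrieval"],
    )
    print(config.validate())
```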

In Practice

VoiceX in Action

Real scenarios. Real conversations. Real outcomes.

Field Service

Hands-Free Equipment Diagnostics

A maintenance technician is troubleshooting a hydraulic press on the factory floor. Both hands are occupied.

Technician: What's the reset procedure for a Komatsu HBP-110 hydraulic press showing error code E-47?
VoiceX: E-47 on the HBP-110 indicates a hydraulic fluid pressure sensor fault. Here's the reset sequence:
[KnowledgeX retrieval]
1. Power down the main controller.
2. Check the pressure sensor cable at connector J4.
3. Wait 30 seconds, then restart in diagnostic mode by holding the blue button during power-on.
Technician: The connector looks corroded. Should I replace it?
VoiceX: Yes. Connector corrosion is the most common cause of E-47. The part number is KPC-J4-R2. I've logged this as a maintenance event for your shift report.
[Custom tool]
Issue resolved in 4 minutes without leaving the equipment. Maintenance logged automatically.
Meeting Intelligence

Automated Meeting Analysis

A 45-minute leadership sync is recorded. The team needs a summary, action items, and speaker attribution.

[Audio uploaded: leadership-sync-feb-24.wav]
Transcription complete. 3 speakers identified:
• Sarah Chen (VP Operations): 18 min
• Marc Dubois (CFO): 15 min
• Priya Rao (CTO): 12 min
Key decisions:
• Q3 headcount freeze approved
• Cloud migration timeline moved to September
• Budget reallocation of $2.4M from CapEx to OpEx
Action items:
1. Sarah: Draft hiring freeze communication by Friday
2. Marc: Prepare revised budget model by March 3
3. Priya: Scope September migration plan
Full transcript with speaker labels, 3 decisions captured, 3 action items assigned.
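
Per-speaker talk time of the kind shown in this scenario can be derived from diarized segments by summing segment durations per speaker. The sketch below assumes a simple `Segment` record with start and end times in seconds. That shape, and the sample segments, are made up for illustration and chosen only so the totals match the scenario; no specific STT provider's output format is implied.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    """Hypothetical diarized segment: who spoke, and when (in seconds)."""
    speaker: str
    start: float
    end: float


def talk_time_minutes(segments: Iterable[Segment]) -> dict[str, int]:
    """Sum speaking time per speaker, rounded to minutes, longest first."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {speaker: round(seconds / 60) for speaker, seconds in ranked}


if __name__ == "__main__":
    # Illustrative data only: four segments covering a 45-minute recording.
    segments = [
        Segment("Sarah Chen", 0, 600),
        Segment("Marc Dubois", 600, 1500),
        Segment("Sarah Chen", 1500, 1980),
        Segment("Priya Rao", 1980, 2700),
    ]
    for speaker, minutes in talk_time_minutes(segments).items():
        print(f"{speaker}: {minutes} min")
```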
Customer Support

Intelligent Call Center Agent

An inbound call arrives at a financial services company. The VoiceX assistant handles initial triage before escalating if needed.

VoiceX: Thank you for calling Meridian Financial. I'm your AI assistant. How can I help you today?
Caller: I need to check the status of my wire transfer from last week.
VoiceX: I can help with that. Could you please verify your account number and the date of the transfer?
[Identity verification]
Caller: Account 4829-7731, sent last Thursday.
VoiceX: I've located your transfer of $12,500 to Chase Bank, initiated February 20th. Status: completed. Funds were received on February 21st at 2:14 PM EST.
[Database lookup]
Caller: Great, and can you send me a confirmation receipt?
VoiceX: Done. I've sent the confirmation receipt to your email on file. Is there anything else I can help with?
Call resolved in 90 seconds. No human agent needed. Full audit trail recorded.

Tool Integration

Smarter Through Connection

Voice assistants that don't just talk — they act. Tools fire mid-conversation, enriching every response.

Live Conversation — Tool Call Flow

Knowledge Retrieval

Pull from KnowledgeX knowledge bases in real time. When a caller asks a question, the assistant queries your documents, databases, and semantic layer — then speaks the cited answer.

e.g., "What does our warranty policy say about water damage?"

Custom Tool Registry

Extend assistants with enterprise-specific actions. Register any API endpoint as a tool — CRM lookups, ticket creation, IoT commands — and the LLM decides when to call them during conversation.

e.g., "Create a support ticket for this issue and assign it to the on-call team"
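
To make the registry idea concrete, here is a minimal sketch of registering a function as a tool and dispatching a tool call emitted by an LLM. The `ToolRegistry` class, the `create_ticket` stub, and the tool-call shape (a name plus JSON-encoded arguments, similar to what OpenAI-compatible APIs return) are assumptions for illustration rather than the VoiceX API.

```python
import json
from typing import Callable


class ToolRegistry:
    """Hypothetical registry mapping tool names to Python callables."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., dict]] = {}
        self._descriptions: dict[str, str] = {}

    def register(self, name: str, description: str):
        """Decorator that adds a function to the registry under `name`."""
        def decorator(fn: Callable[..., dict]) -> Callable[..., dict]:
            self._tools[name] = fn
            self._descriptions[name] = description
            return fn
        return decorator

    def describe(self) -> list[dict]:
        """Tool list to advertise to the LLM so it can decide when to call."""
        return [{"name": n, "description": d} for n, d in self._descriptions.items()]

    def dispatch(self, tool_call: dict) -> dict:
        """Run the tool named in an LLM tool call and return its result."""
        name = tool_call["name"]
        if name not in self._tools:
            return {"error": f"unknown tool: {name}"}
        args = json.loads(tool_call.get("arguments", "{}"))
        return self._tools[name](**args)


registry = ToolRegistry()


@registry.register("create_ticket", "Create a support ticket and assign it to a team")
def create_ticket(summary: str, team: str) -> dict:
    # A real tool would call the ticketing system's API; this stub just echoes.
    return {"status": "created", "summary": summary, "team": team}


if __name__ == "__main__":
    call = {"name": "create_ticket",
            "arguments": json.dumps({"summary": "E-47 sensor fault", "team": "on-call"})}
    print(registry.dispatch(call))
```

Returning an error dictionary for unknown tools, instead of raising, lets the assistant keep talking and explain the failure to the caller.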

Third-Party Agent Integration

Connect to external voice agent platforms like VAPI.ai. Manage credentials, configure agents, and run conversations through third-party providers — all from a single unified interface.

e.g., "Route overflow calls to the VAPI.ai customer support agent"

Under the Hood

The Real-Time Voice Pipeline

From microphone to AI-powered response — orchestrated by Pipecat in milliseconds.

Audio In: WebRTC · Twilio
audio frames
Pipecat Pipeline
TRANSCRIBE

Speech-to-Text

Deepgram Nova-3
transcript
REASON

LLM Processing

GPT-5.2 · Ollama
tool calls
Tool Registry
Knowledge RAG
Client Events
Custom Tools
response text
SYNTHESIZE

Voice Synthesis

Cartesia
audio stream
Audio Out: WebRTC Stream
Infrastructure Layer
Pipecat: Pipeline framework
PostgreSQL: State & history
MinIO: Audio storage
Langfuse: LLM tracing
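
The transcribe, reason, synthesize flow above can be pictured as three chained streaming stages, each consuming the previous stage's output as it arrives. The sketch below uses plain Python generators with stand-in stages to show that shape. It is not the Pipecat API, and the stage functions are placeholders in which each "audio frame" is simply UTF-8 text.

```python
from typing import Iterable, Iterator


def transcribe(frames: Iterable[bytes]) -> Iterator[str]:
    """Stand-in STT stage: pretends each audio frame decodes to one word."""
    for frame in frames:
        yield frame.decode("utf-8")


def reason(words: Iterable[str]) -> Iterator[str]:
    """Stand-in LLM stage: replies once an utterance ends with '?'."""
    utterance: list[str] = []
    for word in words:
        utterance.append(word)
        if word.endswith("?"):
            yield "You asked: " + " ".join(utterance)
            utterance = []


def synthesize(replies: Iterable[str]) -> Iterator[bytes]:
    """Stand-in TTS stage: encodes text as bytes in place of audio."""
    for reply in replies:
        yield reply.encode("utf-8")


def pipeline(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Chain the three stages; nothing runs until output is requested."""
    return synthesize(reason(transcribe(frames)))


if __name__ == "__main__":
    incoming = [b"status", b"of", b"E-47?"]
    for audio_out in pipeline(incoming):
        print(audio_out)
```

Chaining generators keeps every stage lazy, so a reply can leave the last stage before the input stream has ended.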

Ecosystem

The DecisionOS Ecosystem

VoiceX is the voice intelligence layer — drawing on KnowledgeX for retrieval, ModelsX for local inference, and MonitoringX for full observability.

Your voice, understood. Your enterprise, empowered.

Create a voice assistant, upload a recording, or connect a phone line.

Real-time Streaming
Enterprise Security
Multi-provider Support
Full Audit Trail