Decision Layer

VoiceX

Enterprise Voice,
Amplified by AI

Real-time speech-to-text, intelligent reasoning, and natural voice synthesis — woven into a single pipeline for voice assistants, transcription, and phone integration.

3 AI Layers|Real-time Streaming|Speaker Recognition

Create a Voice Assistant

The Opportunity

The Rise of Voice-First Enterprise AI

The fastest human interface

Voice Is How Humans Think

Speech is 3x faster than typing and requires zero training. Enterprises are shifting from type-to-query to speak-to-act — unlocking AI for frontline workers, field technicians, and executives who never open a laptop.

Untapped enterprise intelligence

Audio Data Is Invisible Gold

Meetings, customer calls, and field recordings contain critical decisions, commitments, and expertise — trapped in audio files. Organizations that transcribe, diarize, and index voice data gain a competitive edge.

Latency is the enemy

Real-Time Decisions Need Real-Time AI

A support agent can't wait 30 seconds for an answer. A field technician can't type while holding equipment. Voice AI that streams responses in real time transforms workflows that batch processing can't touch.

Text-based interfaces limit AI adoption to desk workers. Voice removes that barrier — making AI accessible to every role, in every environment, at the speed of conversation.

The Architecture

Three Layers, One Real-Time Pipeline

Audio flows through three AI layers — each specialized, all streaming, fully configurable.

Audio Input Stream

EARS

Speech-to-Text

Real-time speech recognition with sub-300ms latency. Supports speaker diarization, custom vocabulary, and 30+ languages.

Providers

Deepgram Nova-3AssemblyAIAzure Speech

Capabilities

Speaker diarizationCustom vocabularyLive streamingMulti-language

TRANSCRIPT

BRAIN

LLM + Tools

The reasoning layer. Processes transcribed speech, calls tools, retrieves knowledge, and formulates responses — all within the conversation flow.

Providers

OpenAI GPT-4oLocal via OllamaAny OpenAI-compatible

Capabilities

Tool orchestrationKnowledge retrievalCustom toolsSystem prompts

RESPONSE TEXT

VOICE

Text-to-Speech

Premium voice synthesis with natural prosody. Low-latency streaming output that begins speaking before the full response is generated.

Providers

CartesiaGender/language filtersVoice selection

Capabilities

Streaming synthesisNatural prosodyVoice cloning readyLow latency

Voice Output Stream

Capabilities

One Platform, Three Modes

Create conversational AI voice agents with configurable speech recognition, language models, and voice synthesis. Real-time bidirectional streaming via WebSocket.

Configurable STT, LLM, and TTS providers

Custom system prompts and personalities

Tool integration (knowledge retrieval, custom tools)

Real-time WebSocket audio streaming

Conversation history and recordings

Guardrails and safety controls

In Practice

VoiceX in Action

Real scenarios. Real conversations. Real outcomes.

Field Service

Hands-Free Equipment Diagnostics

A maintenance technician is troubleshooting a hydraulic press on the factory floor. Both hands are occupied.

What's the reset procedure for a Komatsu HBP-110 hydraulic press showing error code E-47?

E-47 on the HBP-110 indicates a hydraulic fluid pressure sensor fault. Here's the reset sequence:

KnowledgeX retrieval

1. Power down the main controller. 2. Check the pressure sensor cable at connector J4. 3. Wait 30 seconds, then restart in diagnostic mode by holding the blue button during power-on.

The connector looks corroded. Should I replace it?

Yes — connector corrosion is the most common cause of E-47. The part number is KPC-J4-R2. I've logged this as a maintenance event for your shift report.

Custom tool

Issue resolved in 4 minutes without leaving the equipment. Maintenance logged automatically.

Meeting Intelligence

Automated Meeting Analysis

A 45-minute leadership sync is recorded. The team needs a summary, action items, and speaker attribution.

[Audio uploaded: leadership-sync-feb-24.wav]

Transcription complete. 3 speakers identified:

Sarah Chen (VP Operations) — 18 min Marc Dubois (CFO) — 15 min Priya Rao (CTO) — 12 min

Key decisions: • Q3 headcount freeze approved • Cloud migration timeline moved to September • Budget reallocation of $2.4M from CapEx to OpEx

Action items: 1. Sarah: Draft hiring freeze communication by Friday 2. Marc: Prepare revised budget model by March 3 3. Priya: Scope September migration plan

Full transcript with speaker labels, 3 decisions captured, 3 action items assigned.

Customer Support

Intelligent Call Center Agent

An inbound call arrives at a financial services company. The VoiceX assistant handles initial triage before escalating if needed.

Thank you for calling Meridian Financial. I'm your AI assistant. How can I help you today?

I need to check the status of my wire transfer from last week.

I can help with that. Could you please verify your account number and the date of the transfer?

Identity verification

Account 4829-7731, sent last Thursday.

I've located your transfer of $12,500 to Chase Bank, initiated February 20th. Status: completed — funds were received on February 21st at 2:14 PM EST.

Database lookup

Great, and can you send me a confirmation receipt?

Done. I've sent the confirmation receipt to your email on file. Is there anything else I can help with?

Call resolved in 90 seconds. No human agent needed. Full audit trail recorded.

Tool Integration

Smarter Through Connection

Voice assistants that don't just talk — they act. Tools fire mid-conversation, enriching every response.

Live Conversation — Tool Call Flow

Knowledge Retrieval

Pull from KnowledgeX knowledge bases in real time. When a caller asks a question, the assistant queries your documents, databases, and semantic layer — then speaks the cited answer.

e.g., "What does our warranty policy say about water damage?"

Custom Tool Registry

Extend assistants with enterprise-specific actions. Register any API endpoint as a tool — CRM lookups, ticket creation, IoT commands — and the LLM decides when to call them during conversation.

e.g., "Create a support ticket for this issue and assign it to the on-call team"

Third-Party Agent Integration

Connect to external voice agent platforms like VAPI.ai. Manage credentials, configure agents, and run conversations through third-party providers — all from a single unified interface.

e.g., "Route overflow calls to the VAPI.ai customer support agent"

Under the Hood

The Real-Time Voice Pipeline

From microphone to AI-powered response — orchestrated by Pipecat in milliseconds.

Audio In

WebRTC · Twilio

audio frames

Pipecat Pipeline

TRANSCRIBE

Speech-to-Text

Deepgram Nova-3

transcript

REASON

LLM Processing

GPT-5.2 · Ollama

response text

SYNTHESIZE

Voice Synthesis

Cartesia

audio stream

Audio Out

WebRTC Stream

tool calls

Tool Registry

Knowledge RAG

search_knowledge_base

Client Events

send_event

Custom Tools

User-defined

Audio InWebRTC · Twilio

audio frames

Pipecat Pipeline

TRANSCRIBE

Speech-to-Text

Deepgram Nova-3

transcript

REASON

LLM Processing

GPT-5.2 · Ollama

tool calls

Tool Registry

Knowledge RAG

Client Events

Custom Tools

response text

SYNTHESIZE

Voice Synthesis

Cartesia

audio stream

Audio OutWebRTC Stream

Infrastructure Layer

PipecatPipeline framework

PostgreSQLState & history

MinIOAudio storage

LangfuseLLM tracing

Ecosystem

The DecisionOS Ecosystem

VoiceX is the voice intelligence layer — drawing on KnowledgeX for retrieval, ModelsX for local inference, and MonitoringX for full observability.

VoiceX

Voice Intelligence Layer

VoiceX

Voice Intelligence Layer

Your voice, understood.
Your enterprise, empowered.

Create a voice assistant, upload a recording, or connect a phone line.

Create Your First Assistant

Real-time Streaming

Enterprise Security

Multi-provider Support

Full Audit Trail

Enterprise Voice,Amplified by AI

The Rise of Voice-First Enterprise AI

Voice Is How Humans Think

Audio Data Is Invisible Gold

Real-Time Decisions Need Real-Time AI

Three Layers, One Real-Time Pipeline

Speech-to-Text

LLM + Tools

Text-to-Speech

One Platform, Three Modes

VoiceX in Action

Hands-Free Equipment Diagnostics

Automated Meeting Analysis

Intelligent Call Center Agent

Smarter Through Connection

Knowledge Retrieval

Custom Tool Registry

Third-Party Agent Integration

The Real-Time Voice Pipeline

Speech-to-Text

LLM Processing

Voice Synthesis

Speech-to-Text

LLM Processing

Voice Synthesis

The DecisionOS Ecosystem

Your voice, understood.Your enterprise, empowered.

Enterprise Voice,
Amplified by AI

Your voice, understood.
Your enterprise, empowered.