Architecture

Understand the Voice AI Agent runtime.

Use this page to separate runtime traffic, control-plane configuration, telephony callbacks, AI media streams, and operational data before building production integrations.

Runtime plane

The runtime plane handles live phone traffic. It receives signed Twilio callbacks, starts outbound calls, opens realtime media streams, manages voice sessions, executes approved tools, and records operational state for review.

1. Caller or workflow

A customer calls a routed number, or your app creates an outbound call with tenant bearer auth.

2. StateSet Voice API

The API resolves tenant routing, validates credentials, returns TwiML, and creates session state.

3. Realtime AI session

Media streams connect to the configured model, voice, prompt, and tool policy.

4. Operations record

Call logs, voice sessions, function calls, evaluations, and callback tasks become auditable records.
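
The inbound path through steps 1 to 3 can be sketched as follows. This is a minimal illustration, not the StateSet implementation: the `ROUTES` table, the `Route` fields, the phone numbers, and the `wss://voice.example.com/stream` URL are all hypothetical. The only external convention it relies on is Twilio's TwiML, where `<Connect><Stream>` asks Twilio to open a media stream to a WebSocket URL and `<Parameter>` passes custom values along with it.

```python
from dataclasses import dataclass
from typing import Optional
from xml.etree import ElementTree as ET


@dataclass(frozen=True)
class Route:
    """Hypothetical routing entry: which tenant and agent version own a number."""
    tenant_id: str
    agent_version: str


# Hypothetical routing table keyed by the dialed number in E.164 form.
ROUTES = {
    "+15555550100": Route(tenant_id="tenant_a", agent_version="v3"),
}


def resolve_route(to_number: str) -> Optional[Route]:
    """Step 2: resolve tenant routing for the dialed number."""
    return ROUTES.get(to_number)


def build_twiml(route: Optional[Route], stream_url: str) -> str:
    """Step 2 to 3: return TwiML that connects the call to a media stream."""
    response = ET.Element("Response")
    if route is None:
        # Unknown number: reject instead of opening an AI session.
        ET.SubElement(response, "Reject")
        return ET.tostring(response, encoding="unicode")
    connect = ET.SubElement(response, "Connect")
    stream = ET.SubElement(connect, "Stream", {"url": stream_url})
    # Custom parameters travel with the stream so the session can load
    # the right tenant configuration.
    for name, value in (
        ("tenant_id", route.tenant_id),
        ("agent_version", route.agent_version),
    ):
        ET.SubElement(stream, "Parameter", {"name": name, "value": value})
    return ET.tostring(response, encoding="unicode")
```

Calling `build_twiml(resolve_route("+15555550100"), "wss://voice.example.com/stream")` yields a `<Response>` document that a callback handler could return to Twilio. Rejecting unrouted numbers is one possible policy; a real deployment might play a message or forward the call instead.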

Control plane

The control plane manages tenant configuration and production rollout. Keep it separate from runtime callers because it changes credentials, phone routes, agents, versions, rollout policy, and diagnostics.

| Object | Managed by | Production consideration |
| --- | --- | --- |
| Tenant runtime config | Admin API | Controls model, voice, prompt key, and tenant defaults. |
| Agent versions | Admin API | Promote tested versions only after synthetic calls and evaluation review. |
| Phone routes | Admin API and Twilio | Map numbers to the correct tenant, direction, and agent version. |
| Rollout governance | Admin API | Gate production traffic with evaluation thresholds and stop conditions. |
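
A rollout gate of the kind described for rollout governance might look like the sketch below. The `RolloutPolicy` fields, the `Evaluation` shape, and the 0 to 1 score range are assumptions made for illustration; the actual Admin API policy schema is not described on this page.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical result of evaluating one synthetic call."""
    score: float   # rubric score, assumed to lie in [0, 1]
    failed: bool   # hard failure, e.g. the agent broke tool policy


@dataclass(frozen=True)
class RolloutPolicy:
    """Hypothetical thresholds that gate promotion to production."""
    min_evaluations: int     # require at least this many evaluated calls
    min_mean_score: float    # evaluation threshold on the mean score
    max_failure_rate: float  # stop condition on the share of failed calls


def may_promote(evaluations: Sequence[Evaluation], policy: RolloutPolicy) -> bool:
    """Return True only if every threshold in the policy is satisfied."""
    total = len(evaluations)
    if total == 0 or total < policy.min_evaluations:
        return False  # not enough evidence to promote
    failure_rate = sum(1 for e in evaluations if e.failed) / total
    if failure_rate > policy.max_failure_rate:
        return False  # stop condition tripped
    mean_score = sum(e.score for e in evaluations) / total
    return mean_score >= policy.min_mean_score
```

Checking the sample size first keeps a handful of lucky calls from promoting a version, and checking the stop condition before the mean keeps a high average from masking hard failures.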

Trust boundaries

Tenant boundary

Bearer runtime keys

Runtime keys should access only tenant-scoped calls, sessions, logs, and operations workflows.
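
The tenant boundary amounts to two checks on every request: the resource type must be one that runtime keys may touch, and the resource must belong to the key's tenant. A minimal sketch, assuming a hypothetical `RuntimeKey` record and hypothetical resource type names:

```python
from dataclasses import dataclass

# Hypothetical resource types reachable with a runtime key. Tenants, agents,
# routes, and rollout policy are deliberately absent.
RUNTIME_RESOURCES = frozenset({"calls", "sessions", "logs", "operations"})


@dataclass(frozen=True)
class RuntimeKey:
    """Hypothetical view of a validated bearer runtime key."""
    tenant_id: str


def can_access(key: RuntimeKey, resource_type: str, resource_tenant_id: str) -> bool:
    """Allow access only to runtime resource types within the key's own tenant."""
    return (
        resource_type in RUNTIME_RESOURCES
        and key.tenant_id == resource_tenant_id
    )
```

With this shape, a key for `tenant_a` is denied both another tenant's calls and its own tenant's admin-managed objects such as routes.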

Admin boundary

Admin API keys

Admin keys can change tenants, agents, routes, and rollout policy. Store them separately from runtime credentials.
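
One way to enforce that separation is to have the runtime process refuse to start when an admin credential is present in its environment. The variable names `VOICE_RUNTIME_KEY` and `VOICE_ADMIN_KEY` below are hypothetical, and this is a sketch of the idea rather than a prescribed setup.

```python
from typing import Mapping

RUNTIME_KEY_VAR = "VOICE_RUNTIME_KEY"  # hypothetical variable name
ADMIN_KEY_VAR = "VOICE_ADMIN_KEY"      # hypothetical variable name


def load_runtime_key(env: Mapping[str, str]) -> str:
    """Return the runtime key, failing fast if an admin key is also exposed."""
    if ADMIN_KEY_VAR in env:
        raise RuntimeError(
            f"{ADMIN_KEY_VAR} must not be set in the runtime environment"
        )
    key = env.get(RUNTIME_KEY_VAR, "")
    if not key:
        raise RuntimeError(f"{RUNTIME_KEY_VAR} is not set")
    return key
```

A runtime service would call `load_runtime_key(os.environ)` at startup, after `import os`. Passing the mapping in explicitly keeps the function easy to test.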

Telephony boundary

Twilio signatures

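Callback handlers must preserve the exact public URL and request body of each inbound callback until the Twilio signature is verified. The signature is computed over both, so any rewrite of either (for example, a proxy that changes the scheme or host) causes verification to fail.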
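
The verification this boundary depends on can be sketched as follows. For form-encoded callbacks, Twilio computes the `X-Twilio-Signature` header as a base64-encoded HMAC-SHA1, keyed with the account auth token, over the full request URL followed by each POST parameter name and value, sorted by name. The function names are illustrative. In production, the `RequestValidator` class in Twilio's official helper library is likely the safer choice.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def compute_twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Compute the signature Twilio would send for a form-encoded callback."""
    # The signed string is the exact public URL followed by every POST
    # parameter name and value, concatenated in name-sorted order.
    data = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), data.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: Mapping[str, str], header_signature: str
) -> bool:
    """Compare the expected signature with the header in constant time."""
    expected = compute_twilio_signature(auth_token, url, params)
    return hmac.compare_digest(
        expected.encode("utf-8"), header_signature.encode("utf-8")
    )
```

Because the URL is part of the signed string, a handler behind a load balancer needs to verify against the public URL Twilio called, not the internal one it received.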

Media boundary

Stream tokens

Short-lived stream tokens constrain realtime media access during call setup.
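
A short-lived stream token can be as simple as a signed expiry bound to one call. The token layout (`call_id.expires_at.signature`), the function names, and the HMAC-SHA256 choice below are assumptions for illustration; this page does not specify the actual token format.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def _sign(secret: bytes, payload: str) -> str:
    digest = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


def issue_stream_token(
    secret: bytes, call_id: str, ttl_seconds: int, now: Optional[float] = None
) -> str:
    """Issue a token that is valid for one call and a short time window."""
    issued_at = time.time() if now is None else now
    expires_at = int(issued_at) + ttl_seconds
    payload = f"{call_id}.{expires_at}"
    return f"{payload}.{_sign(secret, payload)}"


def verify_stream_token(
    secret: bytes, token: str, call_id: str, now: Optional[float] = None
) -> bool:
    """Accept the token only if it is authentic, unexpired, and for this call."""
    current = time.time() if now is None else now
    try:
        token_call_id, expires_raw, signature = token.rsplit(".", 2)
        expires_at = int(expires_raw)
    except ValueError:
        return False  # malformed token
    expected = _sign(secret, f"{token_call_id}.{expires_raw}")
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return False  # forged or tampered
    return token_call_id == call_id and current < expires_at
```

Binding the token to a call identifier means a leaked token cannot be replayed against a different stream, and the expiry limits how long it is useful at all.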

Operational data model

| Record | Purpose | Review cadence |
| --- | --- | --- |
| Call log | Business-level call status, duration, direction, escalation, and summary. | Every launch and support audit. |
| Voice session | Conversation transcript, function calls, metadata, and model/voice context. | QA, debugging, and prompt iteration. |
| Evaluation | Quality score, rubric, notes, and rollout-governance inputs. | Before promotion and after major prompt changes. |
| Callback task | Human follow-up queue, SLA, disposition, and notification workflow. | Daily operations and SLA review. |
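
As an example of how one of these records supports its review cadence, the sketch below models a callback task and selects the ones that have breached their SLA. The `CallbackTask` field names are hypothetical and chosen to mirror the table, not taken from the actual API schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CallbackTask:
    """Hypothetical shape of a human follow-up task."""
    task_id: str
    created_at: datetime
    sla: timedelta                     # time allowed before follow-up is late
    disposition: Optional[str] = None  # None until a human resolves the task


def overdue_tasks(tasks: Iterable[CallbackTask], now: datetime) -> List[CallbackTask]:
    """Return unresolved tasks whose SLA window has already elapsed."""
    return [
        task
        for task in tasks
        if task.disposition is None and now - task.created_at > task.sla
    ]
```

A daily operations review could run `overdue_tasks` over the open queue and route the result to the notification workflow.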