Vapi Voice Agent | Definition | Night Watch Glossary

Vapi (vapi.ai) is a developer platform for building real-time voice AI agents. It handles the low-level pieces of a voice conversation — speech-to-text, language model invocation, text-to-speech, interruption handling, end-of-speech detection — so application code can focus on the agent’s behavior and tool calls. Night Watch’s voice agent runs on Vapi.

How it works

A Vapi call connects audio from a telephony provider (Twilio in Night Watch’s case) to a streaming pipeline. Inbound audio is transcribed in real time, the transcribed text is passed to a language model with the agent’s system prompt and tool definitions, the model’s response is synthesized back to speech, and the synthesized audio is played to the caller. The pipeline supports interruptions, partial utterances, and tool calls (so the model can hit a calendar API, a weather API, or a database mid-conversation).

Why it matters

Building a production voice agent from scratch is hard. Latency budgets are tight (a few hundred milliseconds between “the caller stops speaking” and “the agent starts replying” is the difference between natural and awkward). Vapi has solved the streaming, interruption, and turn-taking problems generically so application teams do not have to rebuild them. For Night Watch, that means the engineering team can focus on trade-specific triage logic and dispatch behavior instead of speech infrastructure.

How Night Watch implements it

Night Watch runs on Vapi with Twilio carrier-grade telephony. The Vapi assistant is configured with a trade-specific system prompt, tool definitions for calendar access, dispatch initiation, customer history retrieval, and weather data; and a voice profile that sounds natural without being uncanny. Every Vapi call is recorded and stored in Supabase Storage with field-level PII encryption.