Case study · 5 April 2026 · 10 min read

From discovery to live: our 10-day AI agent deployment framework

What ten working days actually look like when we ship a production voice agent, day by day.

Process · Operations · Methodology

"Two weeks to go live" sounds like marketing. It is not. It is a constraint we built our process around because our clients kept telling us six-month consulting engagements were killing their patience and their budgets. Here is what actually happens across those ten working days.

Day one: discovery. Two hours with the operator. We do not talk about AI. We talk about which specific phone conversations or WhatsApp threads are eating their team's time. We pick one. The scope for the first agent is always one intent, one channel, one language pair. Feature creep kills two-week timelines.
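One way to keep the "one intent, one channel, one language pair" rule from drifting is to write the scope down as a small frozen record. This is a minimal sketch, not our actual tooling: the class name `AgentScope`, its field names, and the example values are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical scope record for a first agent: exactly one of each."""
    intent: str                      # e.g. "reschedule_booking" (made-up example)
    channel: str                     # e.g. "phone" or "whatsapp"
    language_pair: tuple[str, str]   # e.g. ("en", "es")

    def __post_init__(self) -> None:
        # Reject empty values and comma-separated lists squeezed into one field.
        for name in ("intent", "channel"):
            value = getattr(self, name)
            if not value.strip() or "," in value:
                raise ValueError(f"{name} must be exactly one value, got {value!r}")
        if len(self.language_pair) != 2:
            raise ValueError("language_pair must contain exactly two languages")


scope = AgentScope("reschedule_booking", "phone", ("en", "es"))
```

Because the record is frozen, widening the scope mid-project means creating a new record, which is a visible decision rather than a quiet edit.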

Day two: data gathering. We need three things: existing SOPs or scripts the team uses (often there are none, which is itself useful to know), a recent sample of actual conversations (ten to twenty recordings or transcripts), and access to any system the agent needs to read from or write to (CRM, booking tool, knowledge base). We hand the operator a shared folder and a checklist. They return it by the end of day three.

Day three: architecture decisions. Which LLM. Which voice. Which infrastructure. Which handoff path for failed conversations. We produce a one-page design document that the operator signs off. Non-technical clients get an annotated version. This is where we prevent the most common failure: building something technically correct that the operator does not trust.

Days four and five: build. Prompts, knowledge base ingest, system integrations, voice tuning. We work in pairs: one person on the AI side, one on the integration side. The operator has a Slack channel and is online for questions. Each day usually has one or two fifteen-minute syncs.

Day six: internal test call. The agent is now live in a staging environment. We call it. We try to break it. We feed it ambiguous inputs, try to get it to hallucinate, check that the handoff path works when we mention something outside its scope. Bugs get fixed same-day.
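The break-it session above can be backed by a small scripted check that the handoff path fires when it should. The sketch below is an illustration under assumptions: `stub_agent` stands in for the staging agent, and the `Reply` and `Case` types and the example prompts are hypothetical.

```python
from typing import Callable, NamedTuple


class Reply(NamedTuple):
    text: str
    handed_off: bool


class Case(NamedTuple):
    prompt: str
    expect_handoff: bool


def stub_agent(prompt: str) -> Reply:
    """Stand-in for the staging agent: it only knows about opening hours."""
    if "opening hours" in prompt.lower():
        return Reply("We are open 9 to 5 on weekdays.", handed_off=False)
    return Reply("I am not sure about that, let me get a colleague.", handed_off=True)


def run_cases(agent: Callable[[str], Reply], cases: list[Case]) -> list[str]:
    """Return the prompts where the agent's handoff behaviour was wrong."""
    return [c.prompt for c in cases if agent(c.prompt).handed_off != c.expect_handoff]


cases = [
    Case("What are your opening hours?", expect_handoff=False),
    Case("Can I get a refund for last month?", expect_handoff=True),  # out of scope
    Case("hours?? or maybe refund, not sure", expect_handoff=True),   # ambiguous
]
failures = run_cases(stub_agent, cases)
```

An empty `failures` list means every scripted case behaved as expected. It does not replace calling the agent by hand, it only keeps the cases we already found from regressing.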

Day seven: operator-led test. The operator calls it. Their team calls it. We observe. This is where assumptions about how the customer will phrase things meet reality. We almost always adjust prompts on this day. Sometimes we discover that the team has an undocumented process the agent should know about.

Day eight: pilot. The agent handles ten to twenty real calls or threads, with a human reviewing every transcript at the end of the day. Anything that felt off gets logged. We do not chase perfection here; we chase "no false certainty." If the agent does not know something, it says so and hands off. That is the bar.
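The "no false certainty" bar can be expressed as a simple rule: with no grounded answer, the agent says so and hands off, and every turn lands in the human review queue either way. A minimal sketch, assuming a dictionary-backed knowledge base; `KNOWLEDGE_BASE`, `Turn`, and `PilotLog` are hypothetical names, and a real agent would decide "known" in a far less literal way.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical knowledge base: question key -> approved answer.
KNOWLEDGE_BASE = {"opening_hours": "We are open 9 to 5 on weekdays."}


@dataclass
class Turn:
    question_key: str
    reply: str
    handed_off: bool


@dataclass
class PilotLog:
    """During the pilot, every turn goes to the human review queue."""
    review_queue: list[Turn] = field(default_factory=list)

    def answer(self, question_key: str) -> Turn:
        known: Optional[str] = KNOWLEDGE_BASE.get(question_key)
        if known is None:
            # No grounded answer: say so and hand off instead of guessing.
            turn = Turn(question_key, "I do not know that. Let me hand you to a colleague.", True)
        else:
            turn = Turn(question_key, known, False)
        self.review_queue.append(turn)
        return turn


log = PilotLog()
log.answer("opening_hours")
log.answer("refund_policy")
handoffs = [t.question_key for t in log.review_queue if t.handed_off]
```

The point of the design is that the fallback branch is the default, so a missing answer produces a handoff rather than an improvised reply.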

Day nine: soft launch. The agent takes production traffic for a defined segment: one location, one time window, one campaign. The rest still routes to humans. We monitor conversion, CSAT (if we can measure it), and any escalations. The operator dashboard goes live.
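The soft-launch routing can be sketched as a single predicate that defaults to humans. This example assumes the segment is the combination of one location, one time window, and one campaign; a segment defined by only one of those would be simpler. The `Segment` type, the `route` function, and the example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Segment:
    """Hypothetical soft-launch segment: one location, one time window, one campaign."""
    location: str
    start: time   # window start, inclusive
    end: time     # window end, exclusive
    campaign: str


def route(location: str, at: time, campaign: str, segment: Segment) -> str:
    """Return "agent" only when the contact matches the whole segment, else "human"."""
    in_window = segment.start <= at < segment.end
    if location == segment.location and campaign == segment.campaign and in_window:
        return "agent"
    return "human"


segment = Segment("downtown", time(9, 0), time(12, 0), "spring_promo")
inside = route("downtown", time(10, 30), "spring_promo", segment)
outside = route("uptown", time(10, 30), "spring_promo", segment)
```

Anything that fails to match falls through to `"human"`, so a misconfigured segment sends too little traffic to the agent rather than too much.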

Day ten: go-live review. Half-hour meeting. Numbers vs. expectations. Outstanding issues. Scope for agent v1.1. Renewal of the operator's commitment to the weekly review cadence. Then we scale the traffic up.

What we never do in these ten days: promise features we have not built, skip the handoff path, or ship without a human-review queue. What we always do: keep the first intent small, document everything in the shared workspace, and force the operator to be a decision-maker, not a spectator.

The whole thing is a process, not a piece of software. The software is commoditising. The process is where the results come from.