The numbers
At Tailor, we run a B2B SaaS CRM for wholesale fashion brands. More than 2,000 users work on the platform every day, across 100+ brands, and WhatsApp Business is the operating system of the Brazilian B2B sales floor.
Today we process around 11 million messages a month across our customers, plus ~800K outbound broadcasts a month via the API. Some customers connect 120+ WhatsApp numbers to a single tenant: regional sales reps, separate funnels, sub-brands. Multi-number is not an edge case; it is the shape of the problem.
Coexistence: don't force it on the bots
Meta launched coexistence so the WhatsApp Business API and the WhatsApp Business app can run on the same number. The reflex is to turn it on everywhere. We did the opposite.
The asymmetric rule we landed on:
- AI agents run with coexistence OFF. One device, one number, no device contention. No race conditions where two devices fight over the same inbound message and one of them silently drops it.
- Human operators keep coexistence ON. They get the convenience of multi-device access, history search, and fast handoff, while the API still routes cleanly through the agent path.
This is not in the Meta docs. It is the kind of decision you only make after losing a few thousand messages in production and watching a queue start eating itself because two devices were arguing over the same inbound message.
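One way to keep the asymmetric rule from drifting as numbers get added is to check it as configuration. The sketch below is a minimal illustration that assumes a per-number record with an operator type and a coexistence flag. `NumberConfig`, `Operator`, and `policy_violations` are hypothetical names, not part of any Meta API.

```python
from dataclasses import dataclass
from enum import Enum


class Operator(Enum):
    AI_AGENT = "ai_agent"
    HUMAN = "human"


@dataclass(frozen=True)
class NumberConfig:
    # Hypothetical per-number record; field names are illustrative.
    phone_number_id: str
    operator: Operator
    coexistence: bool


def expected_coexistence(operator: Operator) -> bool:
    # The asymmetric rule: bots get a single device, humans keep the app.
    return operator is Operator.HUMAN


def policy_violations(numbers: list[NumberConfig]) -> list[str]:
    """Return IDs of numbers whose coexistence flag breaks the rule."""
    return [
        n.phone_number_id
        for n in numbers
        if n.coexistence != expected_coexistence(n.operator)
    ]


numbers = [
    NumberConfig("rep-sp-01", Operator.HUMAN, coexistence=True),
    NumberConfig("bot-leads", Operator.AI_AGENT, coexistence=True),  # violation
]
print(policy_violations(numbers))  # ['bot-leads']
```

Running a check like this in CI or at onboarding time catches a bot number that was accidentally left on coexistence before it starts dropping inbound messages.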
Multi-number is a first-class concept
When a single customer connects 120 numbers, the routing layer cannot treat the phone number as metadata. It is the partition key.
That decision flows into everything. Campaign approvals are per number. Delivery aggregation is per number, per agent, per campaign. Conversation context is isolated per number, so a customer talking to two reps does not end up with colliding conversations. This is the boring infrastructure that lets the interesting layers above it stay simple.
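To make "phone number as partition key" concrete, here is a minimal in-memory sketch. It assumes conversations are keyed by tenant, business number, and customer, and that delivery events carry number, agent, and campaign fields. `ConversationKey`, `ConversationStore`, `aggregate_deliveries`, and the event field names are hypothetical. A real system would put the same composite key on its database tables or queue partitions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversationKey:
    # The business number is part of the key, not metadata on the message.
    tenant_id: str
    phone_number_id: str  # which of the tenant's WhatsApp numbers is involved
    customer_wa_id: str   # the buyer on the other end


class ConversationStore:
    """In-memory sketch: each (tenant, number, customer) has its own history."""

    def __init__(self) -> None:
        self._history: dict[ConversationKey, list[str]] = defaultdict(list)

    def append(self, key: ConversationKey, text: str) -> None:
        self._history[key].append(text)

    def context(self, key: ConversationKey) -> list[str]:
        # .get() avoids creating empty entries for unknown keys.
        return list(self._history.get(key, []))


def aggregate_deliveries(events: list[dict]) -> Counter:
    """Count status events per (number, agent, campaign, status)."""
    return Counter(
        (e["phone_number_id"], e["agent"], e["campaign"], e["status"])
        for e in events
    )


store = ConversationStore()
rep_a = ConversationKey("tenant-1", "rep-a", "5511999990000")
rep_b = ConversationKey("tenant-1", "rep-b", "5511999990000")
store.append(rep_a, "Do you have the linen shirt in M?")
store.append(rep_b, "When does the winter collection ship?")
print(store.context(rep_a))  # ['Do you have the linen shirt in M?']
```

The same buyer talks to two reps, but because the number is part of the key, each rep's agent only ever sees its own thread.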
The AI agent layer
We run a multi-agent architecture on top of all this, built on Google ADK (Agent Development Kit): specialized sub-agents per job (qualification, support, follow-up, catalog, scheduling), with a router that decides which one handles the next turn.
We use a mix of models, picked per job, because cost, latency, and reasoning depth all vary. The cheap model handles routing and short replies. The expensive one handles ambiguous qualification and reads PDF, image, and audio attachments.
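The routing decision can be sketched in plain Python. This is not Google ADK's API, which has its own agent and sub-agent abstractions. It only illustrates choosing a sub-agent and a model tier per turn. A keyword heuristic stands in for the cheap routing model. The sub-agent names come from the text above, while `ModelTier`, `SUB_AGENTS`, `KEYWORDS`, and `route` are hypothetical. Follow-up is left out of the keyword table on the assumption that it is triggered by stalled-conversation timers rather than by inbound text.

```python
from dataclasses import dataclass
from enum import Enum


class ModelTier(Enum):
    CHEAP = "cheap"          # routing, short replies
    EXPENSIVE = "expensive"  # ambiguous qualification, PDF/image/audio


@dataclass(frozen=True)
class SubAgent:
    name: str
    tier: ModelTier


SUB_AGENTS = {
    "qualification": SubAgent("qualification", ModelTier.EXPENSIVE),
    "support": SubAgent("support", ModelTier.CHEAP),
    "follow_up": SubAgent("follow_up", ModelTier.CHEAP),
    "catalog": SubAgent("catalog", ModelTier.CHEAP),
    "scheduling": SubAgent("scheduling", ModelTier.CHEAP),
}

# Hypothetical keyword table standing in for the routing model's decision.
KEYWORDS = {
    "scheduling": ("visit", "agenda", "calendar"),
    "catalog": ("catalog", "catálogo", "price", "preço"),
    "support": ("order", "pedido", "return", "troca"),
}


def route(text: str, has_attachment: bool = False) -> tuple[str, ModelTier]:
    """Pick the sub-agent for the next turn and the model tier it should use."""
    lowered = text.lower()
    for name, words in KEYWORDS.items():
        if any(w in lowered for w in words):
            agent = SUB_AGENTS[name]
            break
    else:
        # Unknown intent: treat the sender as a lead to qualify.
        agent = SUB_AGENTS["qualification"]
    # Attachments always go to the expensive model, whatever the sub-agent.
    tier = ModelTier.EXPENSIVE if has_attachment else agent.tier
    return agent.name, tier
```

Keeping the tier decision separate from the sub-agent choice lets a cheap sub-agent escalate to the expensive model for a single turn, for example when a buyer sends a PDF order sheet.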
What the agents actually do, in production:
- Qualify a lead by asking the right three questions
- Answer product questions from the catalog
- Book a visit on Google Calendar
- Send the right catalog cut to the right buyer
- Follow up on stalled conversations
- Suggest alternatives when something is out of stock
- Read PDF, image, and audio attachments inline
Every agent change goes through an eval suite before production: golden conversations across realistic scenarios (lead capture, retail, returns). Observability is Phoenix/Arize over OpenTelemetry, so every step of every agent run is traced.
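A golden-conversation gate can be quite small. Below is a minimal sketch, assuming each golden turn records which sub-agent should answer and a few phrases the reply must contain. The agent is any callable that takes the conversation so far and returns `(agent_name, reply)`. `GoldenTurn`, `GoldenConversation`, `run_suite`, and `gate` are hypothetical names. Real suites usually also score replies with an LLM judge, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenTurn:
    user: str
    expected_agent: str                 # which sub-agent should handle this turn
    must_contain: tuple[str, ...] = ()  # phrases the reply must include


@dataclass(frozen=True)
class GoldenConversation:
    scenario: str  # e.g. "lead capture", "retail", "returns"
    turns: tuple[GoldenTurn, ...]


# Takes the conversation so far (user and agent messages), returns (agent, reply).
AgentFn = Callable[[tuple[str, ...]], tuple[str, str]]


def run_suite(agent: AgentFn, suite: list[GoldenConversation]) -> dict[str, float]:
    """Replay each golden conversation and return the pass rate per scenario."""
    results: dict[str, list[bool]] = {}
    for convo in suite:
        history: list[str] = []
        for turn in convo.turns:
            history.append(turn.user)
            name, reply = agent(tuple(history))
            history.append(reply)
            ok = name == turn.expected_agent and all(
                p.lower() in reply.lower() for p in turn.must_contain
            )
            results.setdefault(convo.scenario, []).append(ok)
    return {s: sum(v) / len(v) for s, v in results.items()}


def gate(scores: dict[str, float], threshold: float = 1.0) -> bool:
    """Block the deploy unless every scenario meets the threshold."""
    return all(score >= threshold for score in scores.values())
```

Reporting per scenario rather than one global score makes a regression in, say, returns visible even when lead capture is still passing.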
What "production" actually means here
The AI layer above only looks magical because the boring layer below holds. The boring layer is:
- Idempotent inbound. Meta retries when your webhook hiccups. Your handler must not double-charge a customer or re-send a message.
- Status callback fan-out. Sent, delivered, read, failed are separate events. They arrive out of order. Your state machine has to tolerate that.
- Session window discipline. The 24-hour customer service window is real. Outside it, you can only send approved templates. Forget this and you get suspended.
- Backpressure on broadcasts. 800K outbound messages a month do not happen by hitting the API in a loop. They need queues, rate-limit awareness, retries with jitter, and a dead-letter queue for templates that get rejected.
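The first two items, idempotent inbound and out-of-order status callbacks, can be sketched in a few lines. The sketch assumes each inbound message and each status event carries a stable message ID, as WhatsApp message IDs do. The failure-handling policy is our assumption: a failure only wins if nothing was delivered or read yet, and it is terminal once recorded. `InboundDeduper` and `StatusTracker` are hypothetical in-memory stand-ins. In production the "seen" check would be a unique constraint or an atomic set-if-absent in shared storage, because Meta's retries can land on different workers.

```python
STATUS_RANK = {"sent": 1, "delivered": 2, "read": 3}


class InboundDeduper:
    """Process each inbound message ID at most once (in-memory sketch)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def first_time(self, message_id: str) -> bool:
        if message_id in self._seen:
            return False  # a webhook retry: acknowledge it, do nothing
        self._seen.add(message_id)
        return True


class StatusTracker:
    """Keep the furthest status reached per outbound message; stale events are ignored."""

    def __init__(self) -> None:
        self._status: dict[str, str] = {}

    def apply(self, message_id: str, status: str) -> str:
        current = self._status.get(message_id)
        if current == "failed":
            return current  # terminal under this sketch's policy
        if status == "failed":
            # Assumed policy: failure only wins before delivery.
            if current in (None, "sent"):
                self._status[message_id] = "failed"
        elif status in STATUS_RANK:
            # Only move forward: a late "delivered" never downgrades "read".
            if current is None or STATUS_RANK[status] > STATUS_RANK[current]:
                self._status[message_id] = status
        else:
            raise ValueError(f"unknown status: {status}")
        return self._status[message_id]
```

Treating status as "the maximum rank seen so far" is what makes arrival order irrelevant. Whether `read` shows up before or after `delivered`, the final state is the same.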
None of this is glamorous. All of it is what separates a demo from a system.
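The session window and broadcast backpressure items can be sketched together. The assumptions are: the window is measured from the customer's last inbound message; sends are paced by a token bucket; transient errors are retried with full-jitter exponential backoff; and permanent errors, such as a rejected template, go straight to a dead-letter list. `needs_template`, `TokenBucket`, `RetryableError`, `PermanentError`, and `send_with_retry` are hypothetical. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import random
import time
from datetime import datetime, timedelta
from typing import Callable, Optional

SESSION_WINDOW = timedelta(hours=24)


def needs_template(last_inbound_at: Optional[datetime], now: datetime) -> bool:
    """Outside 24h since the customer's last message, only templates may go out."""
    return last_inbound_at is None or now - last_inbound_at >= SESSION_WINDOW


class TokenBucket:
    """Cap sends at `rate` per second, allowing bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RetryableError(Exception):
    """Transient failure, e.g. a rate-limit response."""


class PermanentError(Exception):
    """Retrying will not help, e.g. a rejected template."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def send_with_retry(send: Callable[[dict], None], job: dict, dead_letter: list,
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try a send; dead-letter the job on permanent failure or exhausted retries."""
    for attempt in range(max_attempts):
        try:
            send(job)
            return True
        except PermanentError:
            break
        except RetryableError:
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
    dead_letter.append(job)
    return False
```

A broadcast worker would pull jobs from a queue, wait until `try_acquire()` succeeds, and then call `send_with_retry`. The jitter keeps thousands of failed sends from retrying in lockstep and hitting the rate limit again all at once.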
The mental model
WhatsApp Business is not email. It is not a contact-center tool. It is a customer-facing API surface that happens to look like a chat to the human on the other end. Once you build with that mental model, the architecture stops fighting you.
The interesting work is on the boring parts.