B2B SaaS · YC-backed · 2025 · Live

Tier-1 Support Deflection Agent

Deflected 70% of Tier-1 support tickets with a multilingual AI agent that answers repetitive questions in under 2 seconds, letting the support team absorb growth without new hires.

70% of Tier-1 tickets deflected · <2s median response · $240K annual savings
Stack: LangChain · OpenAI · Pinecone · Next.js · Postgres

The Problem

A YC-backed B2B SaaS company had a 12-person support team buried in repetitive Tier-1 tickets: password resets, billing questions, feature how-tos. With a 3× user-growth target ahead, adding headcount was not an option.

65% of all tickets were answerable from the existing help-center docs. No judgment required. Just retrieval and a coherent reply.

What I Built

A production LangChain agent sitting in front of the existing Zendesk queue. The agent:

  1. Classifies incoming tickets by intent (can-answer vs. needs-human)
  2. Retrieves the most relevant help-center chunks from Pinecone (4M+ docs indexed)
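  3. Drafts a response grounded strictly in the retrieved content, so answers come from the docs rather than the model's general knowledge
  4. Posts replies that clear the confidence threshold via the Zendesk API, tagged with a confidence score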
  5. Escalates anything below threshold to the human queue with a pre-filled context summary
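The five steps above can be sketched as a single routing function. This is a minimal sketch, not the production code: the classifier, retriever, LLM call, and summarizer are passed in as plain callables (hypothetical stand-ins for the LangChain, Pinecone, and OpenAI pieces), and the `NEEDS_HUMAN` sentinel and prompt wording are illustrative assumptions. Posting to Zendesk is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SIMILARITY_THRESHOLD = 0.87  # threshold from the write-up
NEEDS_HUMAN = "NEEDS_HUMAN"  # hypothetical sentinel the model returns when the docs don't cover it


@dataclass
class Chunk:
    text: str
    score: float  # cosine similarity between ticket and chunk embeddings


@dataclass
class Outcome:
    action: str             # "auto_send" or "escalate"
    reply: Optional[str]    # drafted reply (sent, or left for human review)
    summary: Optional[str]  # context summary for the human queue, if escalated


def build_grounded_prompt(ticket: str, chunks: list[Chunk]) -> str:
    """Restrict the model to the retrieved excerpts and give it an explicit way out."""
    context = "\n\n".join(f"[{i}] {c.text}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer the ticket using ONLY the numbered help-center excerpts below. "
        f"If they do not contain the answer, reply with exactly {NEEDS_HUMAN}.\n\n"
        f"Excerpts:\n{context}\n\nTicket:\n{ticket}"
    )


def handle_ticket(
    ticket: str,
    classify: Callable[[str], str],            # -> "can_answer" | "needs_human"
    retrieve: Callable[[str], list[Chunk]],    # best match first
    draft: Callable[[str], tuple[str, str]],   # prompt -> (reply, "high" | "medium" | "low")
    summarize: Callable[[str], str],           # ticket -> short summary for a human
) -> Outcome:
    # 1. Intent classification: clear human cases skip retrieval entirely.
    if classify(ticket) != "can_answer":
        return Outcome("escalate", None, summarize(ticket))

    # 2. Retrieval.
    chunks = retrieve(ticket)
    top_score = chunks[0].score if chunks else 0.0

    # 3. Grounded drafting; the model may decline via the sentinel.
    reply, confidence = draft(build_grounded_prompt(ticket, chunks))
    if reply.strip() == NEEDS_HUMAN:
        return Outcome("escalate", None, summarize(ticket))

    # 4./5. Auto-send only when retrieval and the model agree; otherwise hand
    # the draft to a human together with a context summary.
    if top_score > SIMILARITY_THRESHOLD and confidence == "high":
        return Outcome("auto_send", reply, None)
    return Outcome("escalate", reply, summarize(ticket))
```

Keeping the external services behind callables makes the routing logic testable with stubs; a caller would post `auto_send` outcomes through the Zendesk API and create review drafts for everything else.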

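Six languages are supported through a detect-then-translate pipeline: each ticket's language is detected and the ticket is translated before retrieval, and the drafted reply is translated back into the customer's language afterward.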
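A minimal sketch of the detect-then-translate wrapper, assuming English as the pivot language for retrieval and drafting (the write-up does not say which language the help-center docs use). `detect` and `translate` are hypothetical placeholders for whatever language-detection and translation services are plugged in.

```python
from typing import Callable


def multilingual(
    answer: Callable[[str], str],              # retrieval + drafting in the pivot language
    detect: Callable[[str], str],              # text -> language code, e.g. "de"
    translate: Callable[[str, str, str], str], # (text, source_lang, target_lang) -> text
    pivot: str = "en",                         # assumed language of the help-center docs
) -> Callable[[str], str]:
    """Wrap an answer function so retrieval and drafting always run in one pivot language."""

    def run(ticket: str) -> str:
        lang = detect(ticket)
        if lang == pivot:
            return answer(ticket)  # no translation round-trip needed
        reply = answer(translate(ticket, lang, pivot))  # translate before retrieval
        return translate(reply, pivot, lang)            # translate the reply back

    return run
```

Translating into one pivot language means only one copy of the docs needs to be embedded and indexed; the trade-off is two extra translation calls on every non-pivot ticket.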

Architecture Decisions

Why Pinecone over Chroma?

Scale. 4M documents with sub-100ms retrieval at P99 required a managed vector store. Pinecone’s metadata filtering also let us scope searches to the customer’s specific plan tier.
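As an illustration of plan-tier scoping, the helper below builds keyword arguments for the Pinecone Python client's `index.query()` call using Pinecone's metadata filter syntax. The `plan_tier` metadata field, the `"all"` tag, and the index name are assumptions about how chunks might be labeled, not details from this project.

```python
def plan_scoped_query(embedding: list[float], plan_tier: str, top_k: int = 5) -> dict:
    """Keyword arguments for Pinecone's index.query(), restricted to one plan tier."""
    return {
        "vector": embedding,
        "top_k": top_k,
        # Hypothetical schema: each chunk carries a "plan_tier" metadata field,
        # and "all" marks docs that apply to every plan.
        "filter": {"plan_tier": {"$in": ["all", plan_tier]}},
        "include_metadata": True,
    }


# Usage (needs a Pinecone API key and an existing index; names are placeholders):
#   import os
#   from pinecone import Pinecone
#   index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("help-center")
#   results = index.query(**plan_scoped_query(ticket_embedding, "pro"))
```

Filtering inside the vector store, rather than discarding off-plan hits afterward, keeps `top_k` results that are all eligible for the customer's plan.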

Hybrid retrieval

Pure vector search missed exact-match queries (“what is my plan limit?”). I added BM25 keyword search (via Elasticsearch) alongside it and merged the two ranked result lists with reciprocal rank fusion (RRF). Accuracy on the eval set rose from 84% to 92%.
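Reciprocal rank fusion combines result lists using ranks alone, so BM25 scores and cosine similarities never need to be put on a common scale. A minimal sketch over lists of document IDs, best first; `k = 60` is the constant commonly used in the RRF literature, assumed here rather than taken from this project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a vector ranking `["a", "b", "c"]` with a BM25 ranking `["c", "a", "d"]` yields `["a", "c", "b", "d"]`: documents found by both retrievers rise to the top.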

Confidence gating

The agent auto-sends only when the best retrieval match scores a cosine similarity above 0.87 AND the LLM’s self-reported confidence is “high”. Everything else lands as a draft in a human agent’s queue for one-click approval. This kept the false-positive rate (incorrect replies sent automatically) below 0.3%.
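The gating rule is small enough to state directly. Below is a minimal sketch of it, plus one assumed way to measure the false-positive rate from a human audit of sent replies (the write-up does not define exactly how that rate was computed).

```python
def should_auto_send(top_similarity: float, self_confidence: str, threshold: float = 0.87) -> bool:
    """Both signals must agree: a strong retrieval match AND the model reporting "high"."""
    return top_similarity > threshold and self_confidence.strip().lower() == "high"


def false_positive_rate(audit: list[tuple[bool, bool]]) -> float:
    """audit: (was_auto_sent, was_correct) pairs from a human review sample.

    Assumed definition: the share of auto-sent replies a reviewer marked incorrect.
    """
    sent = [correct for auto_sent, correct in audit if auto_sent]
    if not sent:
        return 0.0
    return sum(not correct for correct in sent) / len(sent)
```

Requiring both signals trades some deflection for precision: a close retrieval match paired with a hedging model, or a confident model paired with a weak match, both fall back to human review.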

Results

  • 70% of Tier-1 volume deflected by week 3, after a ramp-up period spent tuning against the eval set
  • Median response time dropped from 4 hours to under 2 seconds
  • $240K/year in avoided headcount at their planned growth trajectory
  • CSAT held flat at 4.6/5, so customers rated agent replies no lower than human ones

What I’d Do Differently

The eval harness was built after the agent — it should have come first. We spent a week manually reviewing edge cases that a proper golden-set eval would have caught in hours.
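A golden-set eval can start very small. The sketch below assumes each hand-labeled case records the expected routing decision and, optionally, the doc ID that should rank first; `route` and `top_doc` are hypothetical hooks into the agent, and the two metrics are illustrative choices rather than the ones used on this project.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class GoldenCase:
    ticket: str
    expected_action: str                # "auto_send" or "escalate"
    expected_doc: Optional[str] = None  # doc ID that should be retrieved first, if known


@dataclass
class EvalReport:
    routing_accuracy: float
    retrieval_hit_at_1: float
    failures: list[str] = field(default_factory=list)  # tickets routed incorrectly


def run_golden_set(
    cases: list[GoldenCase],
    route: Callable[[str], str],                # ticket -> "auto_send" | "escalate"
    top_doc: Callable[[str], Optional[str]],    # ticket -> ID of the top retrieved doc
) -> EvalReport:
    if not cases:
        raise ValueError("golden set is empty")
    failures = [c.ticket for c in cases if route(c.ticket) != c.expected_action]
    with_doc = [c for c in cases if c.expected_doc is not None]
    doc_hits = sum(top_doc(c.ticket) == c.expected_doc for c in with_doc)
    return EvalReport(
        routing_accuracy=1 - len(failures) / len(cases),
        retrieval_hit_at_1=doc_hits / len(with_doc) if with_doc else float("nan"),
        failures=failures,
    )
```

Running this on every prompt or threshold change turns edge-case review into a diff of the `failures` list instead of a manual read-through.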

