Headless Commerce · March 26, 2026 · 13 min read

How We Built an AI Assistant That Knows Every Product in the Catalog

No RAG. No vector database. No embeddings. The full product catalog fits in the system prompt. Here is how we built an AI sales assistant that actually works.

Tyler Colby · Founder, Colby's Data Movers

Most AI commerce implementations start with a vector database. Embed the product descriptions. Index them in Pinecone or Weaviate. Use RAG to retrieve relevant products for each query. It is the standard architecture. It is also unnecessary for most Shopify stores.

Our client has 92 products. The full catalog, formatted as structured text with titles, descriptions, prices, variants, tags, and availability, is about 15,000 tokens. Claude Sonnet 4 has a 200K token context window. The entire catalog fits in the system prompt with room to spare. No retrieval step needed. No embedding pipeline. No vector database to host, pay for, and maintain.

The assistant knows every product because it has seen every product. Every time. On every request.

The Scale Decision

Let me be precise about when this approach works and when it does not.

At 92 products, the catalog is about 15K tokens. At 200 products, it is roughly 35K tokens. At 500 products, you are looking at 80-90K tokens. All of these fit in Claude's context window with plenty of room for conversation history.

The cost scales linearly. At $3 per million input tokens for Claude Sonnet 4, a 15K-token system prompt costs $0.045 per request, or $45 per 1,000 requests. For 500 products at 90K tokens, it is $0.27 per request, or $270 per 1,000. Anthropic's prompt caching brings this down further: reads of a cached system prompt are billed at a tenth of the base input price, so the repeated catalog costs pennies per thousand requests. Even at heavy usage, the monthly cost is modest.

The break-even point where RAG becomes cheaper than full-context is somewhere around 1,000-2,000 products. Below that threshold, the operational complexity of a vector database (embedding pipeline, index management, retrieval tuning, re-indexing on product changes) costs more in engineering time than the extra tokens cost in API fees.
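The sizing logic above can be sketched as a few helper functions. The per-product token figure is derived from the article's own numbers (92 products ≈ 15K tokens, 500 ≈ 90K, so roughly 175 tokens per catalog entry); the function names and the exact threshold constant are illustrative, not a real API.

```typescript
// Rough sizing heuristic: ~175 tokens per formatted catalog entry,
// derived from the figures above. All names here are illustrative.
const TOKENS_PER_PRODUCT = 175;

function estimateCatalogTokens(productCount: number): number {
  return productCount * TOKENS_PER_PRODUCT;
}

// Per-request input cost at $3 per million tokens (Claude Sonnet 4 base price).
function inputCostPerRequest(promptTokens: number, pricePerMTok = 3): number {
  return (promptTokens * pricePerMTok) / 1_000_000;
}

// Below the ~1,000-2,000 product break-even, full-context is simpler and
// cheap enough; past it, retrieval starts to pay for its operational overhead.
function chooseArchitecture(productCount: number): 'full-context' | 'rag' {
  return productCount <= 1000 ? 'full-context' : 'rag';
}
```

For the 92-product catalog this yields a full-context recommendation at roughly four and a half cents of input per request before caching.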

We optimize for simplicity. The catalog goes in the prompt.

The System Prompt

The system prompt is the core of the assistant. It is templated in the config file and hydrated at runtime with fresh product data. Here is the structure:

function buildSystemPrompt(
  config: CartridgeConfig,
  products: Product[]
): string {
  const catalogText = products.map(p => {
    const variants = p.variants.map(v =>
      `  - ${v.title}: $${v.price}${v.compareAtPrice ? ` (was $${v.compareAtPrice})` : ''} ${v.available ? '(In Stock)' : '(Sold Out)'}`
    ).join('\n');

    return `## ${p.title}
Handle: ${p.handle}
Type: ${p.productType}
Tags: ${p.tags.join(', ')}
${p.description.replace(/<[^>]*>/g, '').slice(0, 200)}
Variants:
${variants}`;
  }).join('\n\n');

  // replaceAll, not replace: {{store.name}} appears more than once in the template,
  // and String.replace with a string pattern only swaps the first occurrence.
  return config.ai.systemPromptTemplate
    .replaceAll('{{store.name}}', config.store.name)
    .replaceAll('{{ai.personality}}', config.ai.personality)
    .replaceAll('{{catalog_summary}}', `${products.length} products across ${new Set(products.map(p => p.productType)).size} categories`)
    .replaceAll('{{product_catalog}}', catalogText);
}

The template from the config file looks like this:

systemPromptTemplate: `You are the shopping assistant for {{store.name}}.
Your personality: {{ai.personality}}.

{{store.name}} sells {{catalog_summary}}.

COMPLETE PRODUCT CATALOG:
{{product_catalog}}

RULES:
1. Only recommend products from the catalog above. Never invent products.
2. Always include the exact price when mentioning a product.
3. When a product has a compare_at_price, mention the savings.
4. If a variant is sold out, say so and suggest alternatives.
5. If asked about something we do not sell, be honest and helpful.
6. Keep responses concise. Two to three sentences for simple questions.
7. For complex questions, use structured lists.
8. When you sense buying intent, use the searchProducts tool to show product cards.
9. If the customer seems ready for a large or custom order, use createQuoteRequest.
10. If the conversation is going well but no purchase, use captureEmail to offer updates.`

A few design decisions here worth explaining.

The product descriptions are stripped of HTML and truncated to 200 characters. The full description is available via the search tool if needed, but for the system prompt, a short summary keeps the token count manageable.

Tags are included because they capture attributes the description might not mention. A product tagged "gift-under-100" lets the assistant answer "What is a good gift under $100?" without parsing prices.

Compare-at prices are included so the assistant can proactively mention sales. "That bag is currently $289, down from $349. You save $60."

The Three Tools

The assistant has three tools it can call. These are not hypothetical. They are Vercel AI SDK tool definitions that execute real code on the server.

Tool 1: searchProducts

const searchProducts = tool({
  description: 'Search the product catalog and return matching products as interactive cards. Use when the customer asks about specific products or categories.',
  parameters: z.object({
    query: z.string().describe('Search query'),
    maxResults: z.number().default(4).describe('Maximum results to return'),
  }),
  execute: async ({ query, maxResults }) => {
    const results = products
      .map(p => ({ product: p, score: scoreMatch(query, p) }))
      .filter(r => r.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, maxResults);

    return results.map(r => ({
      title: r.product.title,
      handle: r.product.handle,
      price: r.product.variants[0].price,
      compareAtPrice: r.product.variants[0].compareAtPrice,
      image: r.product.images[0]?.src,
      available: r.product.variants.some(v => v.available),
      url: `/products/${r.product.handle}`,
    }));
  },
});

When the assistant calls this tool, the frontend renders the results as interactive product cards inline in the chat. Each card has an image, title, price, and "View Product" link. The customer can browse without leaving the conversation.

The scoring function is the same one used for the Cmd+K search. Title matches score highest. Tag matches next. Description matches lowest. This means the assistant's product recommendations are ranked the same way the search bar ranks results. Consistent behavior across interfaces.
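The `scoreMatch` function itself is not shown in the article. A minimal sketch consistent with that ranking (title highest, tags next, description lowest) might look like the following; the weights and the `ScorableProduct` shape are assumptions for illustration.

```typescript
// Sketch of a scoreMatch-style ranker. The 3/2/1 weights are assumed;
// the article only specifies the relative ordering.
interface ScorableProduct {
  title: string;
  tags: string[];
  description: string;
}

function scoreMatch(query: string, p: ScorableProduct): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  let score = 0;
  for (const term of terms) {
    if (p.title.toLowerCase().includes(term)) {
      score += 3; // title matches rank highest
    } else if (p.tags.some(t => t.toLowerCase().includes(term))) {
      score += 2; // tag matches next
    } else if (p.description.toLowerCase().includes(term)) {
      score += 1; // description matches lowest
    }
  }
  return score;
}
```

Because both the chat tool and the Cmd+K search call the same function, a query like "gift" ranks a `gift-under-100`-tagged product above one that only mentions gifts in its description.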

Tool 2: createQuoteRequest

const createQuoteRequest = tool({
  description: 'Create a quote request for bulk or custom orders. Use when the customer indicates they want to buy in quantity or needs custom specifications.',
  parameters: z.object({
    customerName: z.string(),
    customerEmail: z.string().email(),
    products: z.array(z.object({
      handle: z.string(),
      quantity: z.number(),
      notes: z.string().optional(),
    })),
    message: z.string().optional(),
  }),
  execute: async ({ customerName, customerEmail, products, message }) => {
    // Create HubSpot contact and deal
    await hubspot.createContact({
      email: customerEmail,
      firstname: customerName.split(' ')[0],
      lastname: customerName.split(' ').slice(1).join(' '),
    });

    await hubspot.createDeal({
      dealname: `Quote: ${products.map(p => p.handle).join(', ')}`,
      pipeline: 'default',
      dealstage: 'qualifiedtobuy',
      properties: {
        description: JSON.stringify({ products, message }),
      },
    });

    return { success: true, message: 'Quote request submitted. We will follow up within 24 hours.' };
  },
});

This tool captures structured lead data and pushes it to HubSpot. The assistant gathers the information conversationally. "What is your name?" "And your email?" "How many units are you looking at?" Each answer fills a parameter. When all required parameters are collected, the tool executes. The customer gets a confirmation. The sales team gets a qualified lead with product details.

Tool 3: captureEmail

const captureEmail = tool({
  description: 'Capture email for newsletter, restock notifications, or follow-up. Use when the conversation is positive but the customer is not ready to buy.',
  parameters: z.object({
    email: z.string().email(),
    reason: z.enum(['newsletter', 'restock', 'followup', 'discount']),
    productInterest: z.array(z.string()).optional(),
  }),
  execute: async ({ email, reason, productInterest }) => {
    await marketing.subscribe({
      email,
      tags: [reason, ...(productInterest || [])],
      source: 'ai-assistant',
    });

    const messages = {
      newsletter: 'You are on the list. We send updates about once a month.',
      restock: 'We will email you when those items are back in stock.',
      followup: 'We will send you a follow-up with more details.',
      discount: 'Check your inbox for a welcome discount.',
    };

    return { success: true, message: messages[reason] };
  },
});

The reason parameter lets the assistant match the email capture to the conversation context. If the customer asked about a sold-out product, the reason is "restock." If they were browsing but not buying, it is "followup." The email is tagged with product interest data so the marketing platform can segment later.

The Streaming Implementation

Streaming is not optional. A non-streaming AI response takes 2-5 seconds to generate. During that time, the customer sees nothing. They do not know if the system is working. They close the tab.

With streaming, the first token appears in 200-400ms. The response builds word by word. The customer reads as the assistant writes. The perceived latency drops from seconds to milliseconds.

We use the Vercel AI SDK for streaming. The server-side route:

// app/api/chat/route.ts
import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

export async function POST(req: Request) {
  const { messages } = await req.json();

  // Rate limiting check. x-forwarded-for may hold a comma-separated proxy
  // chain; the first entry is the original client.
  const clientIp = (req.headers.get('x-forwarded-for') ?? 'unknown').split(',')[0].trim();
  const rateLimitResult = await checkRateLimit(clientIp, config.ai.rateLimit);
  if (!rateLimitResult.allowed) {
    return new Response('Rate limit exceeded', { status: 429 });
  }

  const systemPrompt = buildSystemPrompt(config, products);

  const result = streamText({
    model: anthropic('claude-sonnet-4-20250514'),
    system: systemPrompt,
    messages,
    tools: {
      searchProducts,
      createQuoteRequest,
      captureEmail,
    },
    maxSteps: 3,
  });

  return result.toDataStreamResponse();
}

The client-side hook:

// components/AiChat.tsx
import { useChat } from 'ai/react';

function AiChat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
    initialMessages: [{
      id: 'greeting',
      role: 'assistant',
      content: config.ai.greeting,
    }],
  });

  return (
    <div className="ai-chat">
      <div className="messages">
        {messages.map(m => (
          <Message key={m.id} message={m} />
        ))}
      </div>
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask about our products..."
          disabled={isLoading}
        />
      </form>
    </div>
  );
}

The useChat hook from the AI SDK handles the streaming protocol, message state, loading states, and tool result rendering. The Message component checks for tool invocations and renders product cards, quote confirmations, or email capture results inline with the text.
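The dispatch inside `Message` can be sketched as a plain function that maps each part of a streamed message to a renderer. The part shapes and component names below are hypothetical; the real AI SDK message types and the article's actual components are not shown.

```typescript
// Hypothetical message-part shapes; the real AI SDK types are richer.
type MessagePart =
  | { type: 'text'; text: string }
  | { type: 'tool-result'; toolName: string; result: unknown };

// Decide which component the Message wrapper renders for each part.
// Component names are illustrative.
function rendererFor(part: MessagePart): string {
  if (part.type === 'text') return 'MarkdownText';
  switch (part.toolName) {
    case 'searchProducts':     return 'ProductCardGrid';        // inline product cards
    case 'createQuoteRequest': return 'QuoteConfirmation';      // quote submitted banner
    case 'captureEmail':       return 'EmailCaptureConfirmation';
    default:                   return 'MarkdownText';           // unknown tools fall back to text
  }
}
```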

Rate Limiting

Without rate limiting, a single bad actor can run up your API bill. The rate limiter is configured in the config file:

rateLimit: {
  maxRequestsPerMinute: 10,
  maxRequestsPerSession: 50
}

We use a sliding window counter stored in a KV store (Vercel KV for hosted deployments, in-memory Map for development). The IP-based limiter prevents abuse. The session-based limiter prevents a single conversation from consuming excessive tokens.
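The development-mode variant can be sketched with an in-memory `Map`. This is a simplification of `checkRateLimit`: it is synchronous, the key scheme is illustrative, and a clock parameter is injected so the window behavior is deterministic in tests.

```typescript
// In-memory sliding-window limiter: keep a list of request timestamps per
// key, drop the ones older than the window, and count what remains.
const WINDOW_MS = 60_000;
const hits = new Map<string, number[]>();

function checkRateLimitSync(
  key: string,
  maxPerWindow: number,
  now: number = Date.now()
): { allowed: boolean; remaining: number } {
  const recent = (hits.get(key) ?? []).filter(t => now - t < WINDOW_MS);
  if (recent.length >= maxPerWindow) {
    hits.set(key, recent);
    return { allowed: false, remaining: 0 };
  }
  recent.push(now);
  hits.set(key, recent);
  return { allowed: true, remaining: maxPerWindow - recent.length };
}
```

The hosted version swaps the `Map` for Vercel KV reads and writes, but the windowing logic is the same.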

When the limit is hit, the assistant responds with a friendly message: "I have been answering a lot of questions. For more help, feel free to email us at [contact email]." No error screen. No cryptic 429 response. A human-readable redirect to another channel.

The Mock Fallback

Not every merchant wants to pay for an AI API key during development or testing. The assistant has a mock mode that activates when no API key is configured:

// lib/ai-provider.ts
export function getAiProvider() {
  if (!process.env.ANTHROPIC_API_KEY) {
    return mockProvider();
  }
  return anthropic('claude-sonnet-4-20250514');
}

function mockProvider() {
  return {
    async *stream(prompt: string) {
      const responses = [
        "I'm currently in demo mode. In production, I'd search our full catalog to find exactly what you're looking for.",
        "Great question! When connected to the live catalog, I can show you specific products with prices and availability.",
        "I'd love to help with that. In the live version, I can search products, create quotes, and more.",
      ];
      const response = responses[Math.floor(Math.random() * responses.length)];
      for (const char of response) {
        yield char;
        await new Promise(r => setTimeout(r, 20));
      }
    }
  };
}

The mock provider simulates streaming by yielding characters with a 20ms delay. The UI behaves identically. The chat widget renders the same way. The only difference is the content. This lets merchants evaluate the UX without an API key, and developers test the frontend without spending money.

What We Learned

After three months in production, here is what the data shows.

Average conversation length: 4.2 messages. Customers ask a question, get a product recommendation, ask a follow-up, and either click through to a product or leave. Short conversations. Specific queries. The assistant is used like search, not like a chatbot.

Top queries: "What do you have under $X?" "Is [product] in stock?" "What is the difference between [A] and [B]?" "What would you recommend for [use case]?" These are the same questions a sales associate would answer in a physical store.

Tool usage: searchProducts is called on 73% of conversations. captureEmail on 12%. createQuoteRequest on 4%. The search tool is by far the most valuable because it converts the AI response from text into interactive product cards.

Cost: At roughly 8,000 messages per month, the total Anthropic API cost is about $18. That is less than a single Shopify app subscription.

Conversion impact: Visitors who interact with the AI assistant have a 3.2x higher conversion rate than those who do not. This is correlation, not causation. Visitors who engage with the assistant probably arrived with higher purchase intent to begin with. But the data is encouraging.

The architecture is simple. The catalog goes in the prompt. The tools handle actions. The streaming makes it feel fast. The config file controls everything. For stores with fewer than 500 products, this is the right approach. Skip the vector database. Skip the RAG pipeline. Put the data where the model can see it.

Next: how we extracted all of this into a product.