2026

shopify-agent

Autonomous store management through multi-agent orchestration, domain-specific evaluation loops, and tiered permission controls.

Claude Agent SDK · Playwright · BigQuery · AWS CDK · MCP · TypeScript

The Problem

E-commerce store operations involve three categories of work: data retrieval (what's my revenue, what's low on stock), code quality enforcement (accessibility compliance, performance budgets, SEO), and decision-making (should I reorder this SKU, which pages need fixes). Traditional tooling addresses these in isolation — a separate dashboard for analytics, a separate CI step for linting, a separate spreadsheet for purchasing. The question I wanted to answer: can a single autonomous agent system handle all three, with appropriate guardrails for each?


System Design

The system is a CLI-driven orchestrator built on the Claude Agent SDK's query() function. A main agent receives natural language commands and delegates to three specialized subagents via the SDK's Task tool, each assigned a model tier based on the complexity/cost tradeoff of its domain:

  • theme-auditor runs on Haiku — fast, cost-efficient, handles high-volume file scanning where the task is pattern recognition rather than deep reasoning
  • store-analytics runs on Sonnet — needs multi-step reasoning to translate natural language business questions into SQL, interpret results, and generate actionable recommendations
  • seo-optimizer runs on Sonnet — requires judgment calls about content quality and search intent that benefit from stronger reasoning

This isn't an arbitrary split. I profiled the token economics: a full theme audit touches 40-80 Liquid files. Running that on Sonnet would cost ~$2-4 per audit. On Haiku, it's ~$0.15. The analytics agent, by contrast, rarely processes more than a few thousand tokens of query results but needs to reason about what those numbers mean. Model selection per subagent is a cost/quality optimization, not a preference.
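
The split can be written down as a small registry. This is only a sketch: the SubagentSpec type, the description strings, and the agentsByTier helper are hypothetical and do not reproduce the SDK's own definitions, so check the Claude Agent SDK documentation for the exact option names before wiring anything like this into query().

```typescript
// Hypothetical local types; nothing here is imported from the SDK.
type ModelTier = "haiku" | "sonnet";

interface SubagentSpec {
  description: string; // tells the main agent when to delegate
  model: ModelTier;    // cost/quality tier chosen per domain
}

const subagents: Record<string, SubagentSpec> = {
  "theme-auditor": { description: "High-volume Liquid file scanning", model: "haiku" },
  "store-analytics": { description: "Business questions to SQL, then recommendations", model: "sonnet" },
  "seo-optimizer": { description: "Content quality and search-intent judgment", model: "sonnet" },
};

// Group subagent names by tier, e.g. to review which domains pay for stronger reasoning.
function agentsByTier(specs: Record<string, SubagentSpec>): Record<ModelTier, string[]> {
  const out: Record<ModelTier, string[]> = { haiku: [], sonnet: [] };
  for (const [name, spec] of Object.entries(specs)) {
    out[spec.model].push(name);
  }
  return out;
}
```

Keeping the tier next to each agent's description makes the cost decision visible in one place instead of scattering it across prompts.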


The Hard Problem: DOM-to-Source Mapping

The most technically interesting piece is the audit pipeline's auto-fix loop. The challenge: axe-core reports accessibility violations against rendered DOM elements, but the source code that generated those elements is Liquid templates — a server-side templating language where an <img> missing alt text might originate from a {% for image in product.images %} loop in snippets/product-media-gallery.liquid.

Mapping a browser DOM violation back to the correct Liquid template file requires understanding Shopify's rendering pipeline. The system uses a four-strategy resolution chain:

  1. Section ID extraction — Shopify wraps each section's output in #shopify-section-{name}, so #shopify-section-featured-collection maps directly to sections/featured-collection.liquid
  2. Structural knowledge — a product page violation likely lives in sections/main-product.liquid or its known snippet dependencies
  3. HTML fragment grep — extract a unique class or attribute from the violating HTML and search the Liquid codebase
  4. Page-type heuristics — fall back to the most probable template for the current page type
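
The chain above can be sketched as an ordered list of strategies where the first non-null answer wins. Everything here is illustrative: the Violation shape, the knownFiles table, and the in-memory file map (standing in for the theme directory and grep) are assumptions, and the section ID pattern follows the simplified #shopify-section-{name} form described in the list.

```typescript
// Hypothetical types and helpers; file names come from the examples in the text.
interface Violation {
  selector: string; // CSS selector reported for the violating element
  html: string;     // outer HTML of the violating element
  pageType: string; // e.g. "product"
}

type Strategy = (v: Violation, files: Map<string, string>) => string | null;

// 1. Section ID extraction: #shopify-section-{name} -> sections/{name}.liquid
const bySectionId: Strategy = (v, files) => {
  const m = v.selector.match(/#shopify-section-([A-Za-z0-9_-]+)/);
  if (!m) return null;
  const path = `sections/${m[1]}.liquid`;
  return files.has(path) ? path : null;
};

// Assumed table: main section first, then its known snippet dependencies.
const knownFiles: Record<string, string[]> = {
  product: ["sections/main-product.liquid", "snippets/product-media-gallery.liquid"],
};

// 2. Structural knowledge: look only in the files known for this page type.
const byStructure: Strategy = (v, files) => {
  const tag = v.html.match(/^<([a-z0-9-]+)/i)?.[1];
  if (!tag) return null;
  const hits = (knownFiles[v.pageType] ?? []).filter(
    (p) => files.get(p)?.includes(`<${tag}`) === true,
  );
  const [only, ...rest] = hits;
  return only !== undefined && rest.length === 0 ? only : null; // accept unambiguous hits only
};

// 3. HTML fragment grep: search every file for the element's first class name.
const byFragment: Strategy = (v, files) => {
  const cls = v.html.match(/class="([^"\s]+)/)?.[1];
  if (!cls) return null;
  for (const [path, source] of files) {
    if (source.includes(cls)) return path;
  }
  return null;
};

// 4. Page-type heuristic: last-resort default for the page type.
const byPageType: Strategy = (v) => knownFiles[v.pageType]?.[0] ?? null;

function resolveTemplate(v: Violation, files: Map<string, string>): string | null {
  for (const strategy of [bySectionId, byStructure, byFragment, byPageType]) {
    const path = strategy(v, files);
    if (path) return path;
  }
  return null; // unresolved: leave for manual review
}
```

Ordering the strategies from most to least certain means a cheap, exact match short-circuits the slower searches, and an unresolved violation is reported rather than guessed.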

Each violation is then classified into a confidence tier: auto-fix (well-understood patterns like missing alt attributes), suggest-and-apply (requires human review, like color contrast adjustments), or manual-only (too risky to automate, like heading hierarchy restructuring). The subagent only acts on fixable items, applies patches using Liquid-aware context (e.g., alt="{{ image.alt | default: product.title }}" instead of hardcoded strings), then verifies each fix by running shopify theme check. Failed verifications trigger a retry loop, capped at 3 attempts per violation.
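
A minimal sketch of the classify, fix, verify loop follows. The rule IDs are axe-core's names for the three examples above; the default tier for unknown rules and the injected applyFix and verify callbacks are assumptions, and the loop is synchronous for brevity where a real implementation would await shopify theme check.

```typescript
type Tier = "auto-fix" | "suggest-and-apply" | "manual-only";

// axe-core rule IDs mapped to tiers, following the three examples in the text.
const tierByRule: Record<string, Tier> = {
  "image-alt": "auto-fix",
  "color-contrast": "suggest-and-apply",
  "heading-order": "manual-only",
};

// Unknown rules fall back to the most conservative tier (an assumption).
function classify(ruleId: string): Tier {
  return tierByRule[ruleId] ?? "manual-only";
}

interface FixResult {
  fixed: boolean;
  attempts: number;
}

// applyFix patches the template; verify stands in for running the theme checker.
function fixWithRetry(
  applyFix: (attempt: number) => void,
  verify: () => boolean,
  maxAttempts = 3,
): FixResult {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    applyFix(attempt);
    if (verify()) return { fixed: true, attempts: attempt };
  }
  return { fixed: false, attempts: maxAttempts }; // give up and report
}
```

Passing the attempt number to applyFix lets a retry try a different patch instead of repeating the one that just failed verification.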

This is the design principle I'd bring to any AI system that modifies production assets: classify confidence before acting, verify after acting, and know when to stop.


Permission Architecture

The agent operates on a defense-in-depth model with three independent layers:

Layer 1 — CLI flags. The user explicitly opts into risk levels: --allow-writes enables file modifications, --allow-mutations enables Shopify API state changes, --dangerous enables destructive operations. Default is read-only.

Layer 2 — canUseTool callback. Every tool invocation passes through a runtime permission check before execution. This gates MCP tool access (mutation tools require the flag), file writes (critical files are always blocked), and bash commands (destructive patterns like rm -rf and shopify theme publish require explicit opt-in).

Layer 3 — PreToolUse hooks. Even if canUseTool somehow passes, hooks provide a second check specifically for critical file protection (.env, shopify.app.toml, config/settings_data.json). This catches subagent attempts that bypass the main orchestrator's permission check — defense-in-depth means no single layer is trusted alone.
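
The first two layers can be sketched as one pure function that takes the parsed CLI flags and a tool call and returns an allow or deny decision. The Flags and Decision types, the mapping of destructive bash patterns to --dangerous, and the mutate_ naming convention for MCP mutation tools are assumptions for illustration; the actual canUseTool signature and return type are defined by the SDK.

```typescript
interface Flags {
  allowWrites: boolean;    // --allow-writes
  allowMutations: boolean; // --allow-mutations
  dangerous: boolean;      // --dangerous
}

// Local result type, loosely modeled on an allow/deny permission decision.
type Decision = { behavior: "allow" } | { behavior: "deny"; message: string };

const CRITICAL_FILES = [".env", "shopify.app.toml", "config/settings_data.json"];
const DESTRUCTIVE_BASH = [/\brm\s+-rf\b/, /\bshopify\s+theme\s+publish\b/];

const deny = (message: string): Decision => ({ behavior: "deny", message });

// Reusable predicate: a hook layer could call this again as an independent check.
function isCriticalFile(path: string): boolean {
  return CRITICAL_FILES.some((f) => path === f || path.endsWith(`/${f}`));
}

function checkTool(tool: string, input: Record<string, unknown>, flags: Flags): Decision {
  if (tool === "Write" || tool === "Edit") {
    const path = String(input["file_path"] ?? "");
    if (isCriticalFile(path)) return deny(`critical file is always blocked: ${path}`);
    if (!flags.allowWrites) return deny("file writes require --allow-writes");
  }
  if (tool === "Bash") {
    const cmd = String(input["command"] ?? "");
    if (!flags.dangerous && DESTRUCTIVE_BASH.some((re) => re.test(cmd))) {
      return deny("destructive command requires --dangerous");
    }
  }
  // Hypothetical naming convention for mutation tools exposed over MCP.
  if (tool.startsWith("mcp__shopify__mutate_") && !flags.allowMutations) {
    return deny("Shopify mutations require --allow-mutations");
  }
  return { behavior: "allow" };
}
```

Note that the critical-file check runs before the flag check, so no combination of flags can unlock those paths.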

Additionally, the BigQuery tool enforces SELECT-only queries, caps each query at 1 GB scanned, and supports dry runs. GraphQL mutations are restricted to a 6-mutation allowlist regardless of permission flags.
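
A rough sketch of the query guard, assuming a keyword check rather than a real SQL parser. The guardQuery name and GuardedQuery shape are hypothetical; dryRun and maximumBytesBilled follow the field names in BigQuery's query job configuration, and reading 1 GB as 10^9 bytes is an assumption.

```typescript
const MAX_BYTES_BILLED = 1_000_000_000; // 1 GB, read here as 10^9 bytes (assumption)

interface GuardedQuery {
  query: string;
  dryRun: boolean;
  maximumBytesBilled: string; // BigQuery expects this limit as a string
}

// Naive guard: accepts a single statement that starts with SELECT or WITH
// and contains no write keywords. It is deliberately conservative and may
// reject valid queries, e.g. ones with these keywords inside string literals.
function guardQuery(sql: string, dryRun = false): GuardedQuery {
  const stripped = sql
    .replace(/--.*$/gm, "")           // drop line comments
    .replace(/\/\*[\s\S]*?\*\//g, "") // drop block comments
    .trim()
    .replace(/;\s*$/, "");            // allow one trailing semicolon
  if (stripped.includes(";")) {
    throw new Error("multiple statements are not allowed");
  }
  if (!/^(select|with)\b/i.test(stripped)) {
    throw new Error("only SELECT queries are allowed");
  }
  if (/\b(insert|update|delete|merge|drop|create|alter|truncate)\b/i.test(stripped)) {
    throw new Error("write keyword found in query");
  }
  return { query: stripped, dryRun, maximumBytesBilled: String(MAX_BYTES_BILLED) };
}
```

The byte limit matters even for valid SELECTs: a query that would scan more than the cap fails instead of running up cost, and a dry run reports the estimated scan size without executing.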

Every tool invocation is logged to a JSONL audit trail via PostToolUse hooks. SubagentStop hooks capture each subagent's final output for a consolidated session report.


Data Architecture

The system interfaces with two data paths depending on query characteristics:

Real-time path — Shopify Admin GraphQL API via 7 custom MCP tools. Used for single-entity lookups (specific order, current inventory level) where freshness matters.

Warehouse path — BigQuery with three datasets:

  • shopify_warehouse (custom pipeline): 8 tables with partition/cluster optimization, 5 analytics views
  • marketing_warehouse (Fivetran-managed): GA4, Klaviyo, Meta Ads, Google Ads, Search Console
  • unified_analytics (cross-source joins): 6 views that combine store operations with marketing attribution

The analytics subagent's system prompt includes explicit routing guidance: "What's my revenue?" routes to the custom warehouse, "What's my ROAS?" routes to unified views that use GA4 as the attribution source rather than platform-reported vanity metrics. This encoding of domain expertise into the prompt is deliberate — the agent shouldn't need to reason about data architecture on every query.

The ingestion pipeline is AWS serverless (CDK-defined): Shopify webhooks → API Gateway (HMAC verification) → Lambda (DynamoDB deduplication) → SQS FIFO → Lambda (BigQuery batch insert). A daily EventBridge-triggered full sync catches any webhook gaps.
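
The HMAC verification step can be sketched with Node's built-in crypto module. Shopify signs the raw request body with HMAC-SHA256 using the app's secret and sends the base64 digest in the X-Shopify-Hmac-Sha256 header; the function name and parameter names below are illustrative, and how the secret is loaded is left out.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a webhook against the raw (unparsed) request body.
// Parsing and re-serializing the JSON first would change the bytes and break the check.
function verifyShopifyHmac(rawBody: string, headerHmac: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(headerHmac, "base64");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Rejecting unsigned or mis-signed requests at the gateway keeps forged events out of the queue, so the downstream Lambdas only deal with deduplication and batching.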


The Restock Agent

The most autonomous component: a daily Lambda that queries BigQuery for 30-day sales velocity and current inventory, groups SKUs by supplier, fetches reorder configuration from Shopify product metafields, then sends each supplier group to Claude Haiku for evaluation.
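
As a sketch of the pre-processing before the model call, the rows can be enriched with a daily velocity and grouped by supplier. The SkuRow fields and the days-of-cover figure are assumptions added for illustration, not the actual query schema.

```typescript
// Hypothetical row shape for the joined sales and inventory query.
interface SkuRow {
  sku: string;
  supplier: string;
  unitsSold30d: number; // units sold over the trailing 30 days
  onHand: number;       // current inventory
}

interface SkuSummary extends SkuRow {
  dailyVelocity: number; // average units sold per day
  daysOfCover: number;   // days until stockout at the current velocity
}

function summarize(row: SkuRow): SkuSummary {
  const dailyVelocity = row.unitsSold30d / 30;
  const daysOfCover = dailyVelocity > 0 ? row.onHand / dailyVelocity : Infinity;
  return { ...row, dailyVelocity, daysOfCover };
}

// One group per supplier, so each model call evaluates a single potential purchase order.
function groupBySupplier(rows: SkuRow[]): Map<string, SkuSummary[]> {
  const groups = new Map<string, SkuSummary[]>();
  for (const row of rows) {
    const list = groups.get(row.supplier) ?? [];
    list.push(summarize(row));
    groups.set(row.supplier, list);
  }
  return groups;
}
```

For example, a SKU with 60 units sold in 30 days and 20 on hand has a velocity of 2 per day and 10 days of cover.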

Haiku doesn't just check "is stock below threshold." It evaluates seasonality, minimum order quantities, velocity trends, and lead times to decide whether to reorder, how much, and with what confidence. The output is a structured purchase order with per-SKU reasoning and a confidence score. In production mode, POs are emailed to suppliers via SES.

The key design choice: the AI makes the decision, but the system enforces idempotency. A last_po_date metafield on each product prevents duplicate orders. The entire pipeline costs approximately $0.27/month in infrastructure and AI tokens.
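
One way to read the idempotency rule is as a date comparison against the last_po_date metafield, done in code before any PO is issued. The cooldown window and the function name are assumptions; the text only states that the metafield prevents duplicates.

```typescript
// lastPoDate is the last_po_date metafield value as an ISO date (YYYY-MM-DD), or null if unset.
// cooldownDays is a hypothetical parameter: the minimum number of days between POs for a product.
function shouldIssuePo(lastPoDate: string | null, today: string, cooldownDays = 1): boolean {
  if (!lastPoDate) return true; // never ordered before
  const msPerDay = 24 * 60 * 60 * 1000;
  const elapsedDays = (Date.parse(today) - Date.parse(lastPoDate)) / msPerDay;
  // An unparseable date yields NaN, which fails the comparison and blocks the order.
  return elapsedDays >= cooldownDays;
}
```

Because the check is deterministic and sits outside the model, a rerun of the daily Lambda on the same date cannot produce a second order, whatever the model recommends.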


What I'd Do Differently

The DOM-to-Liquid mapper uses synchronous execSync for grep operations. In a production system I'd move to async workers with a file index cache. The fix classification is rule-based — a learned model trained on historical fix success/failure would be more accurate. And the subagent prompt engineering, while effective, is fragile — changes to Shopify's theme architecture would require prompt updates rather than being discovered dynamically.

These are the tradeoffs of building a working system quickly versus building a system that scales to thousands of stores. Both are valid engineering contexts.


Numbers

Metric            Value
Application code  2,362 lines TypeScript
Test suite        115 tests across 7 test files
MCP tools         8 (7 Shopify + 1 BigQuery)
Subagents         3 (theme on Haiku, analytics + SEO on Sonnet)
Infrastructure    API Gateway, 4 Lambdas, SQS FIFO, DynamoDB, S3, EventBridge, SES
Data warehouse    8 tables, 11 views (5 analytics + 6 cross-source)

Stack

Claude Agent SDK · Anthropic API · Shopify Plus Admin GraphQL · Playwright · axe-core · web-vitals · Google BigQuery · Fivetran · AWS CDK (Lambda, API Gateway, SQS, DynamoDB, S3, EventBridge, SES, Secrets Manager) · MCP SDK · Commander.js · Zod · Node.js native test runner