Sample audit
AI Integration Audit —an excerpt.
Disclaimer. Fabricated example. Real audits are confidential and tailored to your codebase — this page just shows what 5 days of senior engineering review actually produces.
Audit for Loop, a hypothetical Shopify-app SaaS adding AI product description generation. Two of the five deliverable sections shown in full; the rest summarized.
Metadata
- Client
- Loop (hypothetical example)
- Stack
- Next.js 14, Node 20, Postgres, Shopify Storefront API
- MAU
- ~2,400 active stores
- Audit window
- 5 days, async
- Author
- Robin Solanki
Executive summary
Loop has a clean codebase ready for AI integration. Of three candidate features the team is considering, “AI product description generation” is the highest-value lowest-cost ship. Recommended architecture uses Claude Haiku for cost efficiency at this scale; expected per-store cost ~$0.40/month at p95 usage. Estimated 2-week build to staging, 3 weeks to production.
Section 1 of 5 — full
Codebase review
Architecture today.
Next.js App Router with API routes calling Postgres directly via Prisma. Background jobs in BullMQ on a Redis side-car. Frontend renders product description previews via SSR. No existing LLM dependencies.
Where AI fits cleanly.
A new API route at /api/generate-description taking product attributes (title, category, materials, price tier) and returning streaming text. Frontend already has a description editor — just needs a “Generate” button wired to the new route.
Refactors needed first.
Two:
- Move the existing description-validation logic out of the React component into a shared
lib/description.tsso the AI-generated output passes through the same validators (banned words, length limits, locale rules). - The current Prisma schema has description as a single text column. Add a
descriptionGeneratedByenum ('human' | 'ai' | 'edited-ai') and adescriptionGeneratedAttimestamp for telemetry.
Refactors to skip.
The team is considering moving to a microservices architecture before adding AI. Skip — there’s no microservices benefit at 2,400 MAU. Add the feature in the monolith first.
Existing infra you can reuse.
BullMQ for batch generation jobs (e.g. “regenerate all descriptions in this category”). Redis for response caching keyed by product attribute hash.
Section 2 of 5 — full
Feature prioritization
Candidate 1: AI product description generation
- User value
- High — saves merchants 15–30 min per product
- Engineering cost
- 2 weeks (one engineer)
- LLM cost
- ~$0.40/store/month at p95 (1,500 generations/month/store, Haiku)
- Risk
- Low — generations are reviewed before publish, so model errors don't reach end-customers
Candidate 2: Personalized email subject lines
- User value
- Medium — estimated 8–12% open-rate lift based on industry benchmarks
- Engineering cost
- 3 weeks (touches the email-sending path which has more edge cases)
- LLM cost
- ~$1.20/store/month at p95
- Risk
- Medium — bad subject lines actually go to customers
Candidate 3: AI customer support chatbot
- User value
- Low for Loop's segment (small Shopify stores rarely have support volume)
- Engineering cost
- 6+ weeks (RAG over store inventory, conversation memory, escalation paths)
- LLM cost
- Highly variable, $5–20/store/month
- Risk
- High — bad answers go directly to paying customers
Recommendation
Build Candidate 1 first as a 2-week sprint. Revisit Candidate 2 in Q3 once you have generation telemetry. Park Candidate 3 indefinitely.
Sections 3–5 — summarized
Remaining sections
Section 3 — Architecture for the top recommendation
Full audit shows Claude Haiku vs GPT-4o-mini cost math, prompt + chain design, eval strategy with a 200-item golden set, fallback when the API rate-limits, where to cache vs not.
Section 4 — Cost projection
Full audit shows per-request cost at p50/p95/p99, monthly cost at current 2,400 MAU, projected cost at 10,000 MAU, when self-hosting becomes cheaper than API calls — answer: ~50,000 MAU based on current rates.
Section 5 — Build sequence
Full audit shows week-by-week tasks: prompt drafting and eval setup → MVP API route → frontend wiring → batch regeneration → telemetry dashboards. Plus risk register: rate-limit blowouts, prompt injection through merchant input, output drift after model updates.
What the full audit looks like
This is roughly 30% of what a real audit looks like. The full doc is 12–15 pages, indexed by section, with code references and architecture diagrams. Services are currently paused — talk to me and we’ll figure out timing.