
AI · 2025 · Layered Ops
Layered Ops — eval-driven RAG
A retrieval system over 12 years of incident reports, with a 200-case eval suite gating every prompt change.
AI products
Anthropic Claude · pgvector · Cloudflare Workers
Layered Ops' previous RAG system retrieved technically correct but irrelevant documents 40% of the time, and there was no way to tell whether a prompt change made things better or worse.
Built an evaluation harness with 200 golden incidents, then used it to drive prompt and retrieval iteration. Every change runs the full suite before merge.
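The gating idea can be sketched in a few lines. This is a minimal illustration, not the project's actual harness: the `GoldenCase` schema, the `Retriever` signature, the choice of precision@k as the metric, and the `gate` rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenCase:
    """One hand-labelled incident query (hypothetical schema)."""
    query: str
    relevant_ids: frozenset[str]


# Assumed interface: a retriever maps (query, k) to a ranked list of document ids.
Retriever = Callable[[str, int], Sequence[str]]


def precision_at_k(retrieved: Sequence[str], relevant: frozenset[str], k: int) -> float:
    """Fraction of the top-k retrieved ids that are labelled relevant."""
    top = list(retrieved)[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant) / len(top)


def run_suite(cases: Sequence[GoldenCase], retrieve: Retriever, k: int = 5) -> float:
    """Mean precision@k over the whole golden set."""
    if not cases:
        raise ValueError("golden set is empty")
    scores = [precision_at_k(retrieve(c.query, k), c.relevant_ids, k) for c in cases]
    return sum(scores) / len(scores)


def gate(candidate: float, baseline: float, tolerance: float = 0.0) -> bool:
    """Allow a merge only if the candidate score does not regress past the baseline."""
    return candidate >= baseline - tolerance
```

In a CI job, `run_suite` would be called with the retriever built from the proposed prompt and retrieval settings, and the merge blocked whenever `gate` returns `False` against the score recorded for the main branch.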
Numbers that moved.
0%
retrieval precision (was 60%)
200
cases in golden set
0×
faster prompt iteration
