Writing
Practical thinking on AI workflows in production — what breaks, how to catch it, and what the fix looks like. No hype. No vendor comparisons.
How to Monitor LLM Calls in Production: A Complete Setup Guide
Standard infrastructure monitoring tells you the service is up. It doesn't tell you whether the model is producing correct outputs, whether p95 latency is acceptable, or whether costs are tracking against budget. Here's the complete setup: what to instrument, which metrics to track, and which tools to use.
The Real Cost of Running Unmonitored AI in Production
The team ships the AI feature. It works in staging. Production looks clean. But 12% of outputs have been wrong since launch, and costs are running at 3x the estimate. Nobody knows yet. This is the unmonitored AI problem, and this post quantifies what it actually costs.
Why Your AI Eval Suite Isn't Enough (And What's Missing)
Most eval suites have happy-path bias, don't block deployment, lack regression testing, and go stale. An eval suite with these gaps can report 94% accuracy while missing a 15% failure rate on real production inputs. Here are the six gaps we see most often — and what to add.
Production RAG: What Nobody Tells You After 6 Months
The RAG tutorials get you to a demo in an afternoon. They don't cover what happens six months into production: index staleness, retrieval quality decay, RAG-specific hallucination modes, cost at scale, and the chunking strategy that made sense at launch but doesn't fit real usage. Here's what we've learned.
See how your AI workflows actually score.
115 production readiness controls across 9 dimensions. Free for your first workflow. No credit card required.