5 Hard Lessons from Shipping RAG to Production
What nobody tells you about retrieval-augmented generation until you're debugging at 2am — chunking strategies, embedding drift, and latency traps that will bite you.
Writing
Writing on AI engineering, distributed systems, cloud infrastructure, and the craft of building software that actually lasts.
4 articles
Beyond the basics — worker pools, fan-out/fan-in, semaphore-bounded goroutines, and the context cancellation patterns that keep distributed systems from turning into memory leak generators.
A practical walkthrough of the resource requests, node sizing, cluster autoscaler tuning, and spot instance strategies that slashed our monthly EKS spend from $14k to $5.6k.
The unglamorous reality of going independent — how I priced my first engagements, what I got wrong, and the systems I built to replace the stability of a full-time role.