Postmortems, ADRs, release notes, field reports. Written by the people on call, published within ten business days of the event, indexed below. No marketing reviews, no email walls.
A single OSD's IO thread stalled for a minute and a half. Customer reads didn't fail — they got slow. The probe we'd built to catch slow reads only checked the median, not the tail. Here's the trace, the dashboard we now keep open, and the change we made to the probe contract.
Read postmortemThe case for not running a managed observability tier we'd then have to charge a markup on. What we run for ourselves, what we hand to customers, and why the boundary is where it is.
Read ADRWhat 2,184 capped scale events looked like on our P&L. Why refusing capacity is cheaper to operate than inventing it, and how customers actually responded when they saw their first 429.
Read brief