Idle hosts are a cost the orchestrator is responsible for, not a posture the company is responsible for. The autoscaler reads demand, projects ahead of it, and quiesces what isn’t earning its keep — without putting reactivation on the critical path of a request.
Two failure modes sit on opposite ends of one slider. Overprovisioning — pay-for-the-spike, run-it-flat, never touch it — is operationally safe and economically dumb. Aggressive shutdown — tear it down the moment traffic dips, rebuild it when the next pulse arrives — is economically clean and operationally unacceptable, because rebuild time lands inside the user’s request.
The orchestrator that ships in substrate 7.2 lives between them. It looks at the contract envelope, the trailing 7-day demand pattern, and the queue depth at t−0, and it pre-warms what the projection asks for. Idle is drained; the next 90 seconds of expected demand is already accepting traffic.
Power draw goes down because the host is not running. Latency does not go up because the host that is running is the one the request was about to hit.
The orchestrator carries a 7-day seasonal baseline per workload pool. It pre-warms hosts ahead of the projected ramp — not on the ramp, ahead of it — so the request that triggers scaling lands on a host that’s already accepting traffic.
Hosts outside the predicted window are drained, snapshotted to RBD, and placed in C6 sleep. The hypervisor is up; the guests aren’t. Wake is a libvirt resume against a warm domain, not a fresh boot. ACPI wake is in milliseconds; the workload is rejoining its load balancer inside a second.
The prediction is wrong sometimes. When it is, the policy gate has a contract-reserved warm pool sitting on standby — never cold, never billed twice — that absorbs the miss while the predictive layer recalibrates. The miss never reaches the request.
52 of 64 nodes drained and suspended. Warm reserve held at contract floor (5%).
Capacity climbs 90 s ahead of the demand slope. Wakes batched in groups of 8.
Buffer held at +12%, well under the +20% signed cap. Zero 429 events on this day.
Gradual quiesce as transactional traffic decays. Drain rate matched to slope, never aggressive.
Share of allocated cluster capacity in C6 sleep, averaged across 30 days, weighted by pool size.
Median quiesced-to-serving time, measured end-to-end on 412k events. p99 is 2.3 s.
Share of scale-ups served by an already-warm host. The other 5.8% trip the reactive reserve.
Predicted demand is pre-warmed; reactive misses absorb on the warm reserve. Reactivation is never on the request path.
Hypervisor stays up; guests are suspended. Floor is set by hypervisor and storage daemon overhead.
Trailing 12 months across all production pools. Contract caps have not been hit in production.
Time the orchestrator looks ahead before pre-warming. 7-day seasonal baseline, 5-min slope correction.
Quiescence + predictive scheduling shipped in 7.0 (Aug 2025). 7.2 added per-org horizon overrides.
Send us the shape of the workload — vCPU, RAM, the daily curve you can describe. We’ll respond with a quota, a cap, and a quiesce profile. Operated end-to-end out of nl-ams-1.