Virtscale / platform / efficiency

Compute only when it’s needed.
Ready before it’s asked for.

Idle hosts are a cost the orchestrator is responsible for, not a posture the company is responsible for. The autoscaler reads demand, projects ahead of it, and quiesces what isn’t earning its keep — without putting reactivation on the critical path of a request.

pool · prod-eu-a · general · 02:14:08 CET
demand: 38 rps · active / pool: 14 / 64 · idle ratio: 78%

Principle

Efficiency is a property of the orchestrator.
Not a property of the press release.

Two failure modes sit on opposite ends of one slider. Overprovisioning — pay-for-the-spike, run-it-flat, never touch it — is operationally safe and economically dumb. Aggressive shutdown — tear it down the moment traffic dips, rebuild it when the next pulse arrives — is economically clean and operationally unacceptable, because rebuild time lands inside the user’s request.

The orchestrator that ships in substrate 7.2 lives between them. It looks at the contract envelope, the trailing 7-day demand pattern, and the queue depth at t−0, and pre-warms what the projection asks for. Idle capacity is drained; capacity for the next 90 seconds of expected demand is already accepting traffic.

Power draw goes down because the host is not running. Latency does not go up because the host that is running is the one the request was about to hit.

policy: policy gate · post-scale projection · ADR-0041

prediction window: 90 s horizon · 7-day seasonal baseline · per-org

quiesce floor: per-pool · never below contract-reserved warm headroom

01 · How it works

Three mechanisms.
One outcome: capacity matches load.

all three run on every pool, every minute
all three are inspectable in the audit ledger

01 · predictive

Predictive scheduling

The orchestrator carries a 7-day seasonal baseline per workload pool. It pre-warms hosts ahead of the projected ramp — not on the ramp, ahead of it — so the request that triggers scaling lands on a host that’s already accepting traffic.

horizon: 90 s

basis: 7-day pattern · trailing 5-min slope

refresh: 15 s · per pool
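A minimal sketch of how a pre-warm target could be derived from the 7-day baseline, the trailing 5-min slope, and the 90 s horizon, assuming demand is measured in requests per second and each host serves a fixed `rps_per_host`. The blend rule (take the larger of the seasonal ramp and the slope extrapolation), the `PoolSignals` type, and both function names are hypothetical illustrations, not the orchestrator's actual model.

```python
import math
from dataclasses import dataclass


@dataclass
class PoolSignals:
    observed_rps: float        # demand right now
    seasonal_now_rps: float    # 7-day baseline at the current time-of-week
    seasonal_ahead_rps: float  # 7-day baseline at now + horizon
    slope_rps_per_s: float     # trailing 5-min slope of observed demand


def projected_demand(s: PoolSignals, horizon_s: float = 90.0) -> float:
    """Demand expected at now + horizon_s (hypothetical blend rule)."""
    seasonal_delta = s.seasonal_ahead_rps - s.seasonal_now_rps  # ramp the baseline expects
    slope_delta = s.slope_rps_per_s * horizon_s                # ramp the recent slope implies
    # Take the larger predicted ramp so a miss errs toward warm capacity.
    return max(0.0, s.observed_rps + max(seasonal_delta, slope_delta))


def prewarm_target(s: PoolSignals, rps_per_host: float, reserve_hosts: int,
                   pool_size: int, horizon_s: float = 90.0) -> int:
    """Hosts that should already be accepting traffic when the ramp arrives."""
    needed = math.ceil(projected_demand(s, horizon_s) / rps_per_host)
    return min(pool_size, needed + reserve_hosts)  # warm reserve on top, capped at pool


signals = PoolSignals(observed_rps=38.0, seasonal_now_rps=40.0,
                      seasonal_ahead_rps=70.0, slope_rps_per_s=0.1)
print(prewarm_target(signals, rps_per_host=5.0, reserve_hosts=3, pool_size=64))  # 17
```

Re-evaluated on each 15 s refresh, a target like this lets capacity lead the ramp rather than trail it; the 5.0 rps-per-host figure in the example is invented.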

02 · quiescence

Capacity quiescence

Hosts outside the predicted window are drained, snapshotted to RBD, and placed in C6 sleep. The hypervisor is up; the guests aren't. Wake is a libvirt resume against a warm domain, not a fresh boot. ACPI wake is in milliseconds; the workload is back behind its load balancer roughly 1.2 s after the resume is issued (p50).

drain: graceful · 45 s default

state: libvirt suspended · RBD-pinned

wake: p50 380 ms
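A sketch of quiesce selection that respects the floor, assuming hosts in a pool are interchangeable and the floor is the contract's warm-reserve fraction of the pool, rounded up. `Host`, `quiesce_floor`, and `pick_hosts_to_quiesce` are hypothetical names; the 45 s drain, RBD snapshot, and suspend that follow selection are not modelled.

```python
import math
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    active_conns: int  # connections still open on the host


def quiesce_floor(pool_size: int, reserve_fraction: float) -> int:
    """Contract-reserved warm headroom, in hosts (rounded up)."""
    return math.ceil(pool_size * reserve_fraction)


def pick_hosts_to_quiesce(active: list[Host], predicted_need: int, floor: int) -> list[Host]:
    """Hosts to drain, then snapshot and suspend."""
    keep = max(predicted_need, floor)  # never quiesce below the floor
    surplus = len(active) - keep
    if surplus <= 0:
        return []
    # Least-busy first, so the graceful drain window empties fastest.
    return sorted(active, key=lambda h: h.active_conns)[:surplus]


pool = [Host(f"node-{i:02d}", active_conns=i) for i in range(14)]
floor = quiesce_floor(64, 0.05)  # 4 hosts
drain = pick_hosts_to_quiesce(pool, predicted_need=10, floor=floor)
print([h.name for h in drain])  # ['node-00', 'node-01', 'node-02', 'node-03']
```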

03 · reactive

Reactive headroom

The prediction is wrong sometimes. When it is, the policy gate has a contract-reserved warm pool sitting on standby — never cold, never billed twice — that absorbs the miss while the predictive layer recalibrates. The miss never reaches the request.

reserve: 5–20% of quota · per contract

trip: queue depth · p99 latency · CPU

settle: p50 1.4 s · end-to-end
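A minimal sketch of the reactive layer: sizing the warm reserve from the signed 5–20% band, tripping when any one of the three listed signals crosses its threshold, and splitting a prediction miss into what the reserve absorbs and what it can't. The thresholds, the any-signal rule, and all names here are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class TripThresholds:
    queue_depth: int
    p99_latency_ms: float
    cpu_util: float  # fraction, 0.0 to 1.0


def reserve_hosts(quota_hosts: int, reserve_fraction: float) -> int:
    """Warm reserve size; contracts sign between 5% and 20% of quota."""
    if not 0.05 <= reserve_fraction <= 0.20:
        raise ValueError("reserve must be 5-20% of quota")
    return math.ceil(quota_hosts * reserve_fraction)


def should_trip(queue_depth: int, p99_ms: float, cpu: float, t: TripThresholds) -> bool:
    """Any one signal over its threshold trips the reactive reserve."""
    return queue_depth > t.queue_depth or p99_ms > t.p99_latency_ms or cpu > t.cpu_util


def absorb_miss(shortfall_hosts: int, reserve_free: int) -> tuple[int, int]:
    """Split a prediction miss into (served from reserve, still uncovered)."""
    absorbed = min(shortfall_hosts, reserve_free)
    return absorbed, shortfall_hosts - absorbed
```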

02 · The curve

Capacity tracks demand.
It does not flatten over it.

24 hours from a transactional workload pool
acme-tickets · 2026-01-28 · prod-eu-a

chart: demand vs. provisioned capacity · 00:00 → 24:00 CET

A · 02:00 — 05:00

Overnight quiesce

52 of 64 nodes drained and suspended. Warm reserve held at contract floor (5%).

B · 07:30 — 08:15

Predictive pre-warm

Capacity climbs 90 s ahead of the demand slope. Wakes batched in groups of 8.

C · 12:00 — 14:00

Peak · capacity > demand

Buffer held at +12%, well under the +20% signed cap. Zero 429 events on this day.

D · 19:00 — 22:00

Evening drain

Gradual quiesce as transactional traffic decays. Drain rate matched to slope, never aggressive.
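The pacing in annotations B and D can be sketched as two small rules, assuming a per-minute control loop: wakes go out in batches of 8, and drains never free more hosts per minute than the falling demand slope justifies. The `max_step` ceiling, the per-minute cadence, and the function names are hypothetical.

```python
import math


def wake_batches(hosts: list[str], batch_size: int = 8) -> list[list[str]]:
    """Split a pre-warm into resume batches (groups of 8 in annotation B)."""
    return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]


def drain_step(active: int, target: int, slope_rps_per_min: float,
               rps_per_host: float, max_step: int) -> int:
    """Hosts to drain this minute: no faster than demand is actually decaying."""
    if target >= active:
        return 0
    decay = max(0.0, -slope_rps_per_min)         # only a falling slope frees capacity
    by_slope = math.floor(decay / rps_per_host)  # whole hosts' worth of freed demand
    return min(active - target, by_slope, max_step)
```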

03 · The activation path

Reactivation is a budget.
Every step is accounted for.

measured end-to-end on prod-eu-a
trailing 30d, 412k reactivation events

Quiesced → serving traffic · cold-pool reactivation · p50

total ~1.4 s

Signal · prometheus · queue depth

Scrape sample lands; queue-depth or p99-latency probe exceeds the per-pool trigger. Sample interval is fixed at 5 s, but slope detection runs on the trailing window so a single high sample doesn’t fire.

~140 ms
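One way to keep a single high sample from firing is to require a sustained breach across the trailing window and to check the fitted trend. The sketch below assumes a 6-sample window (30 s at the 5 s scrape interval), a 4-of-6 breach rule, and a least-squares slope that must not be falling; all three choices and the `TrailingTrigger` name are hypothetical.

```python
from collections import deque


def ols_slope(ys: list[float]) -> float:
    """Least-squares slope of ys against sample index 0..n-1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


class TrailingTrigger:
    """Fires on a sustained breach that isn't already subsiding."""

    def __init__(self, threshold: float, window: int = 6, min_over: int = 4):
        self.threshold = threshold
        self.min_over = min_over
        self.samples = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet
        over = sum(1 for v in self.samples if v > self.threshold)
        return over >= self.min_over and ols_slope(list(self.samples)) >= 0
```

A lone spike (one sample of six over threshold) never fires, while a ramp or a flat sustained breach does.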

Policy gate · contract read · post-scale projection

The gate reads the org contract envelope, projects post-scale usage against quota + cap, and emits a scale decision. If the projected usage exceeds the cap, it returns 429 instead. Decision time is dominated by the ledger fsync.

~45 ms
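A minimal sketch of the gate decision, assuming the cap is a fraction above quota (like the +20% signed cap in the curve annotations) and the audit ledger is an append-only JSON-lines file. `ContractEnvelope`, `Decision`, `gate`, and the record fields are hypothetical; the real ledger format isn't described on this page.

```python
import json
import math
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContractEnvelope:
    org: str
    quota_vcpu: int
    cap_fraction: float  # signed headroom above quota, e.g. 0.20 for +20%


@dataclass(frozen=True)
class Decision:
    status: int        # 200 = scale, 429 = rejected at the cap
    target_vcpu: int   # allocation after the decision


def gate(env: ContractEnvelope, current_vcpu: int, delta_vcpu: int,
         ledger_path: Optional[str] = None) -> Decision:
    projected = current_vcpu + delta_vcpu
    cap = math.floor(env.quota_vcpu * (1 + env.cap_fraction))
    decision = Decision(429, current_vcpu) if projected > cap else Decision(200, projected)
    if ledger_path is not None:
        # Durable append; the fsync is where most of the decision time goes.
        with open(ledger_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"org": env.org, "projected": projected,
                                "cap": cap, "status": decision.status}) + "\n")
            f.flush()
            os.fsync(f.fileno())
    return decision
```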

Wake · libvirt resume · ACPI

OneFlow calls libvirt against a suspended domain on a warm hypervisor. RBD is already attached; memory state is restored from snapshot. The guest kernel resumes mid-instruction. No PXE, no cloud-init, no image pull.

~380 ms
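In libvirt-python, `libvirt.open("qemu:///system").lookupByName(name)` returns a `virDomain` whose `state()` and `resume()` methods wrap `virDomainGetState` and `virDomainResume`. The sketch codes against that shape through a `Protocol` so it runs without a hypervisor; the `wake` helper is hypothetical, and OneFlow's own orchestration around the call is not modelled.

```python
from typing import Protocol

# Values mirror libvirt's virDomainState enum.
VIR_DOMAIN_RUNNING = 1
VIR_DOMAIN_PAUSED = 3


class Domain(Protocol):
    def state(self) -> list: ...   # [state, reason], as virDomain.state() returns
    def resume(self) -> int: ...


def wake(dom: Domain) -> bool:
    """Resume a suspended (paused) domain; return True if a resume was issued."""
    state, _reason = dom.state()
    if state == VIR_DOMAIN_RUNNING:
        return False  # already serving; nothing to do
    if state != VIR_DOMAIN_PAUSED:
        raise RuntimeError(f"domain in state {state}; resume only applies to paused domains")
    dom.resume()  # memory is still resident, so the guest continues where it stopped
    return True
```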

Health · probe · readiness gate

Workload-defined readiness probe runs over the guest’s loopback. Typical web workloads pass on the first probe; long-warming workloads (JIT, cache fill) can declare a custom readiness window in the contract.

~310 ms
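A readiness gate reduces to polling a probe until it passes or a window expires. In this sketch `window_s` stands in for the contract's custom readiness window; the default values, the 100 ms interval, and the injectable `clock`/`sleep` (added so the loop can be tested without real waits) are assumptions.

```python
import time
from typing import Callable


def wait_ready(probe: Callable[[], bool], window_s: float = 2.0, interval_s: float = 0.1,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll probe() until it passes (True) or the readiness window closes (False)."""
    deadline = clock() + window_s
    while True:
        if probe():
            return True   # typical web workloads pass on the first call
        if clock() >= deadline:
            return False  # long-warming workload missed its declared window
        sleep(interval_s)
```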

Announce · cilium BGP · haproxy reload

Cilium announces the workload’s /32 to the ToR; HAProxy hot-reloads via the runtime API. Existing flows are preserved through the reload; new flows reach the woken node on the next request.

~525 ms
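The five step medians above add up to the ~1.4 s total. A sketch of treating them as a budget and flagging overruns; the dictionary keys and helper names are hypothetical. A sum of per-step medians is a budget, not the measured median of end-to-end time, and the two need not match exactly.

```python
# Per-step p50 budget in milliseconds, copied from the activation path above.
ACTIVATION_BUDGET_MS = {
    "signal": 140,
    "policy_gate": 45,
    "wake": 380,
    "health": 310,
    "announce": 525,
}


def total_budget_ms(budget: dict[str, int]) -> int:
    return sum(budget.values())


def overruns(measured_ms: dict[str, int], budget: dict[str, int]) -> dict[str, int]:
    """Steps whose measured time exceeds their budget, with the overrun in ms."""
    return {step: measured_ms[step] - limit
            for step, limit in budget.items()
            if measured_ms.get(step, 0) > limit}


print(total_budget_ms(ACTIVATION_BUDGET_MS))  # 1400
```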

What this page is — and isn’t

Lower draw is the side effect.
Fast scaling is the product.

— Not on this page

Carbon-offset accounting
Climate-pledge marketing
Renewable-PPA disclosures
Tree counts

+ On this page

Idle hosts as a measurable orchestrator cost
Reactivation latency as a measured budget
Predictive scheduling with a published horizon
Quiesce floors held at the contract’s warm reserve

04 · Measured

Numbers from prod-eu-a.
Trailing thirty days.

numbers are read from the audit ledger and
published every release on engineering.html

Mean idle ratio

61%

Share of allocated cluster capacity in C6 sleep, averaged across 30 days, weighted by pool size.
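Weighting by pool size means a 64-host pool counts four times as much as a 16-host one. A sketch, assuming one sample per pool per day with a host count and an idle-host count; `PoolSample` and both functions are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class PoolSample:
    pool: str
    size: int        # allocated hosts in the pool
    idle_hosts: int  # hosts in C6 sleep at sample time


def weighted_idle_ratio(samples: list[PoolSample]) -> float:
    """Pool-size-weighted idle ratio: total idle hosts over total hosts."""
    total = sum(s.size for s in samples)
    return sum(s.idle_hosts for s in samples) / total if total else 0.0


def mean_idle_ratio(days: list[list[PoolSample]]) -> float:
    """Average of the daily weighted ratios across the trailing window."""
    return fmean(weighted_idle_ratio(day) for day in days)
```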

Reactivation p50

1.4 s

Median quiesced-to-serving time, measured end-to-end on 412k events. p99 is 2.3 s.
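The page doesn't say which percentile definition the ledger uses; this sketch uses the nearest-rank definition, one common choice, to derive p50 and p99 from raw reactivation times. The function name is hypothetical.

```python
import math


def nearest_rank_percentile(samples: list[float], p: float) -> float:
    """Smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]
```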

Pre-warm hit rate

94.2%

Share of scale-ups served by an already-warm host. The other 5.8% trip the reactive reserve.

Request latency tax

0 ms

Predicted demand is pre-warmed; reactive misses are absorbed by the warm reserve. Reactivation is never on the request path.

Pool draw at idle

8% of peak

Hypervisor stays up; guests are suspended. Floor is set by hypervisor and storage daemon overhead.

429 events / month

0

Trailing 12 months across all production pools. Contract caps have not been hit in production.

Prediction horizon

90 s

How far ahead the orchestrator looks when deciding what to pre-warm. 7-day seasonal baseline, 5-min slope correction.

Substrate version

7.2

Quiescence + predictive scheduling shipped in 7.0 (Aug 2025). 7.2 added per-org horizon overrides.

Scale intelligently. Pay for the work, not the wait.

Send us the shape of the workload — vCPU, RAM, the daily curve you can describe. We’ll respond with a quota, a cap, and a quiesce profile. Operated end-to-end out of nl-ams-1.
