virtscale

Compute only when it’s needed.
Ready before it’s asked for.

Idle hosts are a cost the orchestrator is responsible for, not a posture the company is responsible for. The autoscaler reads demand, projects ahead of it, and quiesces what isn’t earning its keep — without putting reactivation on the critical path of a request.

Principle

Efficiency is a property of the orchestrator.
Not a property of the press release.

Two failure modes sit on opposite ends of one slider. Overprovisioning — pay-for-the-spike, run-it-flat, never touch it — is operationally safe and economically dumb. Aggressive shutdown — tear it down the moment traffic dips, rebuild it when the next pulse arrives — is economically clean and operationally unacceptable, because rebuild time lands inside the user’s request.

The orchestrator that ships in substrate 7.2 lives between them. It looks at the contract envelope, the trailing 7-day demand pattern, and the queue depth at t−0, and it pre-warms what the projection asks for. Idle is drained; the next 90 seconds of expected demand is already accepting traffic.

Power draw goes down because the host is not running. Latency does not go up because the host that is running is the one the request was about to hit.

policy: policy gate · post-scale projection · ADR-0041
prediction window: 90 s horizon · 7-day seasonal baseline · per-org
quiesce floor: per-pool · never below contract-reserved warm headroom
01 · How it works

Three mechanisms.
One outcome: capacity matches load.

all three run on every pool, every minute
all three are inspectable in the audit ledger
01 · predictive

Predictive scheduling

The orchestrator carries a 7-day seasonal baseline per workload pool. It pre-warms hosts ahead of the projected ramp — not on the ramp, ahead of it — so the request that triggers scaling lands on a host that’s already accepting traffic.

horizon: 90 s
basis: 7-day pattern · trailing 5-min slope
refresh: 15 s · per pool
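
A minimal sketch of how a projection like this could be computed, assuming the seasonal baseline and the trailing 5-minute slope are combined by taking the larger of the two estimates. That combination rule, the `PoolSample` type, the requests-per-second unit, and the per-host capacity figure are all assumptions made for illustration; only the 90 s horizon and the two inputs come from the description above.

```python
import math
from dataclasses import dataclass

HORIZON_S = 90        # published prediction horizon
SLOPE_WINDOW_S = 300  # trailing 5-minute slope


@dataclass(frozen=True)
class PoolSample:
    """Hypothetical per-pool inputs, refreshed on each scheduling tick."""
    seasonal_rps: float   # 7-day baseline for the moment 90 s from now
    current_rps: float    # latest observed demand
    rps_5min_ago: float   # demand observed 300 s earlier


def project_demand(s: PoolSample, horizon_s: int = HORIZON_S) -> float:
    """Project demand at the horizon from baseline and trailing slope."""
    slope = (s.current_rps - s.rps_5min_ago) / SLOPE_WINDOW_S
    extrapolated = s.current_rps + slope * horizon_s
    # Assumption: be conservative and trust whichever estimate is higher.
    return max(s.seasonal_rps, extrapolated, 0.0)


def hosts_to_prewarm(s: PoolSample, warm_hosts: int, rps_per_host: float) -> int:
    """Number of additional hosts to wake ahead of the projected ramp."""
    needed = math.ceil(project_demand(s) / rps_per_host)
    return max(0, needed - warm_hosts)


sample = PoolSample(seasonal_rps=900.0, current_rps=800.0, rps_5min_ago=500.0)
extra = hosts_to_prewarm(sample, warm_hosts=6, rps_per_host=100.0)
```

With these illustrative inputs the slope extrapolates to 890 req/s, the baseline says 900 req/s, and three extra hosts are woken before the ramp arrives.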
02 · quiescence

Capacity quiescence

Hosts outside the predicted window are drained, snapshotted to RBD, and placed in C6 sleep. The hypervisor is up; the guests aren’t. Wake is a libvirt resume against a warm domain, not a fresh boot. ACPI wake takes milliseconds; the workload rejoins its load balancer inside a second.

drain: graceful · 45 s default
state: libvirt suspended · RBD-pinned
wake: p50 380 ms
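
One way to picture the quiesce decision is as a count: keep whatever the projection asks for, never drop below the contract floor, and suspend the rest. The sketch below assumes the floor is expressed as a fraction of the pool's host count and rounded up; the function name and that interpretation are hypothetical. The inputs are chosen to reproduce the 52-of-64 figure quoted for the overnight window, but the split between projected need and floor is an assumption, not a published number.

```python
import math


def hosts_to_quiesce(total_hosts: int,
                     projected_hosts_needed: int,
                     reserve_fraction: float) -> int:
    """How many hosts can be drained and suspended right now.

    reserve_fraction is the contract-reserved warm headroom, assumed here
    to be a fraction of the pool's host count.
    """
    floor_hosts = math.ceil(total_hosts * reserve_fraction)
    # Keep the larger of projected need and the floor, capped at pool size.
    keep = min(total_hosts, max(projected_hosts_needed, floor_hosts))
    return total_hosts - keep


overnight = hosts_to_quiesce(total_hosts=64, projected_hosts_needed=12,
                             reserve_fraction=0.05)
```

Even if the projection drops to zero, the floor keeps four hosts warm in this example, so the pool never quiesces below its reserve.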
03 · reactive

Reactive headroom

The prediction is wrong sometimes. When it is, the policy gate has a contract-reserved warm pool sitting on standby — never cold, never billed twice — that absorbs the miss while the predictive layer recalibrates. The miss never reaches the request.

reserve: 5–20% of quota · per contract
trip: queue depth · p99 latency · CPU
settle: p50 1.4 s · end-to-end
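
A minimal sketch of a trip check over the three published signals. It assumes the reserve trips only when several samples in a trailing window breach a limit, so one noisy scrape does not fire it; that is a simpler stand-in for slope detection, not the orchestrator's actual rule. The `Sample` and `Limits` types, the threshold values, and `min_breaches` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    queue_depth: int
    p99_ms: float
    cpu: float  # utilisation, 0.0 to 1.0


@dataclass(frozen=True)
class Limits:
    queue_depth: int
    p99_ms: float
    cpu: float


def breached(s: Sample, lim: Limits) -> bool:
    """True if any one of the three signals is over its per-pool limit."""
    return (s.queue_depth > lim.queue_depth
            or s.p99_ms > lim.p99_ms
            or s.cpu > lim.cpu)


def should_trip(window: Sequence[Sample], lim: Limits, min_breaches: int = 3) -> bool:
    """Trip the warm reserve only on a sustained breach in the window."""
    return sum(breached(s, lim) for s in window) >= min_breaches


limits = Limits(queue_depth=100, p99_ms=250.0, cpu=0.85)
calm = Sample(queue_depth=10, p99_ms=80.0, cpu=0.40)
hot = Sample(queue_depth=150, p99_ms=80.0, cpu=0.40)

single_spike = should_trip([calm, hot, calm, calm], limits)
sustained = should_trip([calm, hot, hot, hot], limits)
```

A single hot sample leaves the reserve untouched; three in the window hand the miss to the warm pool.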
02 · The curve

Capacity tracks demand.
It does not flatten over it.

24 hours from a transactional workload pool
acme-tickets · 2026-01-28 · prod-eu-a
[Figure: demand vs. provisioned capacity, 00:00 → 24:00 CET. Series: demand, capacity. Vertical axis: 0% to 100%. Marked regions: A · overnight quiesce, B · predictive pre-warm, C · peak (capacity > demand), D · evening drain.]
A · 02:00 — 05:00

Overnight quiesce

52 of 64 nodes drained and suspended. Warm reserve held at contract floor (5%).

B · 07:30 — 08:15

Predictive pre-warm

Capacity climbs 90 s ahead of the demand slope. Wakes batched in groups of 8.

C · 12:00 — 14:00

Peak · capacity > demand

Buffer held at +12%, well under the +20% signed cap. Zero 429 events on this day.

D · 19:00 — 22:00

Evening drain

Gradual quiesce as transactional traffic decays. Drain rate matched to slope, never aggressive.

03 · The activation path

Reactivation is a budget.
Every step is accounted for.

measured end-to-end on prod-eu-a
trailing 30d, 412k reactivation events
Quiesced → serving traffic · cold-pool reactivation · p50
total: ~1.4 s
L1
Signal: prometheus · queue depth
Scrape sample lands; queue-depth or p99-latency probe exceeds the per-pool trigger. Sample interval is fixed at 5 s, but slope detection runs on the trailing window so a single high sample doesn’t fire.
~140 ms
L2
Policy gate: contract read · post-scale projection
The gate reads the org contract envelope, projects post-scale usage against quota+cap, and emits a scale decision. If the projection exceeds the cap, it returns 429 instead. Decision time is dominated by ledger fsync.
~45 ms
L3
Wake: libvirt resume · ACPI
OneFlow calls libvirt against a suspended domain on a warm hypervisor. RBD is already attached; memory state is restored from snapshot. The guest kernel resumes mid-instruction. No PXE, no cloud-init, no image pull.
~380 ms
L4
Health: probe · readiness gate
Workload-defined readiness probe runs over the guest’s loopback. Typical web workloads pass on the first probe; long-warming workloads (JIT, cache fill) can declare a custom readiness window in the contract.
~310 ms
L5
Announce: cilium BGP · haproxy reload
Cilium announces the workload’s /32 to the ToR; HAProxy hot-reloads via the runtime API. Existing flows are preserved through the reload; new flows reach the woken node on the next request.
~525 ms
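
The five stage figures above can be checked against the ~1.4 s total, and the gate's decision can be sketched alongside them. The sketch assumes "quota+cap" means the quota plus a signed percentage on top of it (for example +20%), and that usage is counted in abstract capacity units; the `Contract` type, the `gate` function, and the unit are hypothetical.

```python
from dataclasses import dataclass

# Per-stage p50 figures as published for the activation path (milliseconds).
STAGES_MS = {
    "signal": 140,
    "policy_gate": 45,
    "wake": 380,
    "health": 310,
    "announce": 525,
}


def total_budget_ms(stages: dict[str, int] = STAGES_MS) -> int:
    """Sum of the stage budgets, quiesced to serving traffic."""
    return sum(stages.values())


@dataclass(frozen=True)
class Contract:
    quota: int           # contracted capacity units (hypothetical unit)
    cap_fraction: float  # signed cap above quota, e.g. 0.20 for +20%


def gate(contract: Contract, current_units: int, requested_units: int) -> str:
    """Project post-scale usage and either allow the scale-up or refuse it."""
    ceiling = contract.quota * (1 + contract.cap_fraction)
    projected = current_units + requested_units
    return "429" if projected > ceiling else "scale"


contract = Contract(quota=100, cap_fraction=0.20)
within_cap = gate(contract, current_units=110, requested_units=8)
over_cap = gate(contract, current_units=110, requested_units=12)
```

The stages sum to 1,400 ms, which matches the published total. In the gate example, a request that would take usage to 118 units passes, and one that would take it to 122 units is refused with a 429.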
What this page is — and isn’t

Lower draw is the side effect.
Fast scaling is the product.

— Not on this page

  • Carbon-offset accounting
  • Climate-pledge marketing
  • Renewable-PPA disclosures
  • Tree counts

+ On this page

  • Idle hosts as a measurable orchestrator cost
  • Reactivation latency as a measured budget
  • Predictive scheduling with a published horizon
  • Quiesce floors held at the contract’s warm reserve
04 · Measured

Numbers from prod-eu-a.
Trailing thirty days.

numbers are read from the audit ledger and
published every release on engineering.html
Mean idle ratio
61%

Share of allocated cluster capacity in C6 sleep, averaged across 30 days, weighted by pool size.

Reactivation p50
1.4 s

Median quiesced-to-serving time, measured end-to-end on 412k events. p99 is 2.3 s.

Pre-warm hit rate
94.2%

Share of scale-ups served by an already-warm host. The other 5.8% trip the reactive reserve.

Request latency tax
0 ms

Predicted demand is pre-warmed; reactive misses are absorbed by the warm reserve. Reactivation is never on the request path.

Pool draw at idle
8% of peak

Hypervisor stays up; guests are suspended. Floor is set by hypervisor and storage daemon overhead.

429 events / month
0

Trailing 12 months across all production pools. Contract caps have not been hit in production.

Prediction horizon
90 s

Time the orchestrator looks ahead before pre-warming. 7-day seasonal baseline, 5-min slope correction.

Substrate version
7.2

Quiescence + predictive scheduling shipped in 7.0 (Aug 2025). 7.2 added per-org horizon overrides.

Scale intelligently. Pay for the work, not the wait.

Send us the shape of the workload — vCPU, RAM, the daily curve you can describe. We’ll respond with a quota, a cap, and a quiesce profile. Operated end-to-end out of nl-ams-1.