virtscale

From signal to ledger line.
Five layers, no hidden hops.

This is the autoscale loop, drawn from prod-eu-a. Every path on the diagram is a real wire in production. Versions are pinned to substrate 7.2 as of 2026-01-30; the bill of materials is at the bottom of the page and matches what's running right now.

substrate: 7.2 · 2026-01-30
layers: 5 · signal → ledger
p50 loop: 52 s end-to-end
p99 loop: 71 s end-to-end
01 · Diagram

What the autoscale loop touches.
Five layers, top to bottom.

Diagram: virtscale arch · substrate 7.2 · drawn from prod-eu-a, 2026-05
Legend: data path · refusal path · audit / ledger

L1 · signal: prometheus (per-org · v2.51), queue depth (runtime), p99 latency (edge probe), scheduled triggers (cron-based)
L2 · policy: policy gate (reads contract), quota (vCPU · RAM · disk), overage cap (+5/+10/+20%), projection (post-scale usage), refusal (429 · email)
L3 · orchestration: opennebula (7.2 · core), oneflow (vm-pool · service), scheduler (numa-aware), warm templates (KVM image)
L4 · data plane: cilium (1.19 · CNI · BGP), haproxy (2.8 LTS · runtime-API), ceph rgw (S3 · IAM-style), ceph rbd (3× replication)
L5 · audit: signed event (ed25519), per-org ledger (append-only), postmortem (10 business days), cold storage (RGW · IA tier)
02 · Loop

One scale event, step by step.
evt-7f3a91 · prod-eu-a · 2026-04-21 03:14 CET.

52 s end-to-end
in-envelope · settled clean
vm-pool/web · oneflow · org acme-tickets
All timestamps at UTC offset +01:00 (CET).
03:14:00.122
SIGNAL · L1

prometheus alertmanager fires autoscale.cpu.hi on vm-pool/web. CPU 72% across 8 vCPU domains, queue depth 240 at HAProxy. Alert webhook hits the policy gate on prod-eu-a.
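
A minimal sketch of how a gate could turn that webhook into a scale request. The top-level fields (`version`, `status`, `alerts`) and the per-alert `status`, `labels` and `annotations` follow Alertmanager's webhook payload; the `pool` label, the annotation keys, the receiver name and the `firing_scale_requests` helper are assumptions for illustration, not the gate's actual code.

```python
import json

# Hypothetical payload shaped like an Alertmanager webhook body.
# The "pool" label and the annotation keys are assumed, not taken from prod.
SAMPLE = json.dumps({
    "version": "4",
    "status": "firing",
    "receiver": "policy-gate",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "autoscale.cpu.hi", "pool": "vm-pool/web"},
            "annotations": {"cpu_pct": "72", "queue_depth": "240"},
        }
    ],
})


def firing_scale_requests(body: str, alertname: str = "autoscale.cpu.hi") -> list[dict]:
    """Return one +1 scale request per firing alert with the given name."""
    payload = json.loads(body)
    requests = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts do not trigger a scale-up
        labels = alert.get("labels", {})
        if labels.get("alertname") != alertname:
            continue  # some other alert routed to the same receiver
        requests.append({"pool": labels.get("pool"), "delta": 1})
    return requests


print(firing_scale_requests(SAMPLE))
```

Filtering on `status` matters because Alertmanager can also post resolved alerts to the same webhook, and those should not ask for capacity.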

03:14:12.418
POLICY · L2

Gate reads the org contract: quota 100 vCPU · 480 GiB, cap +10%, signed 2025-11-02. The envelope is quota plus cap: 110 vCPU. Projection of post-scale usage: 9/110 vCPU committed = 8.2% of envelope. Within cap.
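
The projection is plain arithmetic, sketched here under one assumption: the envelope is the contract quota plus the overage cap, which is how the 110 vCPU denominator above reads. `Contract`, `project` and `within_cap` are illustrative names, not the gate's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    quota_vcpu: int  # contracted vCPU quota
    cap_pct: int     # overage cap in percent (+5, +10 or +20 on the diagram)

    @property
    def envelope_vcpu(self) -> float:
        # Assumed reading: envelope = quota plus the overage cap.
        return self.quota_vcpu * (100 + self.cap_pct) / 100


def project(contract: Contract, committed_vcpu: int) -> float:
    """Post-scale usage as a fraction of the envelope."""
    return committed_vcpu / contract.envelope_vcpu


def within_cap(contract: Contract, committed_vcpu: int) -> bool:
    """True when the projected usage does not exceed the envelope."""
    return project(contract, committed_vcpu) <= 1.0


acme = Contract(quota_vcpu=100, cap_pct=10)
print(acme.envelope_vcpu)             # envelope in vCPU
print(f"{project(acme, 9):.1%}")      # share of the envelope after the scale
print(within_cap(acme, 9))
```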

03:14:14.001
SCALE · L3

Gate calls oneflow.scale("web", +1). Scheduler picks r3.05.nd-04a2c1 in zone-b (least-loaded NUMA node). OpenNebula provisions a KVM domain from the warm template; boot disk on Ceph RBD, ephemeral local NVMe attached.
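
"Least-loaded" can be sketched as a filter plus a minimum. This is not OpenNebula's scheduler, only an illustration of the placement rule named above; the `NumaNode` fields, the load figures and every node except r3.05.nd-04a2c1 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumaNode:
    name: str
    zone: str
    free_vcpu: int
    load: float  # fraction of vCPUs in use, 0.0 to 1.0


def pick_node(nodes: list[NumaNode], need_vcpu: int = 1) -> NumaNode:
    """Pick the least-loaded node that still has room for the new domain."""
    candidates = [n for n in nodes if n.free_vcpu >= need_vcpu]
    if not candidates:
        raise RuntimeError("no NUMA node has enough free vCPUs")
    # Ties on load are broken by name so the choice is deterministic.
    return min(candidates, key=lambda n: (n.load, n.name))


# Hypothetical inventory; only the first node name appears in the event above.
NODES = [
    NumaNode("r3.05.nd-04a2c1", "zone-b", free_vcpu=12, load=0.41),
    NumaNode("r1.02.nd-example1", "zone-a", free_vcpu=4, load=0.77),
    NumaNode("r2.07.nd-example2", "zone-b", free_vcpu=0, load=0.10),
]

print(pick_node(NODES).name)
```

The full node (`free_vcpu=0`) is dropped before the comparison, so a low load figure alone never wins a placement.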

03:14:42.117
HEALTH · L4

Domain passes liveness on first probe (28 seconds post-spawn). Cilium adds the new domain's IP to the BGP-announced backend set for VIP 10.10.4.12/32. HAProxy 2.8 reloads via runtime API.

03:14:44.880
OK · L4

HAProxy reload complete. Zero dropped connections. Traffic balanced across 9 backends. CPU on the pool drops to 58% over the next 30 s.

03:14:52.061
SETTLED · L5

Event evt-7f3a91 signed (ed25519, key policy-gate-2026-q1) and appended to the ledger for org acme-tickets. Audit line includes contract version, projection numbers, scheduler decision, and zone placement. End-to-end: 51.94 s.
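
The end-to-end figure can be rechecked from the step timestamps. A short sketch using only the times printed on this page; the `at` and `durations` helpers are illustrative names.

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1))  # fixed +01:00 offset, as stated for this event

# Step timestamps copied from the timeline above.
STEPS = [
    ("SIGNAL", "03:14:00.122"),
    ("POLICY", "03:14:12.418"),
    ("SCALE", "03:14:14.001"),
    ("HEALTH", "03:14:42.117"),
    ("OK", "03:14:44.880"),
    ("SETTLED", "03:14:52.061"),
]


def at(clock: str) -> datetime:
    """Turn an HH:MM:SS.mmm clock string into a datetime on 2026-04-21."""
    t = datetime.strptime(clock, "%H:%M:%S.%f")
    return datetime(2026, 4, 21, t.hour, t.minute, t.second, t.microsecond, tzinfo=CET)


def durations(steps):
    """Return (seconds spent reaching each step, total seconds)."""
    times = [at(clock) for _, clock in steps]
    gaps = {
        steps[i + 1][0]: (times[i + 1] - times[i]).total_seconds()
        for i in range(len(steps) - 1)
    }
    total = (times[-1] - times[0]).total_seconds()
    return gaps, total


gaps, total = durations(STEPS)
for step, seconds in gaps.items():
    print(f"{step:8s} {seconds:7.3f} s")
print(f"end-to-end {total:.2f} s")
```

The largest gap is SCALE to HEALTH, the time the new domain takes to boot and pass its first probe.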

03:18:02.700
REFUSED · L2 (next event, hypothetical)

If a subsequent trigger had pushed the projection past 100% of the envelope (more than 110 vCPU committed), the gate would have refused the provisioning call with HTTP 429; the response includes the contract violation reason. Mail dispatched to org owner within 30 s. Ledger appended with refusal line.
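
A sketch of the refusal branch, under stated assumptions: the envelope is quota plus cap, usage exactly at the envelope is still allowed, and the success status is 200. The `Decision` type, the `decide` function, the reason wording and the 111 vCPU input are hypothetical; only the 429 status and the owner email come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    status: int         # HTTP status returned to the caller
    reason: str         # human-readable reason, also written to the ledger
    notify_owner: bool  # whether to mail the org owner


def decide(projected_vcpu: int, quota_vcpu: int = 100, cap_pct: int = 10) -> Decision:
    """Allow the scale if projected usage fits the envelope, else refuse with 429."""
    envelope = quota_vcpu * (100 + cap_pct) / 100  # assumed: quota plus cap
    ratio = projected_vcpu / envelope
    if ratio > 1.0:
        reason = (
            f"contract violation: projected {projected_vcpu} vCPU exceeds "
            f"envelope of {envelope:g} vCPU ({ratio:.1%})"
        )
        return Decision(status=429, reason=reason, notify_owner=True)
    # Assumption: usage exactly at the envelope is still within cap.
    return Decision(status=200, reason="within cap", notify_owner=False)


print(decide(9))
print(decide(111))
```

Both outcomes return a reason string, so the ledger line for a refusal can carry the same fields as the line for a settled event.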

03 · Bill of materials

What's pinned in prod, right now.
Substrate 7.2, by version.

last rolled 2026-01-30
all hosts on Linux 6.6 LTS
| Component | Version | Role | Pin / source |
| --- | --- | --- | --- |
| OpenNebula | 7.2.0 | core orchestrator, OneFlow services | apt · vendor repo |
| Ceph | Reef 18.2.4 | RGW + RBD + CephFS | cephadm |
| Cilium | 1.19.0 | CNI · BGP control plane | helm chart, pinned digest |
| HAProxy | 2.8.12 LTS | ingress · runtime API | vendor repo |
| Kubernetes | 1.30.4 | managed clusters | kubeadm · in-house image |
| Linux kernel | 6.6 LTS | host kernel · all racks | Debian backport, frozen |
| KVM / libvirt | 9.0 / 10.4 | VM substrate | Debian stable |
| etcd | 3.5.13 | k8s control plane backing store | in-house image |
| Prometheus | 2.51.1 | per-org metrics | helm · in-house chart |
| Alertmanager | 0.27.0 | per-org routing | helm · in-house chart |