Autoscaling
Horizontally scale a service's replica count up and down based on observed CPU, memory, and request rate. The agent samples every 30 seconds and applies changes through the existing deploy pipeline so traffic isn't dropped.
Enabling
Open the service's Settings → Scaling tab and switch to Autoscale. Set the min and max replica counts: a max of 2 or more enables the scaler, while setting both min and max to 1 disables it. Tune the per-signal scale-up thresholds; scale-down triggers at 50% of each up-threshold.
Signals
Three signals drive scaling decisions. Any one breaching its up-threshold triggers scale-up; all three must sit below 50% of their up-threshold to trigger scale-down.
# Defaults — every value is per-service tunable in the dashboard.
CPU avg across replicas > scaleCpuThreshold (default 80%)
Memory avg RSS / limit > scaleMemoryThreshold (default 80%)
Traefik RPS into this svc > scaleRequestsPerSecThreshold (off by default)
# Scale-down requires all three:
CPU avg < 0.5 × scaleCpuThreshold (e.g. < 40%)
Memory avg < 0.5 × scaleMemoryThreshold (e.g. < 40%)
RPS < 0.5 × scaleRequestsPerSecThreshold (or RPS threshold unset)
RPS is scraped from Traefik's own Prometheus metrics endpoint. It counts every HTTP request hitting the service's router, regardless of upstream replica. Set a threshold value to opt the service in; leave it unset and the autoscaler falls back to CPU + memory only.
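The rules above can be sketched as a single classification step per sample. This is a minimal illustration, not the agent's actual code: the `Thresholds` class and `classify` function are hypothetical names, and only the threshold settings and the 0.5 factor come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    cpu: float = 80.0            # scaleCpuThreshold, percent
    memory: float = 80.0         # scaleMemoryThreshold, percent
    rps: Optional[float] = None  # scaleRequestsPerSecThreshold; None means unset


def classify(cpu: float, memory: float, rps: float, t: Thresholds) -> str:
    """Classify one sample as "up", "down", or "hold" (hypothetical helper)."""
    rps_enabled = t.rps is not None

    # Scale-up: any single signal above its up-threshold is enough.
    if cpu > t.cpu or memory > t.memory or (rps_enabled and rps > t.rps):
        return "up"

    # Scale-down: every enabled signal must sit below half its up-threshold.
    # An unset RPS threshold counts as satisfied.
    rps_low = (not rps_enabled) or rps < 0.5 * t.rps
    if cpu < 0.5 * t.cpu and memory < 0.5 * t.memory and rps_low:
        return "down"

    return "hold"
```

With the defaults, a sample at 50% CPU and 30% memory classifies as "hold": it is under the 80% up-threshold but not under the 40% scale-down line.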
Anti-flap (sustained-reading gate)
The agent doesn't act on a single spike. A scale-up requires 4 consecutive above-threshold readings (≈ 2 minutes at the 30-second tick); scale-down requires 10 consecutive below-threshold readings (≈ 5 minutes). A reading on the opposite side resets the counter, so a noisy burst doesn't accumulate toward an unwanted scale event.
On top of that, only one scale action runs at a time per agent — the in-flight action holds a mutex so a second tick can't double-scale during a 30-second graceful drain.
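As a rough sketch of the gate and the single-action lock described above: the class and function names here are hypothetical, and the handling of a neutral reading (neither above nor below threshold) is an assumption, since the text only says readings must be consecutive.

```python
import threading
from typing import Callable, Optional

UP_READINGS = 4     # about 2 minutes at a 30-second tick
DOWN_READINGS = 10  # about 5 minutes at a 30-second tick


class SustainedGate:
    """Counts consecutive readings and fires only on a sustained streak."""

    def __init__(self) -> None:
        self.up_count = 0
        self.down_count = 0

    def observe(self, reading: str) -> Optional[str]:
        """Feed one reading ("up", "down", or "hold"); return an action or None."""
        if reading == "up":
            self.up_count += 1
            self.down_count = 0  # opposite-side reading resets the counter
        elif reading == "down":
            self.down_count += 1
            self.up_count = 0
        else:
            # Assumption: a neutral reading breaks both streaks.
            self.up_count = 0
            self.down_count = 0

        if self.up_count >= UP_READINGS:
            self.up_count = 0    # assumes the scale event succeeds
            return "up"
        if self.down_count >= DOWN_READINGS:
            self.down_count = 0  # assumes the scale event succeeds
            return "down"
        return None


_scale_lock = threading.Lock()


def run_exclusive(action: Callable[[], None]) -> bool:
    """Run action unless another scale action is in flight; report whether it ran."""
    if not _scale_lock.acquire(blocking=False):
        return False  # a previous action is still draining; skip this tick
    try:
        action()
        return True
    finally:
        _scale_lock.release()
```

Skipping the tick rather than queueing behind the lock is one possible design choice; it avoids a backlog of stale decisions piling up during a long drain.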
Hard bounds
- The scaler never exceeds maxInstances or drops below minInstances, even if signals say so.
- A successful scale event resets the relevant counter to 0, so the next decision starts fresh.
- Scale events emit a structured scale_event on the activity feed with the triggering metric value.
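The bounds can be illustrated with a small clamp. This is a sketch under stated assumptions: the step size of one replica per action and the event field names are hypothetical, as the text specifies only minInstances, maxInstances, and that the event carries the triggering metric value.

```python
from typing import Any, Dict


def next_replica_count(current: int, direction: str,
                       min_instances: int, max_instances: int,
                       step: int = 1) -> int:
    """Apply one scale step, clamped to [min_instances, max_instances]."""
    # Assumption: each action moves by a fixed step of one replica.
    target = current + step if direction == "up" else current - step
    return max(min_instances, min(max_instances, target))


def build_scale_event(direction: str, metric: str, value: float,
                      before: int, after: int) -> Dict[str, Any]:
    """Build an illustrative scale_event payload (field names are hypothetical)."""
    return {
        "type": "scale_event",
        "direction": direction,
        "metric": metric,   # which signal triggered the action
        "value": value,     # the triggering metric value
        "from": before,
        "to": after,
    }
```

For example, `next_replica_count(3, "up", 1, 3)` stays at 3 because the service is already at maxInstances.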
When to Use Autoscale vs Fixed
- Autoscale when your traffic is spiky (marketing campaigns, geo-skewed peak hours, queue-backed async workers).
- Fixed when your workload is steady and predictable, or when you want hard budget control on the resource line item.
Billing
Replicas are billed per-second at the service-tier rate. Scaling from 1 → 3 replicas triples the rate while the extra replicas are running and drops back the moment they're released. The dashboard's Billing tab shows the per-resource hourly breakdown.
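To make the per-second arithmetic concrete, the sketch below totals replica-seconds over a period and multiplies by a per-replica rate. The rate used in the tests and the integer "micro-unit" currency are hypothetical; actual service-tier rates are not given here.

```python
from typing import List, Tuple

# Each segment is (replica_count, duration_in_seconds).
Segment = Tuple[int, int]


def replica_seconds(segments: List[Segment]) -> int:
    """Total billable replica-seconds across all segments."""
    return sum(replicas * seconds for replicas, seconds in segments)


def cost_micro_units(segments: List[Segment], rate_per_replica_second: int) -> int:
    """Cost in integer micro-units for a hypothetical per-replica-second rate."""
    return replica_seconds(segments) * rate_per_replica_second
```

One hour at 1 replica followed by ten minutes at 3 replicas is 3600 + 3 × 600 = 5400 replica-seconds, so the ten-minute burst is billed at three times the single-replica rate only while the extra replicas run.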