Custom Autoscaling¶
Overview¶
Custom autoscaling enables you to implement programmatic control over your Resource scaling decisions when standard metric-based autoscaling doesn't meet your requirements. Unlike Omnistrate's built-in autoscaling, which relies on predefined infrastructure or application metrics, custom autoscaling lets you define scaling logic based on any business rules, external systems, or complex decision-making processes.
When you enable custom autoscaling for a Resource, Omnistrate provides a local sidecar API that allows your controller to query capacity information and trigger scaling operations. This gives you the flexibility to implement scaling logic in any programming language while leveraging Omnistrate's capacity management capabilities.
When to use custom autoscaling¶
Use Omnistrate's native metric-based autoscaling when:
- Scaling based on CPU, memory, or application metrics (queue length, request rate, etc.) meets your needs
- You want a fully managed, zero-code autoscaling solution
- Simple threshold-based scaling rules are sufficient for your workload
- You don't need custom business logic or complex decision-making
Use custom autoscaling when:
- You need to implement complex business logic that combines multiple conditions or rules
- You want predictive scaling based on ML models, time series forecasting, or historical patterns
- You require custom scheduling with complex rules (holidays, events, multi-stage rollouts)
- You need to integrate external systems into your scaling decisions
- You want to implement custom cooldown strategies or multi-resource coordination
- You need programmatic control over scaling with language-specific libraries or frameworks
- Standard metric-based thresholds don't capture your scaling requirements
- You need to scale down to zero replicas (custom autoscaling supports scaling to zero)
Configuring custom autoscaling¶
To enable custom autoscaling for a Resource, set `policyType: custom` in the autoscaling configuration under `x-omnistrate-capabilities`. Here is an example using a compose spec:
```yaml
services:
  worker:
    x-omnistrate-capabilities:
      autoscaling:
        policyType: custom
        maxReplicas: 6
        minReplicas: 1
```
Configuration parameters¶
- `policyType`: Set to `custom` to enable custom autoscaling
- `maxReplicas`: Maximum number of replicas to scale up to
- `minReplicas`: Minimum number of replicas to maintain. Can be set to 0 to allow scaling down to zero
Sidecar API¶
When you deploy a SaaS Product with custom autoscaling enabled, Omnistrate automatically provides a local sidecar API. This API enables your custom controller to query capacity information and trigger scaling operations.
The local sidecar API is available at:
Note
The sidecar API is only available when running on Omnistrate. It is not available for local development outside of the Omnistrate platform.
API endpoints¶
All API endpoints are accessible at:
Where {resourceAlias} is the Resource key from your compose specification file.
Get current capacity¶
Retrieves the current capacity and status information for a Resource.
Endpoint: GET /resource/{resourceAlias}/capacity
Response:
```json
{
  "instanceId": "string",
  "resourceId": "string",
  "resourceAlias": "string",
  "status": "ACTIVE|STARTING|PAUSED|FAILED|UNKNOWN",
  "currentCapacity": 5,
  "lastObservedTimestamp": "2025-11-05T12:34:56.789Z"
}
```
Status values:
- `ACTIVE` - Resource is running and ready
- `STARTING` - Resource is starting up
- `PAUSED` - Resource is paused
- `FAILED` - Resource has failed
- `UNKNOWN` - Status cannot be determined
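For illustration, here is a minimal Go sketch of querying this endpoint. The sidecar base URL is left as a parameter, since the exact local address is environment-specific:

```go
package controller

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Capacity mirrors the documented response of the GET capacity endpoint.
type Capacity struct {
	InstanceID            string `json:"instanceId"`
	ResourceID            string `json:"resourceId"`
	ResourceAlias         string `json:"resourceAlias"`
	Status                string `json:"status"`
	CurrentCapacity       int    `json:"currentCapacity"`
	LastObservedTimestamp string `json:"lastObservedTimestamp"`
}

// getCapacity queries the sidecar for a Resource's current capacity.
func getCapacity(baseURL, resourceAlias string) (*Capacity, error) {
	resp, err := http.Get(fmt.Sprintf("%s/resource/%s/capacity", baseURL, resourceAlias))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var c Capacity
	if err := json.NewDecoder(resp.Body).Decode(&c); err != nil {
		return nil, err
	}
	return &c, nil
}
```

Once decoded, your controller can branch on `Status` and `CurrentCapacity` as described in the best practices below.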
Add capacity¶
Adds capacity units to a Resource.
Endpoint: POST /resource/{resourceAlias}/capacity/add
Request body:
Response:
Remove capacity¶
Removes capacity units from a Resource.
Endpoint: POST /resource/{resourceAlias}/capacity/remove
Request body:
Response:
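The sketches in the rest of this section build on the `getCapacity` helper above. Here is a hedged Go sketch of calling the add and remove endpoints; because the request schema is not reproduced here, the `capacityUnits` field is purely an illustrative placeholder, so check the Omnistrate API reference for the actual payload:

```go
package controller

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// scaleCapacity posts a scaling request to the sidecar. op is "add" or
// "remove", matching the documented endpoint paths. The body shape is an
// assumption for illustration, not the documented schema.
func scaleCapacity(baseURL, resourceAlias, op string, units int) error {
	body, _ := json.Marshal(map[string]int{"capacityUnits": units}) // hypothetical field name
	url := fmt.Sprintf("%s/resource/%s/capacity/%s", baseURL, resourceAlias, op)
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		msg, _ := io.ReadAll(resp.Body)
		return fmt.Errorf("scaling failed: %s: %s", resp.Status, msg)
	}
	return nil
}
```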
Implementation best practices¶
Implement cooldown periods¶
Avoid rapid successive scaling operations by implementing a cooldown period between scaling actions. A recommended cooldown period is 5 minutes (300 seconds).
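A minimal cooldown guard in Go, assuming a single controller process:

```go
package controller

import (
	"sync"
	"time"
)

// cooldown enforces a minimum interval between scaling actions.
type cooldown struct {
	mu       sync.Mutex
	last     time.Time
	interval time.Duration // e.g. 5 * time.Minute
}

// Allow reports whether a scaling action may run now, recording the
// attempt time when it may.
func (c *cooldown) Allow() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if time.Since(c.last) < c.interval {
		return false
	}
	c.last = time.Now()
	return true
}
```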
Wait for ACTIVE state¶
Always wait for a Resource to reach `ACTIVE` state before performing the next scaling operation (see the sketch after this list):

- Check the current status via the GET capacity endpoint
- If the status is `STARTING`, poll until it becomes `ACTIVE`
- If the status is `FAILED`, handle the error appropriately
- Only proceed with scaling when the status is `ACTIVE`
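A polling sketch reusing `getCapacity` from earlier; the 15-second interval and overall timeout are tuning choices, not platform requirements:

```go
package controller

import (
	"fmt"
	"time"
)

// waitForActive polls the capacity endpoint until the Resource reports
// ACTIVE, fails fast on FAILED, and gives up after maxWait.
func waitForActive(baseURL, resourceAlias string, maxWait time.Duration) error {
	deadline := time.Now().Add(maxWait)
	for time.Now().Before(deadline) {
		c, err := getCapacity(baseURL, resourceAlias)
		if err != nil {
			return err
		}
		switch c.Status {
		case "ACTIVE":
			return nil
		case "FAILED":
			return fmt.Errorf("resource %s is in FAILED state", resourceAlias)
		}
		time.Sleep(15 * time.Second)
	}
	return fmt.Errorf("timed out waiting for %s to become ACTIVE", resourceAlias)
}
```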
Implement step-based scaling¶
Scale gradually by adding or removing a fixed number of units per operation, as shown in the sketch after this list:
- Start with small steps (e.g., 1-2 units)
- Repeat operations until reaching target capacity
- Allow cooldown between steps
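Combining the helpers above, a step loop might look like this sketch:

```go
package controller

import "time"

// stepScale moves current capacity toward target one fixed step at a
// time, waiting for the Resource to settle between steps.
func stepScale(baseURL, alias string, target, step int, cool time.Duration) error {
	for {
		c, err := getCapacity(baseURL, alias)
		if err != nil {
			return err
		}
		if c.CurrentCapacity == target {
			return nil
		}
		op, units := "add", step
		if c.CurrentCapacity > target {
			op = "remove"
		}
		if diff := abs(target - c.CurrentCapacity); diff < step {
			units = diff
		}
		if err := scaleCapacity(baseURL, alias, op, units); err != nil {
			return err
		}
		if err := waitForActive(baseURL, alias, 10*time.Minute); err != nil {
			return err
		}
		time.Sleep(cool) // cooldown between steps
	}
}

func abs(n int) int {
	if n < 0 {
		return -n
	}
	return n
}
```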
Respect scaling limits¶
Always validate that your scaling requests stay within the `minReplicas` and `maxReplicas` limits you defined in the Resource specification. Attempting to scale beyond these limits will fail. Your custom controller should (see the clamp sketch after this list):

- Check current capacity before calculating scaling targets
- Ensure the target capacity is between `minReplicas` and `maxReplicas`
- Handle edge cases when already at minimum or maximum capacity
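A trivial clamp keeps computed targets within the declared bounds:

```go
package controller

// clampTarget bounds a desired capacity to the limits declared in the
// Resource spec (minReplicas and maxReplicas in the compose example above).
func clampTarget(desired, minReplicas, maxReplicas int) int {
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}
```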
Use retries with exponential backoff¶
Network requests can fail temporarily. Implement retry logic (see the sketch after this list):
- Retry failed requests
- Use exponential backoff
- Set appropriate timeouts
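A generic retry wrapper; the one-second initial backoff and full jitter are illustrative defaults:

```go
package controller

import (
	"math/rand"
	"time"
)

// withRetry runs fn up to maxAttempts times, sleeping with exponential
// backoff plus jitter between failed attempts.
func withRetry(maxAttempts int, fn func() error) error {
	var err error
	backoff := time.Second
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		// Full jitter: sleep a random duration up to the current backoff.
		time.Sleep(time.Duration(rand.Int63n(int64(backoff))))
		backoff *= 2
	}
	return err
}
```

For example: `withRetry(3, func() error { return scaleCapacity(base, "worker", "add", 1) })`.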
Handle errors gracefully¶
- Check HTTP status codes (expect 200 for success)
- Parse error responses
- Log errors for debugging
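For example, a small helper can centralize these checks (sketch only):

```go
package controller

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

// checkResponse turns a non-200 sidecar response into a logged,
// descriptive error, preserving the body for debugging.
func checkResponse(resp *http.Response) error {
	if resp.StatusCode == http.StatusOK {
		return nil
	}
	body, _ := io.ReadAll(resp.Body)
	err := fmt.Errorf("sidecar returned %s: %s", resp.Status, body)
	log.Println(err)
	return err
}
```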
Example implementation¶
For a complete working example of custom autoscaling, refer to the Custom Auto Scaling Example repository. This repository provides:
- An example Go implementation with cooldown management and state handling
- HTTP API for triggering scaling operations and checking status
- Docker containerization for easy deployment
- Complete implementation examples in multiple programming languages
- Common scaling patterns and best practices
Querying metrics with Prometheus endpoint¶
When you deploy a SaaS Product with Omnistrate, a Prometheus endpoint is automatically provided. You can use this endpoint to query system and application metrics.
The metrics endpoint is available at:
Note
The Prometheus endpoint is only available when running on Omnistrate. It is not available for local development outside of the Omnistrate platform.
You can use Prometheus client libraries available for your programming language (such as the official Prometheus client libraries for Go, Java, Python, Ruby, and others) to query the endpoint.
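Since the endpoint serves plain text, even a bare HTTP GET works. In this sketch the endpoint address comes from a `METRICS_URL` variable, which is a placeholder rather than an Omnistrate convention:

```go
package controller

import (
	"io"
	"net/http"
	"os"
)

// dumpMetrics fetches the raw Prometheus text from the metrics endpoint
// and writes it to stdout.
func dumpMetrics() error {
	resp, err := http.Get(os.Getenv("METRICS_URL")) // placeholder, see above
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(os.Stdout, resp.Body)
	return err
}
```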
Example metrics output¶
You can query the metrics endpoint using curl or any HTTP client.
The endpoint returns metrics in Prometheus text exposition format:
```text
# HELP cpu_usage Current CPU usage
# TYPE cpu_usage gauge
cpu_usage{customer_visible="true",service_provider_visible="true"} 7.629704984760052
# HELP disk_ops_per_sec Current disk IOPS
# TYPE disk_ops_per_sec gauge
disk_ops_per_sec{customer_visible="true",disk="/app/storage",service_provider_visible="true",type="read"} 0
disk_ops_per_sec{customer_visible="true",disk="/app/storage",service_provider_visible="true",type="write"} 0
# HELP disk_throughput_bytes_per_sec Disk throughput in bytes per second
# TYPE disk_throughput_bytes_per_sec gauge
disk_throughput_bytes_per_sec{customer_visible="true",disk="/app/storage",service_provider_visible="true",type="read"} 0
disk_throughput_bytes_per_sec{customer_visible="true",disk="/app/storage",service_provider_visible="true",type="write"} 0
# HELP disk_usage_percent Current disk usage as a percentage of total disk space
# TYPE disk_usage_percent gauge
disk_usage_percent{customer_visible="true",path="/app/storage",service_provider_visible="true"} 0.006691028941233667
disk_usage_percent{customer_visible="true",path="/var/log/app",service_provider_visible="true"} 0.1497328281402588
# HELP load_avg Load average
# TYPE load_avg gauge
load_avg{customer_visible="true",period="15min",service_provider_visible="true"} 0
load_avg{customer_visible="true",period="1min",service_provider_visible="true"} 0
load_avg{customer_visible="true",period="5min",service_provider_visible="true"} 0
# HELP mem_total_bytes Total memory in bytes
# TYPE mem_total_bytes gauge
mem_total_bytes{customer_visible="true",service_provider_visible="true"} 4.294967296e+09
# HELP mem_usage_bytes Current memory usage in bytes
# TYPE mem_usage_bytes gauge
mem_usage_bytes{customer_visible="true",service_provider_visible="true"} 1.652563968e+09
# HELP mem_usage_percent Current memory usage as a percentage of total memory
# TYPE mem_usage_percent gauge
mem_usage_percent{customer_visible="true",service_provider_visible="true"} 38.47675323486328
# HELP system_uptime_seconds System uptime in seconds
# TYPE system_uptime_seconds gauge
system_uptime_seconds{customer_visible="true",service_provider_visible="true"} 6725
```
Available metrics¶
The metrics endpoint provides the following system metrics:
- cpu_usage: Current CPU usage percentage
- disk_ops_per_sec: Disk IOPS for read and write operations
- disk_throughput_bytes_per_sec: Disk throughput in bytes per second
- disk_usage_percent: Disk usage as a percentage of total disk space
- load_avg: System load average (1min, 5min, 15min periods)
- mem_total_bytes: Total available memory in bytes
- mem_usage_bytes: Current memory usage in bytes
- mem_usage_percent: Memory usage as a percentage of total memory
- system_uptime_seconds: System uptime in seconds
Using metrics for custom autoscaling¶
You can incorporate these metrics into your custom autoscaling logic by:
- Periodically querying the metrics endpoint from your custom controller
- Parsing the Prometheus text format to extract relevant metric values
- Implementing scaling decisions based on metric thresholds or patterns
- Combining multiple metrics for sophisticated scaling logic
For example, you might scale up when CPU usage exceeds 80% and memory usage exceeds 70% simultaneously, or implement predictive scaling based on historical metric patterns.
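As a sketch of that combined rule, the following uses the `expfmt` text parser from the Prometheus Go client libraries (a toolchain assumption, not a platform requirement) together with the hypothetical `METRICS_URL` placeholder from above:

```go
package controller

import (
	"fmt"
	"net/http"
	"os"

	"github.com/prometheus/common/expfmt"
)

// gaugeValue fetches the metrics endpoint and returns the first sample
// of the named gauge.
func gaugeValue(name string) (float64, error) {
	resp, err := http.Get(os.Getenv("METRICS_URL")) // placeholder, see above
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		return 0, err
	}
	family, ok := families[name]
	if !ok || len(family.GetMetric()) == 0 {
		return 0, fmt.Errorf("metric %q not found", name)
	}
	return family.GetMetric()[0].GetGauge().GetValue(), nil
}

// shouldScaleUp applies the combined threshold from the example above:
// scale up only when CPU exceeds 80% and memory exceeds 70%.
func shouldScaleUp() (bool, error) {
	cpu, err := gaugeValue("cpu_usage")
	if err != nil {
		return false, err
	}
	mem, err := gaugeValue("mem_usage_percent")
	if err != nil {
		return false, err
	}
	return cpu > 80 && mem > 70, nil
}
```

A controller could combine `shouldScaleUp` with the cooldown, clamp, and step-scaling sketches above to form a complete decision loop.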