Custom Autoscaling¶
Overview¶
Custom autoscaling enables you to implement programmatic control over your Resource scaling decisions when standard metric-based autoscaling doesn't meet your requirements. Unlike Omnistrate's built-in autoscaling that relies on predefined infrastructure or application metrics, custom autoscaling allows you to define completely custom scaling logic based on any business rules, external systems, or complex decision-making processes.
When you enable custom autoscaling for a Resource, Omnistrate provides a local sidecar API that allows your controller to query capacity information and trigger scaling operations. This gives you the flexibility to implement scaling logic in any programming language while leveraging Omnistrate's capacity management capabilities.
When to use custom autoscaling¶
Use Omnistrate's native metric-based autoscaling when:
- Scaling based on CPU, memory, or application metrics (queue length, request rate, etc.) meets your needs
- You want a fully managed, zero-code autoscaling solution
- Simple threshold-based scaling rules are sufficient for your workload
- You don't need custom business logic or complex decision-making
Use custom autoscaling when:
- You need to implement complex business logic that combines multiple conditions or rules
- You want predictive scaling based on ML models, time series forecasting, or historical patterns
- You require custom scheduling with complex rules (holidays, events, multi-stage rollouts)
- You need to integrate external systems into your scaling decisions
- You want to implement custom cooldown strategies or multi-resource coordination
- You need programmatic control over scaling with language-specific libraries or frameworks
- Standard metric-based thresholds don't capture your scaling requirements
- You need to scale down to zero replicas (custom autoscaling supports scaling to zero)
Configuring custom autoscaling¶
To enable custom autoscaling for a Resource, set policyType: custom in the autoscaling configuration under x-omnistrate-capabilities. Here is an example using compose spec:
services:
  worker:
    x-omnistrate-capabilities:
      autoscaling:
        policyType: custom
        maxReplicas: 6
        minReplicas: 1
Configuration parameters¶
- policyType: Set to custom to enable custom autoscaling
- maxReplicas: Maximum number of replicas to scale up to
- minReplicas: Minimum number of replicas to maintain. Can be set to 0 to allow scaling down to zero
Sidecar API¶
When you deploy a SaaS Product with custom autoscaling enabled, Omnistrate automatically provides a local sidecar API at http://127.0.0.1:49750. This API enables your custom controller to query capacity information and trigger scaling operations.
Note
The sidecar API is only available when running on Omnistrate. It is not available for local development outside of the Omnistrate platform.
API endpoints¶
All API endpoints are served from the sidecar base URL http://127.0.0.1:49750, with paths of the form /resource/{resourceAlias}/..., where {resourceAlias} is the Resource key from your compose specification file.
Get current capacity¶
Retrieves the current capacity and status information for a Resource.
Endpoint: GET /resource/{resourceAlias}/capacity
Response:
{
"instanceId": "string",
"resourceId": "string",
"resourceAlias": "string",
"status": "ACTIVE|STARTING|PAUSED|FAILED|UNKNOWN",
"currentCapacity": 5,
"lastObservedTimestamp": "2025-11-05T12:34:56.789Z"
}
Status values:
- ACTIVE - Resource is running and ready
- STARTING - Resource is starting up
- PAUSED - Resource is paused
- FAILED - Resource has failed
- UNKNOWN - Status cannot be determined
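As a minimal sketch (Python, standard library only), a controller might query this endpoint as follows. The base URL comes from this doc; the function names are illustrative.

```python
import json
import urllib.request

# Sidecar base URL documented above; only reachable when running on Omnistrate.
SIDECAR_BASE = "http://127.0.0.1:49750"

def capacity_url(resource_alias: str) -> str:
    """Build the GET capacity URL for a Resource."""
    return f"{SIDECAR_BASE}/resource/{resource_alias}/capacity"

def get_capacity(resource_alias: str) -> dict:
    """Fetch current capacity and status for a Resource from the sidecar API."""
    with urllib.request.urlopen(capacity_url(resource_alias), timeout=10) as resp:
        return json.load(resp)
```

For the `worker` service from the compose example above, the controller would call `get_capacity("worker")` and inspect the `status` and `currentCapacity` fields of the response.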
Add capacity¶
Adds capacity units to a Resource.
Endpoint: POST /resource/{resourceAlias}/capacity/add
Request body:
Response:
Remove capacity¶
Removes capacity units from a Resource.
Endpoint: POST /resource/{resourceAlias}/capacity/remove
Request body:
Response:
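The add and remove endpoints share the same shape, so a controller can use one helper for both. This is a hedged sketch: the request body schema is not specified in this section, so the caller supplies the payload; consult the API reference for the exact fields.

```python
import json
import urllib.request

# Sidecar base URL documented earlier in this page.
SIDECAR_BASE = "http://127.0.0.1:49750"

def scale_url(resource_alias: str, direction: str) -> str:
    """Build the capacity-change URL; direction is 'add' or 'remove'."""
    return f"{SIDECAR_BASE}/resource/{resource_alias}/capacity/{direction}"

def post_capacity_change(resource_alias: str, direction: str, payload: dict) -> dict:
    """POST a capacity change request (payload schema per the API reference)."""
    req = urllib.request.Request(
        scale_url(resource_alias, direction),
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```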
Implementation best practices¶
Implement cooldown periods¶
Avoid rapid successive scaling operations by implementing a cooldown period between scaling actions. A recommended cooldown period is 5 minutes (300 seconds).
Wait for ACTIVE state¶
Always wait for a Resource to reach ACTIVE state before performing the next scaling operation:
- Check current status via GET capacity endpoint
- If status is STARTING, poll until it becomes ACTIVE
- If status is FAILED, handle the error appropriately
- Only proceed with scaling when status is ACTIVE
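The decision above can be captured as a small pure function. The status strings are the documented values from the capacity endpoint; the return labels are illustrative.

```python
def next_action(status: str) -> str:
    """Map a Resource status to the controller's next step."""
    if status == "ACTIVE":
        return "proceed"       # safe to issue the next scaling operation
    if status == "STARTING":
        return "poll"          # keep polling until the Resource becomes ACTIVE
    if status == "FAILED":
        return "handle-error"  # log and handle the failure before retrying
    return "wait"              # PAUSED or UNKNOWN: hold off until state is clear
```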
Implement step-based scaling¶
Scale gradually by adding or removing a fixed number of units per operation:
- Start with small steps (e.g., 1-2 units)
- Repeat operations until reaching target capacity
- Allow cooldown between steps
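A sketch of this planning step, assuming the controller knows current and target capacity: break the delta into step-sized increments, then execute one increment per cooldown window.

```python
def plan_steps(current: int, target: int, step: int = 1) -> list[int]:
    """Break a capacity change into step-sized increments.

    Positive entries mean add capacity; negative entries mean remove it.
    """
    delta = target - current
    if delta == 0:
        return []
    sign = 1 if delta > 0 else -1
    full, rem = divmod(abs(delta), step)
    steps = [sign * step] * full
    if rem:
        steps.append(sign * rem)
    return steps
```

For example, scaling the `worker` service from 1 to 6 replicas in steps of 2 yields three operations: add 2, add 2, add 1.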
Respect scaling limits¶
Always validate that your scaling requests stay within the minReplicas and maxReplicas limits you defined in the Resource specification. Attempting to scale beyond these limits will fail. Your custom controller should:
- Check current capacity before calculating scaling targets
- Ensure target capacity is between minReplicas and maxReplicas
- Handle edge cases when already at minimum or maximum capacity
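A minimal sketch of this validation is a clamp: compute the desired target, then force it into the configured range before issuing any scaling request.

```python
def clamp_target(target: int, min_replicas: int, max_replicas: int) -> int:
    """Clamp a desired replica count into the [minReplicas, maxReplicas] range."""
    return max(min_replicas, min(target, max_replicas))
```

With the compose example above (minReplicas: 1, maxReplicas: 6), a computed target of 10 would be clamped to 6, and if the Resource is already at 6 the planned delta becomes zero, which the controller should treat as a no-op.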
Use retries with exponential backoff¶
Network requests can fail temporarily. Implement retry logic:
- Retry failed requests
- Use exponential backoff
- Set appropriate timeouts
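The three points above can be combined in a small retry wrapper, sketched here with illustrative names and an injectable sleep function (useful for testing). Delays double on each attempt: 0.5s, 1s, 2s, and so on.

```python
import time

def with_retries(fn, attempts: int = 5, base_delay: float = 0.5, sleep=time.sleep):
    """Call fn(), retrying on exception with exponential backoff.

    Re-raises the last exception once all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Per-request timeouts (e.g. the `timeout` argument to the HTTP call itself) should be set inside `fn`, so a hung connection fails fast and the backoff logic takes over.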
Handle errors gracefully¶
- Check HTTP status codes (expect 200 for success)
- Parse error responses
- Log errors for debugging
Example implementation¶
For a complete working example of custom autoscaling, refer to the Custom Auto Scaling Example repository. This repository provides:
- An example Go implementation with cooldown management and state handling
- HTTP API for triggering scaling operations and checking status
- Docker containerization for easy deployment
- Complete implementation examples in multiple programming languages
- Common scaling patterns and best practices