Custom Autoscaling

Overview

Custom autoscaling gives you programmatic control over scaling decisions for your Resources when standard metric-based autoscaling doesn't meet your requirements. Unlike Omnistrate's built-in autoscaling, which relies on predefined infrastructure or application metrics, custom autoscaling lets you define your own scaling logic based on business rules, external systems, or complex decision-making processes.

When you enable custom autoscaling for a Resource, Omnistrate provides a local sidecar API that allows your controller to query capacity information and trigger scaling operations. This gives you the flexibility to implement scaling logic in any programming language while leveraging Omnistrate's capacity management capabilities.

When to use custom autoscaling

Use Omnistrate's native metric-based autoscaling when:

  • Scaling based on CPU, memory, or application metrics (queue length, request rate, etc.) meets your needs
  • You want a fully managed, zero-code autoscaling solution
  • Simple threshold-based scaling rules are sufficient for your workload
  • You don't need custom business logic or complex decision-making

Use custom autoscaling when:

  • You need to implement complex business logic that combines multiple conditions or rules
  • You want predictive scaling based on ML models, time series forecasting, or historical patterns
  • You require custom scheduling with complex rules (holidays, events, multi-stage rollouts)
  • You need to integrate external systems into your scaling decisions
  • You want to implement custom cooldown strategies or multi-resource coordination
  • You need programmatic control over scaling with language-specific libraries or frameworks
  • Standard metric-based thresholds don't capture your scaling requirements
  • You need to scale down to zero replicas (custom autoscaling supports scaling to zero)

Configuring custom autoscaling

To enable custom autoscaling for a Resource, set policyType: custom in the autoscaling configuration under x-omnistrate-capabilities. Here is an example using the compose spec:

services:
  worker:
    x-omnistrate-capabilities:
      autoscaling:
        policyType: custom
        maxReplicas: 6
        minReplicas: 1

Configuration parameters

  • policyType: Set to custom to enable custom autoscaling
  • maxReplicas: Maximum number of replicas to scale up to
  • minReplicas: Minimum number of replicas to maintain. Can be set to 0 to allow scaling down to zero

Sidecar API

When you deploy a SaaS Product with custom autoscaling enabled, Omnistrate automatically provides a local sidecar API. This API enables your custom controller to query capacity information and trigger scaling operations.

The local sidecar API is available at:

http://127.0.0.1:49750

Note

The sidecar API is only available when running on Omnistrate. It is not available for local development outside of the Omnistrate platform.

API endpoints

All API endpoints are accessible at:

http://127.0.0.1:49750/resource/{resourceAlias}

Where {resourceAlias} is the Resource key from your compose specification file.

Get current capacity

Retrieves the current capacity and status information for a Resource.

Endpoint: GET /resource/{resourceAlias}/capacity

Response:

{
  "instanceId": "string",
  "resourceId": "string",
  "resourceAlias": "string",
  "status": "ACTIVE|STARTING|PAUSED|FAILED|UNKNOWN",
  "currentCapacity": 5,
  "lastObservedTimestamp": "2025-11-05T12:34:56.789Z"
}

Status values:

  • ACTIVE - Resource is running and ready
  • STARTING - Resource is starting up
  • PAUSED - Resource is paused
  • FAILED - Resource has failed
  • UNKNOWN - Status cannot be determined
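A query against this endpoint can be sketched as below. The helper names and SIDECAR_BASE constant are illustrative, not part of the Omnistrate API; only the URL shape and response fields come from the documentation above.

```python
# Hypothetical sketch: query the sidecar's GET /resource/{resourceAlias}/capacity
# endpoint and pull out the fields a controller typically needs.
import json
import urllib.request

SIDECAR_BASE = "http://127.0.0.1:49750"  # local sidecar API, per the docs

def parse_capacity(body: str) -> dict:
    """Extract status and current capacity from the JSON response body."""
    doc = json.loads(body)
    return {
        "status": doc["status"],
        "currentCapacity": doc["currentCapacity"],
    }

def get_capacity(resource_alias: str) -> dict:
    """GET the current capacity for a Resource (only works on Omnistrate)."""
    url = f"{SIDECAR_BASE}/resource/{resource_alias}/capacity"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_capacity(resp.read().decode())
```

Keeping the JSON parsing in its own function makes the controller logic easy to test without a live sidecar.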

Add capacity

Adds capacity units to a Resource.

Endpoint: POST /resource/{resourceAlias}/capacity/add

Request body:

{
  "capacityToBeAdded": 2
}

Response:

{
  "instanceId": "string",
  "resourceId": "string",
  "resourceAlias": "string"
}

Remove capacity

Removes capacity units from a Resource.

Endpoint: POST /resource/{resourceAlias}/capacity/remove

Request body:

{
  "capacityToBeRemoved": 1
}

Response:

{
  "instanceId": "string",
  "resourceId": "string",
  "resourceAlias": "string"
}
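The add and remove endpoints differ only in the path and body field, so a controller can wrap both in one helper. This is a minimal sketch; the function names and the signed-delta convention are illustrative, while the paths and request bodies follow the endpoints documented above.

```python
# Hypothetical sketch: wrap the sidecar's add/remove capacity endpoints.
# A positive delta adds capacity; a negative delta removes it.
import json
import urllib.request

SIDECAR_BASE = "http://127.0.0.1:49750"

def scale_request(resource_alias: str, delta: int) -> tuple:
    """Build the endpoint URL and JSON payload for a scaling operation."""
    if delta > 0:
        path, body = "capacity/add", {"capacityToBeAdded": delta}
    else:
        path, body = "capacity/remove", {"capacityToBeRemoved": -delta}
    url = f"{SIDECAR_BASE}/resource/{resource_alias}/{path}"
    return url, json.dumps(body).encode()

def scale(resource_alias: str, delta: int) -> dict:
    """POST the scaling request (only works when running on Omnistrate)."""
    url, payload = scale_request(resource_alias, delta)
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode())
```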

Implementation best practices

Implement cooldown periods

Avoid rapid successive scaling operations by implementing a cooldown period between scaling actions. A recommended cooldown period is 5 minutes (300 seconds).

Last Scale Action → Wait 5 minutes → Next Scale Action
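A cooldown gate can be as simple as remembering the timestamp of the last action. This sketch assumes the recommended 300-second window; the class and method names are illustrative.

```python
# A minimal cooldown gate: refuses new scale actions inside the window.
import time

COOLDOWN_SECONDS = 300  # recommended 5-minute cooldown

class CooldownGate:
    """Tracks the last scale action and reports when the next is allowed."""
    def __init__(self, cooldown: float = COOLDOWN_SECONDS):
        self.cooldown = cooldown
        self.last_action = None

    def ready(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        return self.last_action is None or now - self.last_action >= self.cooldown

    def record(self, now=None) -> None:
        self.last_action = time.monotonic() if now is None else now
```

Using a monotonic clock avoids spurious resets if the system clock is adjusted.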

Wait for ACTIVE state

Always wait for a Resource to reach ACTIVE state before performing the next scaling operation:

  1. Check current status via GET capacity endpoint
  2. If status is STARTING, poll until it becomes ACTIVE
  3. If status is FAILED, handle the error appropriately
  4. Only proceed with scaling when status is ACTIVE
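The steps above can be sketched as a polling loop. `fetch_status` stands in for any callable that returns the current status string (for example, from the GET capacity endpoint); the function name and parameters are illustrative.

```python
# Sketch of the wait-for-ACTIVE loop: poll until the Resource leaves
# STARTING, raise on FAILED, and give up after a timeout.
import time

def wait_until_active(fetch_status, poll_interval: float = 10.0,
                      timeout: float = 600.0,
                      sleep=time.sleep, clock=time.monotonic) -> str:
    deadline = clock() + timeout
    while clock() < deadline:
        status = fetch_status()
        if status == "ACTIVE":
            return status
        if status == "FAILED":
            raise RuntimeError("Resource entered FAILED state")
        sleep(poll_interval)  # still STARTING (or UNKNOWN): keep polling
    raise TimeoutError("Resource did not become ACTIVE in time")
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.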

Implement step-based scaling

Scale gradually by adding or removing a fixed number of units per operation:

  • Start with small steps (e.g., 1-2 units)
  • Repeat operations until reaching target capacity
  • Allow cooldown between steps
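Step-based scaling reduces to computing a bounded delta per operation. A minimal sketch, with an illustrative default step of 2 units:

```python
# Move toward a target capacity in fixed-size steps rather than one jump.
def next_step(current: int, target: int, step: int = 2) -> int:
    """Return the capacity delta for the next scaling operation
    (positive = add, negative = remove, 0 = at target)."""
    gap = target - current
    if gap > 0:
        return min(step, gap)
    if gap < 0:
        return max(-step, gap)
    return 0
```

The controller repeats this (with cooldown between steps) until `next_step` returns 0.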

Respect scaling limits

Always validate that your scaling requests stay within the minReplicas and maxReplicas limits you defined in the Resource specification. Attempting to scale beyond these limits will fail. Your custom controller should:

  • Check current capacity before calculating scaling targets
  • Ensure target capacity is between minReplicas and maxReplicas
  • Handle edge cases when already at minimum or maximum capacity
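Validating the limits is a simple clamp. This sketch mirrors the compose example above (minReplicas 1, maxReplicas 6); the constants are assumptions taken from that example.

```python
# Clamp a desired capacity into the [minReplicas, maxReplicas] range
# declared in the Resource spec.
MIN_REPLICAS = 1  # from the compose example above
MAX_REPLICAS = 6

def clamp_target(desired: int,
                 min_replicas: int = MIN_REPLICAS,
                 max_replicas: int = MAX_REPLICAS) -> int:
    """Keep scaling targets inside the configured limits."""
    return max(min_replicas, min(max_replicas, desired))
```

With minReplicas set to 0 in the spec, the same clamp permits scale-to-zero.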

Use retries with exponential backoff

Network requests can fail temporarily. Implement retry logic:

  • Retry failed requests
  • Use exponential backoff
  • Set appropriate timeouts
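Exponential backoff can be sketched as a small wrapper; the attempt counts, delays, and jitter range here are illustrative defaults.

```python
# Retry transient failures with exponential backoff and jitter:
# delays double per failure up to a cap.
import random
import time

def with_retries(op, attempts: int = 5, base_delay: float = 1.0,
                 max_delay: float = 30.0, sleep=time.sleep):
    """Run op(), retrying on exceptions; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries
```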

Handle errors gracefully

  • Check HTTP status codes (expect 200 for success)
  • Parse error responses
  • Log errors for debugging

Example implementation

For a complete working example of custom autoscaling, refer to the Custom Auto Scaling Example repository. This repository provides:

  • An example Go implementation with cooldown management and state handling
  • HTTP API for triggering scaling operations and checking status
  • Docker containerization for easy deployment
  • Complete implementation examples in multiple programming languages
  • Common scaling patterns and best practices

Querying metrics with Prometheus endpoint

When you deploy a SaaS Product with Omnistrate, a Prometheus endpoint is automatically provided. You can use this endpoint to query system and application metrics.

The metrics endpoint is available at:

http://127.0.0.1:49751/metrics

Note

The Prometheus endpoint is only available when running on Omnistrate. It is not available for local development outside of the Omnistrate platform.

You can use the official Prometheus client libraries available for your programming language (Go, Java, Python, Ruby, and others) to query the endpoint.

Example metrics output

You can query the metrics endpoint using curl or any HTTP client:

curl http://127.0.0.1:49751/metrics

The endpoint returns metrics in Prometheus text exposition format:

# HELP cpu_usage Current CPU usage
# TYPE cpu_usage gauge
cpu_usage{customer_visible="true",service_provider_visible="true"} 7.629704984760052
# HELP disk_ops_per_sec Current disk IOPS
# TYPE disk_ops_per_sec gauge
disk_ops_per_sec{customer_visible="true",disk="/app/storage",service_provider_visible="true",type="read"} 0
disk_ops_per_sec{customer_visible="true",disk="/app/storage",service_provider_visible="true",type="write"} 0
# HELP disk_throughput_bytes_per_sec Disk throughput in bytes per second
# TYPE disk_throughput_bytes_per_sec gauge
disk_throughput_bytes_per_sec{customer_visible="true",disk="/app/storage",service_provider_visible="true",type="read"} 0
disk_throughput_bytes_per_sec{customer_visible="true",disk="/app/storage",service_provider_visible="true",type="write"} 0
# HELP disk_usage_percent Current disk usage as a percentage of total disk space
# TYPE disk_usage_percent gauge
disk_usage_percent{customer_visible="true",path="/app/storage",service_provider_visible="true"} 0.006691028941233667
disk_usage_percent{customer_visible="true",path="/var/log/app",service_provider_visible="true"} 0.1497328281402588
# HELP load_avg Load average
# TYPE load_avg gauge
load_avg{customer_visible="true",period="15min",service_provider_visible="true"} 0
load_avg{customer_visible="true",period="1min",service_provider_visible="true"} 0
load_avg{customer_visible="true",period="5min",service_provider_visible="true"} 0
# HELP mem_total_bytes Total memory in bytes
# TYPE mem_total_bytes gauge
mem_total_bytes{customer_visible="true",service_provider_visible="true"} 4.294967296e+09
# HELP mem_usage_bytes Current memory usage in bytes
# TYPE mem_usage_bytes gauge
mem_usage_bytes{customer_visible="true",service_provider_visible="true"} 1.652563968e+09
# HELP mem_usage_percent Current memory usage as a percentage of total memory
# TYPE mem_usage_percent gauge
mem_usage_percent{customer_visible="true",service_provider_visible="true"} 38.47675323486328
# HELP system_uptime_seconds System uptime in seconds
# TYPE system_uptime_seconds gauge
system_uptime_seconds{customer_visible="true",service_provider_visible="true"} 6725
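A minimal parser for output like the above can be sketched as follows. It handles only the simple gauge lines shown here (no quoted commas or escapes inside label values); for production use, prefer an official Prometheus client library.

```python
# Parse simple Prometheus text exposition lines into
# {metric_name: [(labels_dict, value), ...]}, skipping HELP/TYPE comments.
import re

SAMPLE_RE = re.compile(r'^(\w+)(?:\{([^}]*)\})?\s+(\S+)$')

def parse_metrics(text: str) -> dict:
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = {}
        if raw_labels:
            for kv in raw_labels.split(","):
                key, val = kv.split("=", 1)
                labels[key] = val.strip('"')
        metrics.setdefault(name, []).append((labels, float(value)))
    return metrics
```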

Available metrics

The metrics endpoint provides the following system metrics:

  • cpu_usage: Current CPU usage percentage
  • disk_ops_per_sec: Disk IOPS for read and write operations
  • disk_throughput_bytes_per_sec: Disk throughput in bytes per second
  • disk_usage_percent: Disk usage as a percentage of total disk space
  • load_avg: System load average (1min, 5min, 15min periods)
  • mem_total_bytes: Total available memory in bytes
  • mem_usage_bytes: Current memory usage in bytes
  • mem_usage_percent: Memory usage as a percentage of total memory
  • system_uptime_seconds: System uptime in seconds

Using metrics for custom autoscaling

You can incorporate these metrics into your custom autoscaling logic by:

  1. Periodically querying the metrics endpoint from your custom controller
  2. Parsing the Prometheus text format to extract relevant metric values
  3. Implementing scaling decisions based on metric thresholds or patterns
  4. Combining multiple metrics for sophisticated scaling logic

For example, you might scale up when CPU usage exceeds 80% and memory usage exceeds 70% simultaneously, or implement predictive scaling based on historical metric patterns.
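The combined-threshold example can be sketched as a pure decision function. The scale-down thresholds here are illustrative additions; only the 80% CPU / 70% memory scale-up condition comes from the text above.

```python
# Combined-threshold decision: scale up only when CPU AND memory are both
# high; scale down only when both are low; otherwise hold steady.
def scaling_decision(cpu_percent: float, mem_percent: float,
                     cpu_high: float = 80.0, mem_high: float = 70.0,
                     cpu_low: float = 20.0, mem_low: float = 30.0) -> int:
    """Return +1 (scale up), -1 (scale down), or 0 (hold)."""
    if cpu_percent > cpu_high and mem_percent > mem_high:
        return 1
    if cpu_percent < cpu_low and mem_percent < mem_low:
        return -1
    return 0
```

Requiring both metrics to agree in each direction adds hysteresis and avoids flapping on a single noisy metric.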