Custom Autoscaling

Overview

Custom autoscaling enables you to implement programmatic control over your Resource scaling decisions when standard metric-based autoscaling doesn't meet your requirements. Unlike Omnistrate's built-in autoscaling, which relies on predefined infrastructure or application metrics, custom autoscaling lets you define your own scaling logic based on business rules, external systems, or complex decision-making processes.

When you enable custom autoscaling for a Resource, Omnistrate provides a local sidecar API that allows your controller to query capacity information and trigger scaling operations. This gives you the flexibility to implement scaling logic in any programming language while leveraging Omnistrate's capacity management capabilities.

When to use custom autoscaling

Use Omnistrate's native metric-based autoscaling when:

  • Scaling based on CPU, memory, or application metrics (queue length, request rate, etc.) meets your needs
  • You want a fully managed, zero-code autoscaling solution
  • Simple threshold-based scaling rules are sufficient for your workload
  • You don't need custom business logic or complex decision-making

Use custom autoscaling when:

  • You need to implement complex business logic that combines multiple conditions or rules
  • You want predictive scaling based on ML models, time series forecasting, or historical patterns
  • You require custom scheduling with complex rules (holidays, events, multi-stage rollouts)
  • You need to integrate external systems into your scaling decisions
  • You want to implement custom cooldown strategies or multi-resource coordination
  • You need programmatic control over scaling with language-specific libraries or frameworks
  • Standard metric-based thresholds don't capture your scaling requirements
  • You need to scale down to zero replicas (custom autoscaling supports scaling to zero)

Configuring custom autoscaling

To enable custom autoscaling for a Resource, set policyType: custom in the autoscaling configuration under x-omnistrate-capabilities. Here is an example using a compose spec:

services:
  worker:
    x-omnistrate-capabilities:
      autoscaling:
        policyType: custom
        maxReplicas: 6
        minReplicas: 1

Configuration parameters

  • policyType: Set to custom to enable custom autoscaling
  • maxReplicas: Maximum number of replicas to scale up to
  • minReplicas: Minimum number of replicas to maintain. Can be set to 0 to allow scaling down to zero
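For example, a scale-to-zero variant of the compose spec above would set minReplicas to 0 (the worker service name matches the earlier example):

```yaml
services:
  worker:
    x-omnistrate-capabilities:
      autoscaling:
        policyType: custom
        maxReplicas: 6
        minReplicas: 0   # allow the custom controller to scale down to zero replicas
```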

Sidecar API

When you deploy a SaaS Product with custom autoscaling enabled, Omnistrate automatically provides a local sidecar API at http://127.0.0.1:49750. This API enables your custom controller to query capacity information and trigger scaling operations.

Note

The sidecar API is only available when running on Omnistrate. It is not available for local development outside of the Omnistrate platform.

API endpoints

All API endpoints are accessible at:

http://127.0.0.1:49750/resource/{resourceAlias}

Where {resourceAlias} is the Resource key from your compose specification file.

Get current capacity

Retrieves the current capacity and status information for a Resource.

Endpoint: GET /resource/{resourceAlias}/capacity

Response:

{
  "instanceId": "string",
  "resourceId": "string",
  "resourceAlias": "string",
  "status": "ACTIVE|STARTING|PAUSED|FAILED|UNKNOWN",
  "currentCapacity": 5,
  "lastObservedTimestamp": "2025-11-05T12:34:56.789Z"
}

Status values:

  • ACTIVE - Resource is running and ready
  • STARTING - Resource is starting up
  • PAUSED - Resource is paused
  • FAILED - Resource has failed
  • UNKNOWN - Status cannot be determined
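As a minimal sketch, the capacity query can be issued with nothing but the standard library. Only the endpoint path and response fields come from the API above; the function names and the worker alias are illustrative:

```python
import json
import urllib.request

# Local sidecar API; only reachable when running on Omnistrate.
SIDECAR_BASE = "http://127.0.0.1:49750"

def capacity_url(resource_alias: str) -> str:
    """Build the capacity endpoint URL for a resource alias."""
    return f"{SIDECAR_BASE}/resource/{resource_alias}/capacity"

def get_capacity(resource_alias: str) -> dict:
    """Fetch current capacity and status for the given resource."""
    with urllib.request.urlopen(capacity_url(resource_alias), timeout=10) as resp:
        return json.loads(resp.read())

# Example (only works inside an Omnistrate deployment):
# info = get_capacity("worker")
# print(info["status"], info["currentCapacity"])
```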

Add capacity

Adds capacity units to a Resource.

Endpoint: POST /resource/{resourceAlias}/capacity/add

Request body:

{
  "capacityToBeAdded": 2
}

Response:

{
  "instanceId": "string",
  "resourceId": "string",
  "resourceAlias": "string"
}

Remove capacity

Removes capacity units from a Resource.

Endpoint: POST /resource/{resourceAlias}/capacity/remove

Request body:

{
  "capacityToBeRemoved": 1
}

Response:

{
  "instanceId": "string",
  "resourceId": "string",
  "resourceAlias": "string"
}
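The add and remove endpoints are symmetric, differing only in the path suffix and the request-body field name. A hedged stdlib sketch (helper names are illustrative; paths and field names come from the API above):

```python
import json
import urllib.request

SIDECAR_BASE = "http://127.0.0.1:49750"

def scale_request(resource_alias: str, action: str, units: int) -> urllib.request.Request:
    """Build a POST request for the capacity/add or capacity/remove endpoint.

    action is "add" or "remove"; the JSON field name differs accordingly.
    """
    field = "capacityToBeAdded" if action == "add" else "capacityToBeRemoved"
    body = json.dumps({field: units}).encode()
    return urllib.request.Request(
        f"{SIDECAR_BASE}/resource/{resource_alias}/capacity/{action}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def scale(resource_alias: str, action: str, units: int) -> dict:
    """Send the scaling request and return the parsed response."""
    with urllib.request.urlopen(scale_request(resource_alias, action, units), timeout=10) as resp:
        return json.loads(resp.read())
```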

Implementation best practices

Implement cooldown periods

Avoid rapid successive scaling operations by implementing a cooldown period between scaling actions. A recommended cooldown period is 5 minutes (300 seconds).

Last Scale Action → Wait 5 minutes → Next Scale Action
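One way to enforce this (a sketch, not the only design; the injectable clock just makes the gate easy to test):

```python
import time

COOLDOWN_SECONDS = 300  # 5-minute cooldown between scaling actions

class CooldownGate:
    """Tracks the last scaling action and blocks actions inside the cooldown window."""

    def __init__(self, cooldown: float = COOLDOWN_SECONDS, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock
        self.last_action = None

    def allow(self) -> bool:
        """Return True (and record the action) if the cooldown has elapsed."""
        now = self.clock()
        if self.last_action is not None and now - self.last_action < self.cooldown:
            return False
        self.last_action = now
        return True
```

Your controller would call gate.allow() before each add/remove request and skip the cycle when it returns False.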

Wait for ACTIVE state

Always wait for a Resource to reach ACTIVE state before performing the next scaling operation:

  1. Check current status via GET capacity endpoint
  2. If status is STARTING, poll until it becomes ACTIVE
  3. If status is FAILED, handle the error appropriately
  4. Only proceed with scaling when status is ACTIVE
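The polling loop above can be sketched as follows; fetch_status is any callable returning one of the documented status strings (for example, lambda: get_capacity("worker")["status"] if you have a capacity helper), and the interval and attempt limits are illustrative defaults:

```python
import time

def wait_for_active(fetch_status, poll_interval: float = 10.0, max_attempts: int = 60) -> str:
    """Poll until the resource reports ACTIVE; raise on FAILED or timeout."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status == "ACTIVE":
            return status
        if status == "FAILED":
            raise RuntimeError("resource entered FAILED state")
        # STARTING / PAUSED / UNKNOWN: keep polling
        time.sleep(poll_interval)
    raise TimeoutError("resource did not become ACTIVE in time")
```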

Implement step-based scaling

Scale gradually by adding or removing a fixed number of units per operation:

  • Start with small steps (e.g., 1-2 units)
  • Repeat operations until reaching target capacity
  • Allow cooldown between steps
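A capacity change can be broken into bounded steps like this (a sketch; the step size of 2 is just the small-step default suggested above):

```python
def scaling_steps(current: int, target: int, step: int = 2) -> list:
    """Split a capacity change into bounded steps (positive = add, negative = remove)."""
    delta = target - current
    if delta == 0:
        return []
    sign = 1 if delta > 0 else -1
    remaining = abs(delta)
    steps = []
    while remaining > 0:
        chunk = min(step, remaining)
        steps.append(sign * chunk)
        remaining -= chunk
    return steps
```

The controller would execute one step per cycle (via the add or remove endpoint), honoring the cooldown and waiting for ACTIVE between steps.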

Respect scaling limits

Always validate that your scaling requests stay within the minReplicas and maxReplicas limits you defined in the Resource specification. Attempting to scale beyond these limits will fail. Your custom controller should:

  • Check current capacity before calculating scaling targets
  • Ensure target capacity is between minReplicas and maxReplicas
  • Handle edge cases when already at minimum or maximum capacity
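Clamping a desired replica count to the configured limits is a one-liner worth centralizing (a sketch; the limit values mirror the compose example above):

```python
def clamp_target(desired: int, min_replicas: int, max_replicas: int) -> int:
    """Clamp a desired replica count to the configured autoscaling limits."""
    return max(min_replicas, min(desired, max_replicas))
```

With maxReplicas: 6 and minReplicas: 1 from the earlier example, a desired count of 10 clamps to 6 and a desired count of 0 clamps to 1; if the clamped target equals current capacity, no scaling request should be sent.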

Use retries with exponential backoff

Network requests can fail temporarily. Implement retry logic:

  • Retry failed requests
  • Use exponential backoff
  • Set appropriate timeouts
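A minimal retry wrapper with exponential backoff might look like this (the injectable sleep exists only to make the backoff schedule testable; attempt counts and delays are illustrative):

```python
import time

def with_retries(call, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Invoke call(), retrying on exception with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
```

For example, with_retries(lambda: get_capacity("worker")) would retry transient sidecar failures before giving up.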

Handle errors gracefully

  • Check HTTP status codes (expect 200 for success)
  • Parse error responses
  • Log errors for debugging

Example implementation

For a complete working example of custom autoscaling, refer to the Custom Auto Scaling Example repository. This repository provides:

  • An example Go implementation with cooldown management and state handling
  • HTTP API for triggering scaling operations and checking status
  • Docker containerization for easy deployment
  • Complete implementation examples in multiple programming languages
  • Common scaling patterns and best practices