Custom Autoscaling

Overview

Custom autoscaling gives you programmatic control over scaling decisions for your Resources when standard metric-based autoscaling doesn't meet your requirements. Unlike Omnistrate's built-in autoscaling, which relies on predefined infrastructure or application metrics, custom autoscaling lets you define your own scaling logic based on business rules, external systems, or complex decision-making processes.

When you enable custom autoscaling for a Resource, Omnistrate provides a local sidecar API that allows your controller to query capacity information and trigger scaling operations. This gives you the flexibility to implement scaling logic in any programming language while leveraging Omnistrate's capacity management capabilities.

When to use custom autoscaling

Use Omnistrate's native metric-based autoscaling when:

  • Scaling based on CPU, memory, or application metrics (queue length, request rate, etc.) meets your needs
  • You want a fully managed, zero-code autoscaling solution
  • Simple threshold-based scaling rules are sufficient for your workload
  • You don't need custom business logic or complex decision-making

Use custom autoscaling when:

  • You need to implement complex business logic that combines multiple conditions or rules
  • You want predictive scaling based on ML models, time series forecasting, or historical patterns
  • You require custom scheduling with complex rules (holidays, events, multi-stage rollouts)
  • You need to integrate external systems into your scaling decisions
  • You want to implement custom cooldown strategies or multi-resource coordination
  • You need programmatic control over scaling with language-specific libraries or frameworks
  • Standard metric-based thresholds don't capture your scaling requirements
  • You need to scale down to zero replicas (custom autoscaling supports scaling to zero)

Configuring custom autoscaling

To enable custom autoscaling for a Resource, set policyType: custom in the autoscaling configuration under x-omnistrate-capabilities. Here is an example using the compose spec:

services:
  worker:
    x-omnistrate-capabilities:
      autoscaling:
        policyType: custom
        maxReplicas: 6
        minReplicas: 1

Configuration parameters

  • policyType: Set to custom to enable custom autoscaling
  • maxReplicas: Maximum number of replicas to scale up to
  • minReplicas: Minimum number of replicas to maintain. Can be set to 0 to allow scaling down to zero

Sidecar API

When you deploy a SaaS Product with custom autoscaling enabled, Omnistrate automatically provides a local sidecar API. This API enables your custom controller to query capacity information and trigger scaling operations.

The local sidecar API is available at:

http://127.0.0.1:49750

Note

The sidecar API is only available when running on Omnistrate. It is not available for local development outside of the Omnistrate platform.

API endpoints

All API endpoints are accessible at:

http://127.0.0.1:49750/resource/{resourceAlias}

Where {resourceAlias} is the Resource key from your compose specification file.

Get current capacity

Retrieves the current capacity and status information for a Resource.

Endpoint: GET /resource/{resourceAlias}/capacity

Response:

{
  "instanceId": "string",
  "resourceId": "string",
  "resourceAlias": "string",
  "status": "ACTIVE|STARTING|PAUSED|FAILED|UNKNOWN",
  "currentCapacity": 5,
  "lastObservedTimestamp": "2025-11-05T12:34:56.789Z"
}

Status values:

  • ACTIVE - Resource is running and ready
  • STARTING - Resource is starting up
  • PAUSED - Resource is paused
  • FAILED - Resource has failed
  • UNKNOWN - Status cannot be determined
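A query against this endpoint can be sketched as below. The helper names and SIDECAR_BASE constant are illustrative, not part of the Omnistrate API; only the URL shape and response fields come from the documentation above.

```python
# Hypothetical sketch: query the sidecar's GET /resource/{resourceAlias}/capacity
# endpoint and pull out the fields a controller typically needs.
import json
import urllib.request

SIDECAR_BASE = "http://127.0.0.1:49750"  # local sidecar API, per the docs

def parse_capacity(body: str) -> dict:
    """Extract status and current capacity from the JSON response body."""
    doc = json.loads(body)
    return {
        "status": doc["status"],
        "currentCapacity": doc["currentCapacity"],
    }

def get_capacity(resource_alias: str) -> dict:
    """GET the current capacity for a Resource (only works on Omnistrate)."""
    url = f"{SIDECAR_BASE}/resource/{resource_alias}/capacity"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_capacity(resp.read().decode())
```

Keeping the JSON parsing in its own function makes the controller logic easy to test without a live sidecar.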

Add capacity

Adds capacity units to a Resource.

Endpoint: POST /resource/{resourceAlias}/capacity/add

Request body:

{
  "capacityToBeAdded": 2
}

Response:

{
  "instanceId": "string",
  "resourceId": "string",
  "resourceAlias": "string"
}

Remove capacity

Removes capacity units from a Resource.

Endpoint: POST /resource/{resourceAlias}/capacity/remove

Request body:

{
  "capacityToBeRemoved": 1
}

Response:

{
  "instanceId": "string",
  "resourceId": "string",
  "resourceAlias": "string"
}
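The add and remove endpoints differ only in the path and body field, so a controller can wrap both in one helper. This is a minimal sketch; the function names and the signed-delta convention are illustrative, while the paths and request bodies follow the endpoints documented above.

```python
# Hypothetical sketch: wrap the sidecar's add/remove capacity endpoints.
# A positive delta adds capacity; a negative delta removes it.
import json
import urllib.request

SIDECAR_BASE = "http://127.0.0.1:49750"

def scale_request(resource_alias: str, delta: int) -> tuple:
    """Build the endpoint URL and JSON payload for a scaling operation."""
    if delta > 0:
        path, body = "capacity/add", {"capacityToBeAdded": delta}
    else:
        path, body = "capacity/remove", {"capacityToBeRemoved": -delta}
    url = f"{SIDECAR_BASE}/resource/{resource_alias}/{path}"
    return url, json.dumps(body).encode()

def scale(resource_alias: str, delta: int) -> dict:
    """POST the scaling request (only works when running on Omnistrate)."""
    url, payload = scale_request(resource_alias, delta)
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode())
```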

Implementation best practices

Implement cooldown periods

Avoid rapid successive scaling operations by implementing a cooldown period between scaling actions. A recommended cooldown period is 5 minutes (300 seconds).

Last Scale Action → Wait 5 minutes → Next Scale Action
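A cooldown gate can be as simple as remembering the timestamp of the last action. This sketch assumes the recommended 300-second window; the class and method names are illustrative.

```python
# A minimal cooldown gate: refuses new scale actions inside the window.
import time

COOLDOWN_SECONDS = 300  # recommended 5-minute cooldown

class CooldownGate:
    """Tracks the last scale action and reports when the next is allowed."""
    def __init__(self, cooldown: float = COOLDOWN_SECONDS):
        self.cooldown = cooldown
        self.last_action = None

    def ready(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        return self.last_action is None or now - self.last_action >= self.cooldown

    def record(self, now=None) -> None:
        self.last_action = time.monotonic() if now is None else now
```

Using a monotonic clock avoids spurious resets if the system clock is adjusted.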

Wait for ACTIVE state

Always wait for a Resource to reach ACTIVE state before performing the next scaling operation:

  1. Check current status via GET capacity endpoint
  2. If status is STARTING, poll until it becomes ACTIVE
  3. If status is FAILED, handle the error appropriately
  4. Only proceed with scaling when status is ACTIVE
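The steps above can be sketched as a polling loop. `fetch_status` stands in for any callable that returns the current status string (for example, from the GET capacity endpoint); the function name and parameters are illustrative.

```python
# Sketch of the wait-for-ACTIVE loop: poll until the Resource leaves
# STARTING, raise on FAILED, and give up after a timeout.
import time

def wait_until_active(fetch_status, poll_interval: float = 10.0,
                      timeout: float = 600.0,
                      sleep=time.sleep, clock=time.monotonic) -> str:
    deadline = clock() + timeout
    while clock() < deadline:
        status = fetch_status()
        if status == "ACTIVE":
            return status
        if status == "FAILED":
            raise RuntimeError("Resource entered FAILED state")
        sleep(poll_interval)  # still STARTING (or UNKNOWN): keep polling
    raise TimeoutError("Resource did not become ACTIVE in time")
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.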

Implement step-based scaling

Scale gradually by adding or removing a fixed number of units per operation:

  • Start with small steps (e.g., 1-2 units)
  • Repeat operations until reaching target capacity
  • Allow cooldown between steps
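Step-based scaling reduces to computing a bounded delta per operation. A minimal sketch, with an illustrative default step of 2 units:

```python
# Move toward a target capacity in fixed-size steps rather than one jump.
def next_step(current: int, target: int, step: int = 2) -> int:
    """Return the capacity delta for the next scaling operation
    (positive = add, negative = remove, 0 = at target)."""
    gap = target - current
    if gap > 0:
        return min(step, gap)
    if gap < 0:
        return max(-step, gap)
    return 0
```

The controller repeats this (with cooldown between steps) until `next_step` returns 0.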

Respect scaling limits

Always validate that your scaling requests stay within the minReplicas and maxReplicas limits you defined in the Resource specification. Attempting to scale beyond these limits will fail. Your custom controller should:

  • Check current capacity before calculating scaling targets
  • Ensure target capacity is between minReplicas and maxReplicas
  • Handle edge cases when already at minimum or maximum capacity
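Validating the limits is a simple clamp. This sketch mirrors the compose example above (minReplicas 1, maxReplicas 6); the constants are assumptions taken from that example.

```python
# Clamp a desired capacity into the [minReplicas, maxReplicas] range
# declared in the Resource spec.
MIN_REPLICAS = 1  # from the compose example above
MAX_REPLICAS = 6

def clamp_target(desired: int,
                 min_replicas: int = MIN_REPLICAS,
                 max_replicas: int = MAX_REPLICAS) -> int:
    """Keep scaling targets inside the configured limits."""
    return max(min_replicas, min(max_replicas, desired))
```

With minReplicas set to 0 in the spec, the same clamp permits scale-to-zero.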

Use retries with exponential backoff

Network requests can fail temporarily. Implement retry logic:

  • Retry failed requests
  • Use exponential backoff
  • Set appropriate timeouts
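Exponential backoff can be sketched as a small wrapper; the attempt counts, delays, and jitter range here are illustrative defaults.

```python
# Retry transient failures with exponential backoff and jitter:
# delays double per failure up to a cap.
import random
import time

def with_retries(op, attempts: int = 5, base_delay: float = 1.0,
                 max_delay: float = 30.0, sleep=time.sleep):
    """Run op(), retrying on exceptions; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries
```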

Handle errors gracefully

  • Check HTTP status codes (expect 200 for success)
  • Parse error responses
  • Log errors for debugging

Example implementation

For a complete working example of custom autoscaling, refer to the Custom Auto Scaling Example repository. This repository provides:

  • An example Go implementation with cooldown management and state handling
  • HTTP API for triggering scaling operations and checking status
  • Docker containerization for easy deployment
  • Complete implementation examples in multiple programming languages
  • Common scaling patterns and best practices

Querying metrics with Prometheus endpoint

When you deploy a SaaS Product with Omnistrate, a Prometheus endpoint is automatically provided. You can use this endpoint to query system and application metrics.

The metrics endpoint is available at:

http://127.0.0.1:49751/metrics

Note

The Prometheus endpoint is only available when running on Omnistrate. It is not available for local development outside of the Omnistrate platform.

You can use the official Prometheus client libraries available for your programming language (Go, Java, Python, Ruby, and others) to query the endpoint.

Example metrics output

You can query the metrics endpoint using curl or any HTTP client:

curl http://127.0.0.1:49751/metrics

The endpoint returns metrics in Prometheus text exposition format:

# HELP cpu_usage Current CPU usage
# TYPE cpu_usage gauge
cpu_usage{customer_visible="true",service_provider_visible="true"} 7.629704984760052
# HELP disk_ops_per_sec Current disk IOPS
# TYPE disk_ops_per_sec gauge
disk_ops_per_sec{customer_visible="true",disk="/app/storage",service_provider_visible="true",type="read"} 0
disk_ops_per_sec{customer_visible="true",disk="/app/storage",service_provider_visible="true",type="write"} 0
# HELP disk_throughput_bytes_per_sec Disk throughput in bytes per second
# TYPE disk_throughput_bytes_per_sec gauge
disk_throughput_bytes_per_sec{customer_visible="true",disk="/app/storage",service_provider_visible="true",type="read"} 0
disk_throughput_bytes_per_sec{customer_visible="true",disk="/app/storage",service_provider_visible="true",type="write"} 0
# HELP disk_usage_percent Current disk usage as a percentage of total disk space
# TYPE disk_usage_percent gauge
disk_usage_percent{customer_visible="true",path="/app/storage",service_provider_visible="true"} 0.006691028941233667
disk_usage_percent{customer_visible="true",path="/var/log/app",service_provider_visible="true"} 0.1497328281402588
# HELP load_avg Load average
# TYPE load_avg gauge
load_avg{customer_visible="true",period="15min",service_provider_visible="true"} 0
load_avg{customer_visible="true",period="1min",service_provider_visible="true"} 0
load_avg{customer_visible="true",period="5min",service_provider_visible="true"} 0
# HELP mem_total_bytes Total memory in bytes
# TYPE mem_total_bytes gauge
mem_total_bytes{customer_visible="true",service_provider_visible="true"} 4.294967296e+09
# HELP mem_usage_bytes Current memory usage in bytes
# TYPE mem_usage_bytes gauge
mem_usage_bytes{customer_visible="true",service_provider_visible="true"} 1.652563968e+09
# HELP mem_usage_percent Current memory usage as a percentage of total memory
# TYPE mem_usage_percent gauge
mem_usage_percent{customer_visible="true",service_provider_visible="true"} 38.47675323486328
# HELP system_uptime_seconds System uptime in seconds
# TYPE system_uptime_seconds gauge
system_uptime_seconds{customer_visible="true",service_provider_visible="true"} 6725
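A minimal parser for output like the above can be sketched as follows. It handles only the simple gauge lines shown here (no quoted commas or escapes inside label values); for production use, prefer an official Prometheus client library.

```python
# Parse simple Prometheus text exposition lines into
# {metric_name: [(labels_dict, value), ...]}, skipping HELP/TYPE comments.
import re

SAMPLE_RE = re.compile(r'^(\w+)(?:\{([^}]*)\})?\s+(\S+)$')

def parse_metrics(text: str) -> dict:
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = {}
        if raw_labels:
            for kv in raw_labels.split(","):
                key, val = kv.split("=", 1)
                labels[key] = val.strip('"')
        metrics.setdefault(name, []).append((labels, float(value)))
    return metrics
```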

Available metrics

The metrics endpoint provides the following system metrics:

  • cpu_usage: Current CPU usage percentage
  • disk_ops_per_sec: Disk IOPS for read and write operations
  • disk_throughput_bytes_per_sec: Disk throughput in bytes per second
  • disk_usage_percent: Disk usage as a percentage of total disk space
  • load_avg: System load average (1min, 5min, 15min periods)
  • mem_total_bytes: Total available memory in bytes
  • mem_usage_bytes: Current memory usage in bytes
  • mem_usage_percent: Memory usage as a percentage of total memory
  • system_uptime_seconds: System uptime in seconds

Using metrics for custom autoscaling

You can incorporate these metrics into your custom autoscaling logic by:

  1. Periodically querying the metrics endpoint from your custom controller
  2. Parsing the Prometheus text format to extract relevant metric values
  3. Implementing scaling decisions based on metric thresholds or patterns
  4. Combining multiple metrics for sophisticated scaling logic

For example, you might scale up when CPU usage exceeds 80% and memory usage exceeds 70% simultaneously, or implement predictive scaling based on historical metric patterns.
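The combined-threshold example can be sketched as a pure decision function. The scale-down thresholds here are illustrative additions; only the 80% CPU / 70% memory scale-up condition comes from the text above.

```python
# Combined-threshold decision: scale up only when CPU AND memory are both
# high; scale down only when both are low; otherwise hold steady.
def scaling_decision(cpu_percent: float, mem_percent: float,
                     cpu_high: float = 80.0, mem_high: float = 70.0,
                     cpu_low: float = 20.0, mem_low: float = 30.0) -> int:
    """Return +1 (scale up), -1 (scale down), or 0 (hold)."""
    if cpu_percent > cpu_high and mem_percent > mem_high:
        return 1
    if cpu_percent < cpu_low and mem_percent < mem_low:
        return -1
    return 0
```

Requiring both metrics to agree in each direction adds hysteresis and avoids flapping on a single noisy metric.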