Build with Kubernetes Operators¶

Use this guide when your product is already managed by a Kubernetes Operator and you want Omnistrate to turn it into a customer-facing SaaS Product.

Operators are a good fit when your application lifecycle is already expressed as Kubernetes custom resources: database clusters, message queues, streaming platforms, storage systems, AI platforms, or any product where an operator reconciles the desired state. Omnistrate keeps that operator model intact and adds the control plane around it: cloud account onboarding, tenant management, APIs, Customer Portal, subscriptions, deployment cells, networking, backups, restores, upgrades, observability, and fleet operations.

For a complete working example, use the operator spec template. It builds a CloudNativePG-based PostgreSQL service plan with create, modify, delete, start, stop, scale, backup, restore, and delete-backup system workflows.

What You Define¶

An operator-backed service plan usually has four layers:

| Layer | Where it is defined | Purpose |
| --- | --- | --- |
| Customer inputs | `apiParameters` | Values your customers provide, such as instance type, replica count, storage size, database name, or backup bucket. |
| Operator installation | `operatorCRDConfiguration.helmChartDependencies` or deployment-cell amenities | Installs the operator and required CRDs before tenant resources are reconciled. |
| Runtime lifecycle | `systemWorkflows` | Uses Argo Workflow-style DAGs to create, update, delete, start, stop, scale, back up, restore, and delete backups. |
| Provider operations | Optional `customWorkflows` | Exposes provider-defined actions that are not platform lifecycle APIs, such as compact, repair, diagnostics, or product-specific administrative tasks. |

Note

`operatorCRDConfiguration.template`, `operatorCRDConfiguration.supplementalFiles`, and `operatorCRDConfiguration.readinessConditions` are deprecated for operator lifecycle management. Prefer `systemWorkflows`: they let you model multi-step lifecycle operations, capture outputs, run backups and restores, and evaluate Kubernetes resource success and failure conditions, all with the same workflow engine. Keep `operatorCRDConfiguration.helmChartDependencies` when the service plan should install the operator and CRDs.

Minimal Service Plan Shape¶

The operator spec template starts with a normal service plan:

name: Postgres Operator
deployment:
  hostedDeployment:
    awsAccountId: "<AWS_ACCOUNT_ID>"
    awsBootstrapRoleAccountArn: "arn:aws:iam::<AWS_ACCOUNT_ID>:role/omnistrate-bootstrap-role"
tenancyType: CUSTOM_TENANCY

services:
  - name: CNPG
    compute:
      instanceTypes:
        - apiParam: instanceType
          cloudProvider: aws
    apiParameters:
      - key: instanceType
        name: Instance Type
        type: String
        modifiable: true
        defaultValue: "t3.medium"
      - key: numberOfInstances
        name: Total Number of Instances
        type: Float64
        modifiable: true
        defaultValue: "1"
      - key: storageSize
        name: Storage Size
        type: String
        modifiable: true
        defaultValue: "20Gi"

Add endpoint configuration for customer-facing connection details:

endpointConfiguration:
  writer:
    host: "$sys.network.externalClusterEndpoint"
    ports:
      - 5432
    primary: true
    networkingType: PUBLIC
  reader:
    host: "reader-{{ $sys.network.externalClusterEndpoint }}"
    ports:
      - 5432
    primary: false
    networkingType: PUBLIC

Install the operator as a Helm dependency when the operator version should be tied to the service plan version:

operatorCRDConfiguration:
  helmChartDependencies:
    - chartName: cloudnative-pg
      chartVersion: 0.28.2
      chartRepoName: cnpg
      chartRepoURL: https://cloudnative-pg.github.io/charts
    - chartName: plugin-barman-cloud
      chartVersion: 0.6.0
      chartRepoName: cnpg
      chartRepoURL: https://cloudnative-pg.github.io/charts

Note

The current operator template intentionally omits the deprecated `operatorCRDConfiguration.template`, `supplementalFiles`, and `readinessConditions` fields. Lifecycle resources, readiness, and outputs are modeled in `systemWorkflows` instead.

If the operator is shared by many tenant instances in the same deployment cell, install it as a deployment-cell amenity instead. That lets you upgrade the operator once per cluster instead of coupling it to every tenant instance upgrade.

Backup Capability¶

Backup policy belongs to the resource capability, not the workflow definition:

capabilities:
  backupConfiguration:
    backupRetentionInDays: 1
    backupPeriodInHours: 1
    snapshotBeforeDeletion: true

Omnistrate uses this configuration to schedule automated backups, set expiration for automated backups, run backup-before-delete when enabled, and trigger delete-backup cleanup for expired snapshots. Manual snapshots are user-controlled and are not expired by this schedule.
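
As a hedged illustration of how the two scheduling fields interact, the variant below takes one automated backup per day and keeps each for a week. The values are illustrative rather than recommended; assuming expiration is measured from each backup's creation time, roughly seven automated snapshots would be retained at steady state.

```yaml
capabilities:
  backupConfiguration:
    # Illustrative values only: one automated backup every 24 hours,
    # each expiring 7 days after it was taken.
    backupRetentionInDays: 7
    backupPeriodInHours: 24
    # Take a final snapshot before the instance is deleted.
    snapshotBeforeDeletion: true
```

By the same reasoning, the template's own values (a 1-hour period with 1-day retention) would keep about 24 automated snapshots per instance.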

System Workflows¶

`systemWorkflows` are lifecycle hooks invoked by Omnistrate's existing platform APIs. They use an Argo Workflow-style structure: `entrypoint`, `arguments.parameters`, `templates`, DAG tasks, and Kubernetes resource templates.

The operator template uses the following system workflows:

| Workflow | Trigger | Example behavior |
| --- | --- | --- |
| `create` | Instance provisioning | Create secrets, create backup object-store configuration, apply the CNPG `Cluster`, and wait for ready instances. |
| `modify` | Instance update or upgrade | Reapply the `Cluster` with updated inputs such as storage or replica count. |
| `delete` | Instance delete | Delete the `Cluster`, secrets, and backup object-store resources. |
| `start` | Start API | Patch the CNPG hibernation annotation to `off`. |
| `stop` | Stop API | Patch the CNPG hibernation annotation to `on`. |
| `backup` | Manual backup, periodic backup, backup-before-delete | Rehydrate the cluster if needed, create a CNPG `Backup` CR, and persist backup metadata. |
| `restore` | Restore API | Create a new target cluster from selected snapshot metadata. |
| `deleteBackup` | Manual snapshot delete and retention cleanup | Delete the operator backup CR or external backup marker. |

At minimum, operator-backed service plans should define `create`, `modify`, and `delete` so Omnistrate can provision, update, and remove the managed resource through standard lifecycle APIs. Add the other system workflows only for the operations your service plan supports.
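
A delete workflow can be much smaller than a create workflow. The following is a minimal sketch that reuses the same workflow structure and removes only the CNPG `Cluster`. The task and template names (`deletecluster`, `delete-cluster`) are hypothetical, and the full template also deletes secrets and backup object-store resources.

```yaml
systemWorkflows:
  delete:
    workflow:
      entrypoint: delete
      arguments:
        parameters:
          - name: namespace
            value: "{{ $sys.namespace }}"
          - name: instanceId
            value: "{{ $sys.instanceId }}"
      templates:
        - name: delete
          inputs:
            parameters:
              - name: namespace
              - name: instanceId
          dag:
            tasks:
              - name: deletecluster          # hypothetical task name
                template: delete-cluster
                arguments:
                  parameters:
                    - name: namespace
                      value: "{{inputs.parameters.namespace}}"
                    - name: instanceId
                      value: "{{inputs.parameters.instanceId}}"
        - name: delete-cluster               # hypothetical template name
          inputs:
            parameters:
              - name: namespace
              - name: instanceId
          resource:
            # Only the identifying fields are needed to delete the resource.
            action: delete
            manifest: |
              apiVersion: postgresql.cnpg.io/v1
              kind: Cluster
              metadata:
                name: "{{inputs.parameters.instanceId}}"
                namespace: "{{inputs.parameters.namespace}}"
```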

Create workflow example¶

This simplified example shows the shape of a create workflow. The full template includes secrets, backup object stores, load balancer annotations, affinity, and output parameters.

systemWorkflows:
  create:
    outputParameters:
      postgresContainerImage: "$tasks.applycluster.resource.status.image"
      status: "$tasks.applycluster.resource.status.phase"
      topology: "$tasks.applycluster.resource.status.topology"
    workflow:
      entrypoint: create
      arguments:
        parameters:
          - name: namespace
            value: "{{ $sys.namespace }}"
          - name: instanceId
            value: "{{ $sys.instanceId }}"
          - name: numberOfInstances
            value: "{{ $var.numberOfInstances }}"
          - name: storageSize
            value: "{{ $var.storageSize }}"
      templates:
        - name: create
          inputs:
            parameters:
              - name: namespace
              - name: instanceId
              - name: numberOfInstances
              - name: storageSize
          dag:
            tasks:
              - name: applycluster
                template: apply-cluster
                arguments:
                  parameters:
                    - name: namespace
                      value: "{{inputs.parameters.namespace}}"
                    - name: instanceId
                      value: "{{inputs.parameters.instanceId}}"
                    - name: numberOfInstances
                      value: "{{inputs.parameters.numberOfInstances}}"
                    - name: storageSize
                      value: "{{inputs.parameters.storageSize}}"
        - name: apply-cluster
          inputs:
            parameters:
              - name: namespace
              - name: instanceId
              - name: numberOfInstances
              - name: storageSize
          resource:
            action: apply
            successCondition: status.instances == {{inputs.parameters.numberOfInstances}} && status.readyInstances == {{inputs.parameters.numberOfInstances}}
            failureCondition: status.phase == failed
            manifest: |
              apiVersion: postgresql.cnpg.io/v1
              kind: Cluster
              metadata:
                name: "{{inputs.parameters.instanceId}}"
                namespace: "{{inputs.parameters.namespace}}"
              spec:
                instances: {{inputs.parameters.numberOfInstances}}
                storage:
                  size: "{{inputs.parameters.storageSize}}"
                  storageClass: gp3

`resource.action` can be `apply`, `patch`, or `delete`. `successCondition` and `failureCondition` are evaluated against the live Kubernetes resource status. Output parameters can reference completed task resources through `$tasks.<taskName>.resource.*`.
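
To illustrate the `patch` action, here is a hedged sketch of a stop workflow that sets the CNPG declarative hibernation annotation (`cnpg.io/hibernation`) to `on`. The task and template names are hypothetical. `mergeStrategy` is an Argo Workflows resource-template field; that Omnistrate honors it is an assumption, and it is set to `merge` here because strategic merge patches do not apply to custom resources.

```yaml
systemWorkflows:
  stop:
    workflow:
      entrypoint: stop
      arguments:
        parameters:
          - name: namespace
            value: "{{ $sys.namespace }}"
          - name: instanceId
            value: "{{ $sys.instanceId }}"
      templates:
        - name: stop
          inputs:
            parameters:
              - name: namespace
              - name: instanceId
          dag:
            tasks:
              - name: hibernatecluster       # hypothetical task name
                template: hibernate-cluster
                arguments:
                  parameters:
                    - name: namespace
                      value: "{{inputs.parameters.namespace}}"
                    - name: instanceId
                      value: "{{inputs.parameters.instanceId}}"
        - name: hibernate-cluster            # hypothetical template name
          inputs:
            parameters:
              - name: namespace
              - name: instanceId
          resource:
            action: patch
            mergeStrategy: merge             # assumption: passed through as in Argo
            manifest: |
              apiVersion: postgresql.cnpg.io/v1
              kind: Cluster
              metadata:
                name: "{{inputs.parameters.instanceId}}"
                namespace: "{{inputs.parameters.namespace}}"
                annotations:
                  cnpg.io/hibernation: "on"
```

A start workflow would have the same shape with the annotation value set to `"off"`.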

Backup and Restore Workflow Context¶

Backup workflows receive Omnistrate snapshot context:

systemWorkflows:
  backup:
    outputParameters:
      backupId: "$tasks.createBackupCR.resource.status.backupId"
      backupName: "$tasks.createBackupCR.resource.status.backupName"
    workflow:
      arguments:
        parameters:
          - name: snapshotId
            value: "{{ $sys.snapshot.id }}"
          - name: snapshotTime
            value: "{{ $sys.snapshot.time }}"
          - name: namespace
            value: "{{ $sys.namespace }}"
          - name: instanceId
            value: "{{ $sys.instanceId }}"
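
The `createBackupCR` task referenced in `outputParameters` could be implemented along the following lines. This is a sketch under assumptions, not the template's actual definition: the `create-backup-cr` template name and the `backup-<snapshotId>` naming scheme are hypothetical (and assume the snapshot ID is a valid Kubernetes object name), and the `Backup` spec assumes the Barman Cloud plugin is already configured on the cluster.

```yaml
systemWorkflows:
  backup:
    workflow:
      entrypoint: backup
      arguments:
        parameters:
          - name: snapshotId
            value: "{{ $sys.snapshot.id }}"
          - name: namespace
            value: "{{ $sys.namespace }}"
          - name: instanceId
            value: "{{ $sys.instanceId }}"
      templates:
        - name: backup
          inputs:
            parameters:
              - name: snapshotId
              - name: namespace
              - name: instanceId
          dag:
            tasks:
              - name: createBackupCR
                template: create-backup-cr   # hypothetical template name
                arguments:
                  parameters:
                    - name: snapshotId
                      value: "{{inputs.parameters.snapshotId}}"
                    - name: namespace
                      value: "{{inputs.parameters.namespace}}"
                    - name: instanceId
                      value: "{{inputs.parameters.instanceId}}"
        - name: create-backup-cr
          inputs:
            parameters:
              - name: snapshotId
              - name: namespace
              - name: instanceId
          resource:
            action: apply
            # CNPG reports Backup progress in status.phase.
            successCondition: status.phase == completed
            failureCondition: status.phase == failed
            manifest: |
              apiVersion: postgresql.cnpg.io/v1
              kind: Backup
              metadata:
                name: "backup-{{inputs.parameters.snapshotId}}"
                namespace: "{{inputs.parameters.namespace}}"
              spec:
                cluster:
                  name: "{{inputs.parameters.instanceId}}"
                method: plugin
                pluginConfiguration:
                  name: barman-cloud.cloudnative-pg.io
```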

Values rendered from `backup.outputParameters` are stored as snapshot metadata. Restore workflows can use that metadata:

systemWorkflows:
  restore:
    workflow:
      arguments:
        parameters:
          - name: restoreSnapshotId
            value: "{{ $sys.restore.snapshotId }}"
          - name: restoreSnapshotTime
            value: "{{ $sys.restore.snapshotTime }}"
          - name: backupId
            value: "{{ $sys.restore.metadata.backupId }}"
          - name: backupName
            value: "{{ $sys.restore.metadata.backupName }}"
          - name: sourceInstanceId
            value: "{{ $sys.sourceInstanceId }}"
          - name: newClusterName
            value: "{{ $sys.targetInstanceId }}"

This keeps provider backup identifiers out of the customer restore form. Customers select an Omnistrate snapshot; the restore workflow receives the metadata captured during the original backup.

Custom Workflows¶

Use `customWorkflows` for provider-defined operations that are not platform lifecycle APIs. The operator template does not include a custom workflow by default; it focuses on the standard lifecycle operations implemented as `systemWorkflows`.

Add custom workflows only when your product needs an additional operation beyond create, modify, delete, start, stop, scale, backup, restore, or delete-backup. Custom workflows appear in `supportedOperations` for the instance and can be invoked from the UI, API, or `omnistrate-ctl` operation commands.

Argo Workflow Syntax¶

The workflow body follows the Argo Workflow model for DAGs and Kubernetes resource templates. The most important concepts are:

- `entrypoint` selects the template to run first.
- `arguments.parameters` binds Omnistrate variables to workflow inputs.
- `templates[].dag.tasks` declares the task graph and dependencies.
- `templates[].resource` applies, patches, or deletes Kubernetes resources.
- `successCondition` and `failureCondition` define how Omnistrate decides whether a resource task completed.

For Argo syntax details, see the Argo Workflows documentation. Omnistrate renders `$sys.*`, `$var.*`, `$secret.*`, and `$func.*` expressions before executing the workflow against the tenant instance context.
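
As a sketch of how task ordering is expressed, the `templates` fragment below uses Argo's `dependencies` field so that the cluster is applied only after a secret-creation task succeeds. The `createsecret` task and `create-secret` template are hypothetical, and the definitions of both referenced resource templates are omitted for brevity.

```yaml
templates:
  - name: create
    inputs:
      parameters:
        - name: namespace
        - name: instanceId
    dag:
      tasks:
        - name: createsecret                 # hypothetical task
          template: create-secret            # hypothetical template, not shown
          arguments:
            parameters:
              - name: namespace
                value: "{{inputs.parameters.namespace}}"
              - name: instanceId
                value: "{{inputs.parameters.instanceId}}"
        - name: applycluster
          template: apply-cluster            # resource template, not shown
          # Runs only after createsecret has completed successfully.
          dependencies: [createsecret]
          arguments:
            parameters:
              - name: namespace
                value: "{{inputs.parameters.namespace}}"
              - name: instanceId
                value: "{{inputs.parameters.instanceId}}"
```

Tasks without a `dependencies` entry are eligible to run as soon as the DAG starts, so independent steps can proceed in parallel.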

Optional Terraform Dependencies¶

Operators often need cloud resources that are not Kubernetes resources, such as object-storage buckets, IAM roles, encryption keys, private networking, or DNS zones. Model those with a Terraform resource in the same service plan, mark it internal, and make the operator resource depend on it.

The Terraform resource can export outputs, and the operator workflow can consume those outputs through normal Omnistrate parameters. This lets Terraform manage cloud primitives while the operator manages application lifecycle in Kubernetes.

Build and Test¶

Clone the template and build it with `omnistrate-ctl`:

git clone https://github.com/omnistrate-community/operator-spec-template
cd operator-spec-template
omnistrate-ctl build -f spec.yaml --name "Postgres Operator" --release-as-preferred --spec-type ServicePlanSpec

Then create a test instance from the Customer Portal or CLI. Validate:

- The operator Helm dependencies are installed.
- The tenant namespace is created.
- The create workflow applies the operator custom resources.
- Endpoints are visible after the operator reports readiness.
- Manual backup creates a snapshot and stores provider metadata.
- Restore creates a new target instance from the selected snapshot.
- Start, stop, add capacity, and remove capacity drive the operator through its CRD.

Use Deployment Cell Access and Manage Workflows to debug Kubernetes resources and workflow events.

For the broader architecture and integration principles, see Beyond Operators: The Enterprise Control Plane Layer on the Omnistrate blog.