Monitoring with auto-recovery

High availability is a critical requirement for your SaaS, and achieving it requires several measures.

In general, Omnistrate provides full support for your control plane and for your data plane (aka application) infrastructure, along with automated L1 support for data plane failures.

SaaS monitoring

Control plane failures

We will monitor, detect, and recover from any failures in your control plane to give you a 99.99% SLA.

Data plane infrastructure failures

Omnistrate automatically detects the following failures and seamlessly recovers from them:

  • Dead processes
  • Machine failures
  • Network partitions
  • Degraded storage
  • Zonal failures

If we notice these failures, we try basic recovery mechanisms, for example restarting the process or machine, or replacing the machine. If we can't recover, we will alert your team through the configured mechanism so they can investigate further.
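
The escalation described above (try basic recovery steps, then alert if none of them work) can be sketched roughly as follows. This is only an illustration of the flow, not Omnistrate's implementation: `try_recovery`, `RecoveryStep`, and the callables passed in are all hypothetical names.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a human-readable step name plus the action to run.
# None of these names correspond to an actual Omnistrate API.
RecoveryStep = Tuple[str, Callable[[], None]]


def try_recovery(
    steps: Sequence[RecoveryStep],
    is_healthy: Callable[[], bool],
    alert: Callable[[str], None],
) -> Optional[str]:
    """Run recovery steps in order and return the name of the step that worked.

    If no step restores health, call `alert` once and return None.
    """
    for name, action in steps:
        action()
        if is_healthy():
            return name
    tried = ", ".join(name for name, _ in steps)
    alert(f"automatic recovery failed after trying: {tried}")
    return None
```

Ordering the steps from least to most disruptive (restart the process, restart the machine, replace the machine) keeps the cheapest fix first and reserves the alert for cases that need a human.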

Data plane non-infra issues

In addition, Omnistrate provides mechanisms for you to detect and recover from process failures using a healthcheck actionhook.

To configure a healthcheck actionhook, provide a check that Omnistrate can run on a regular basis to validate the health of the process.

As an example, let's say you want to verify liveness for your database application. You can perform a read-after-write query to confirm that the database is making progress.
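
A minimal sketch of such a read-after-write check is shown below. It uses Python's built-in `sqlite3` module purely as a stand-in for your database. The `healthcheck` table, the `read_after_write_check` function, and the way the result would be reported back to the actionhook are assumptions for illustration, not part of Omnistrate's API.

```python
import sqlite3
import uuid


def read_after_write_check(conn: sqlite3.Connection) -> bool:
    """Write a fresh token, read it back, and report whether they match.

    Returns False on any database error, so a hung or broken database
    shows up as an unhealthy result instead of an unhandled exception.
    """
    token = uuid.uuid4().hex  # unique per run, so a stale row cannot pass
    try:
        # Hypothetical single-row table reserved for the health check.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS healthcheck ("
            "id INTEGER PRIMARY KEY CHECK (id = 1), token TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO healthcheck (id, token) VALUES (1, ?)",
            (token,),
        )
        conn.commit()
        row = conn.execute(
            "SELECT token FROM healthcheck WHERE id = 1"
        ).fetchone()
    except sqlite3.Error:
        return False
    return row is not None and row[0] == token
```

Writing a new token on every run is what makes this a progress check: a database that still accepts connections but cannot commit writes fails it, while a plain connectivity test would pass.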

Note that you can specify different checks in the same healthcheck to make sure all components of your application are up and running. For example, if you also want a simple verification that your process is running, you can add a check using the ps utility in addition to the liveness check above.

If your process health check fails, we will alert your team with the relevant details so they can look into it.
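
One way to combine several checks in a single healthcheck is sketched below, including a process check built on `ps`. The function names (`ps_process_names`, `process_is_running`, `run_checks`) are hypothetical, and the sketch assumes a POSIX-style `ps` that accepts `-A -o comm=`. How the result is handed back to the actionhook is not covered here.

```python
import os
import subprocess
from typing import Callable, Dict, List


def ps_process_names() -> List[str]:
    """List command names of all running processes using `ps`.

    Assumes a POSIX-style ps. On Linux, `comm` is truncated to 15
    characters, so long process names must be matched accordingly.
    """
    out = subprocess.run(
        ["ps", "-A", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    # Some platforms print a full path in `comm`, so keep only the basename.
    return [os.path.basename(line.strip())
            for line in out.splitlines() if line.strip()]


def process_is_running(
    name: str,
    list_processes: Callable[[], List[str]] = ps_process_names,
) -> bool:
    """Return True if a process with exactly this command name is running."""
    return name in list_processes()


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every named check and return the names of those that failed.

    A check that raises is counted as failed, so one broken check
    does not hide the results of the others.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed
```

A healthcheck could then call `run_checks` with one entry for the process check and one for the liveness query, and treat any non-empty result as unhealthy. Returning the names of the failed checks, not just a single boolean, gives your team the details they need when an alert fires.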

For more details on actionhooks, please see the actionhooks documentation.

Note

Ops automation tools such as Shoreline can further automate data plane operations with well-defined runbooks. To learn more about integrations, please see the integrations documentation.