Vector database example using PostgreSQL¶
In this example exercise, we will build PostgreSQL SaaS with pgvector extension. Let's start simple and we will extend the offering incrementally:
Hello world PostgreSQL SaaS¶
Here is a hello world version of PostgreSQL:
version: "3"
services:
Database:
image: 'bitnami/postgresql:latest'
ports:
- 5432:5432
volumes:
- ./data:/var/lib/postgresql/data
environment:
- POSTGRESQL_PASSWORD=abc123
- POSTGRESQL_DATABASE=testdb
- POSTGRESQL_USERNAME=root
- POSTGRESQL_POSTGRES_PASSWORD=rootpassword12345
- POSTGRESQL_PGAUDIT_LOG=READ,WRITE
- POSTGRESQL_LOG_HOSTNAME=true
- POSTGRESQL_REPLICATION_MODE=master
- POSTGRESQL_REPLICATION_USER=repl_user
- POSTGRESQL_REPLICATION_PASSWORD=repl_password
- POSTGRESQL_DATA_DIR=/var/lib/postgresql/data/dbdata
- SECURITY_CONTEXT_USER_ID=1001
- SECURITY_CONTEXT_FS_GROUP=1001
- SECURITY_CONTEXT_GROUP_ID=0
Let's identify some of the challenges in the above setup:
- Choose the cloud provider account to host dataplane (i.e. customer databases)
- Missing metrics and logging support
- Missing metering support
- Single master postgres i.e. no support for replicas
- Missing pgvector extension
- Allow your customers to customize their database instance
- Missing cloud-native capabilities like multi zone, endpoint per replica, autoscaling based on custom metrics etc
Now, let's address each of the above gaps to achieve the desired outcome.
Host application (database) in your own account¶
Add the following to the compose specification to deploy it in your account:
x-omnistrate-my-account:
AwsAccountId: '123456789012' # random account number
AwsBootstrapRoleAccountArn: 'arn:aws:iam::123456789012:role/omnistrate-bootstrap-role'
GcpProjectId: 'omnistrate-example'
GcpProjectNumber: '1234567890123' # random account number
GcpServiceAccountEmail: '[email protected]'
Note
Please don't forget to replace the account numbers, project id and other information with your own account information
To learn more about deploying in your customers account, see this page.
You can also enable both and that will build separate service plans that your customers can choose between. As an example, startups segment maybe okay deploying in your account but bigger enterprises prefer to deploy your SaaS in their account.
Metrics and logging support¶
Add the following to the compose spec:
To learn more about metrics/logging integrations, please see this page
Metering support¶
Add the following to x-omnistrate-integrations
tag in the compose spec:
To learn more about metering and other integrations, please see this page
Summary so far¶
Here is how PostgreSQL SaaS in your account w/ integrations looks like so far:
version: "3"
x-omnistrate-my-account:
AwsAccountId: '123456789012'
AwsBootstrapRoleAccountArn: 'arn:aws:iam::123456789012:role/omnistrate-bootstrap-role'
GcpProjectId: 'omnistrate-example'
GcpProjectNumber: '1234567890123'
GcpServiceAccountEmail: '[email protected]'
x-omnistrate-integrations:
- omnistrateLogging
- omnistrateMetrics
- omnistrateMetering
services:
Database:
image: 'bitnami/postgresql:latest'
ports:
- 5432:5432
volumes:
- ./data:/var/lib/postgresql/data
environment:
- POSTGRESQL_PASSWORD=abc123
- POSTGRESQL_DATABASE=testdb
- POSTGRESQL_USERNAME=root
- POSTGRESQL_POSTGRES_PASSWORD=rootpassword12345
- POSTGRESQL_PGAUDIT_LOG=READ,WRITE
- POSTGRESQL_LOG_HOSTNAME=true
- POSTGRESQL_REPLICATION_MODE=master
- POSTGRESQL_REPLICATION_USER=repl_user
- POSTGRESQL_REPLICATION_PASSWORD=repl_password
- POSTGRESQL_DATA_DIR=/var/lib/postgresql/data/dbdata
- SECURITY_CONTEXT_USER_ID=1001
- SECURITY_CONTEXT_FS_GROUP=1001
- SECURITY_CONTEXT_GROUP_ID=0
Add replicas support¶
To add the replica support, we need to:
- Add a replica service component to configure master and replica resources separately. Note that their configuration are different even though they both run the same base image.
- Add numReadReplicas API parameter and configure replicaCountAPIParam compute infrastructure setting with that parameter. This change will allow us to launch number of replicas based on the numReadReplicas value provided by your customers.
services:
Replica:
.....
x-omnistrate-compute:
replicaCountAPIParam: numReadReplicas
x-omnistrate-api-params:
- key: numReadReplicas
description: Number of Read Replicas
name: Number of Read Replicas
type: Float64
modifiable: true
required: true
export: true
To learn more about API parameters, please see this page
To learn more about service component dependencies, please see this page
Enable pgvector extension¶
To enable pgvector extension, we will take advantage of custom code injection through action hooks.
x-omnistrate-actionhooks:
- scope: CLUSTER
type: INIT
commandTemplate: >
PGPASSWORD=rootpassword12345 psql -U postgres
-h master testdb -c "create extension vector"
To learn more about action hooks, please see this page
Add customization for your customers¶
To enable customization, we will take advantange of API parameters to configure different PostgreSQL and infrastructure settings.
As an example, let's say we want our users to specify the Master instance type:
x-omnistrate-api-params:
- key: masterInstanceType
description: Master Instance Type
name: Master Instance Type
type: String
modifiable: true
required: true
export: true
To learn more about SaaS configuration, please see this page
Add SaaS capabilities¶
To add SaaS capabilities mentioned above:
x-omnistrate-capabilities:
enableMultiZone: true
enableEndpointPerReplica: true
autoscaling:
maxReplicas: 5
minReplicas: 1
idleMinutesBeforeScalingDown: 2
idleThreshold: 20
overUtilizedMinutesBeforeScalingUp: 3
overUtilizedThreshold: 80
serverlessConfiguration:
targetPort: 5432
enableAutoStop: true
minimumNodesInPool: 5
For full list of capabilities, see this page To learn more on serverless, please refer to the guide here
VectorDB Serverless SaaS¶
Here is the final VectorDB SaaS that one can use to generate the initial SaaS:
version: "3"
x-omnistrate-integrations:
- omnistrateLogging
- omnistrateMetrics
services:
Master:
image: 'omnistrate/pgvector:c227409'
ports:
- 5432:5432
volumes:
- ./data:/var/lib/postgresql/data
x-omnistrate-storage:
aws:
instanceStorageType: AWS::EBS_GP3
instanceStorageSizeGi: 100
instanceStorageIOPSAPIParam: 3000
instanceStorageThroughputAPIParam: 125
gcp:
instanceStorageType: GCP::PD_BALANCED
instanceStorageSizeGi: 100
x-omnistrate-compute:
instanceTypes:
- cloudProvider: aws
apiParam: masterInstanceType
- cloudProvider: gcp
apiParam: masterInstanceType
x-omnistrate-capabilities:
autoscaling:
maxReplicas: 1
minReplicas: 1
idleMinutesBeforeScalingDown: 2
idleThreshold: 20
overUtilizedMinutesBeforeScalingUp: 3
overUtilizedThreshold: 80
serverlessConfiguration:
targetPort: 5432
enableAutoStop: true
minimumNodesInPool: 5
environment:
- POSTGRESQL_PASSWORD=$parameterGroup.var.postgresqlPassword
- POSTGRESQL_DATABASE=$parameterGroup.var.postgresqlDatabase
- POSTGRESQL_USERNAME=$parameterGroup.var.postgresqlUsername
- POSTGRESQL_POSTGRES_PASSWORD=$parameterGroup.var.postgresqlRootPassword
- POSTGRESQL_PGAUDIT_LOG=READ,WRITE
- POSTGRESQL_LOG_HOSTNAME=true
- POSTGRESQL_REPLICATION_MODE=master
- POSTGRESQL_REPLICATION_USER=$parameterGroup.var.replUsername
- POSTGRESQL_REPLICATION_PASSWORD=$parameterGroup.var.replPassword
- POSTGRESQL_DATA_DIR=/var/lib/postgresql/data/dbdata
- DATA_SOURCE_NAME=postgresql://{{ $parameterGroup.var.postgresqlUsername }}:{{ $parameterGroup.var.postgresqlPassword }}@localhost:5432/{{ $parameterGroup.var.postgresqlDatabase }}?sslmode=disable
- SECURITY_CONTEXT_USER_ID=1001
- SECURITY_CONTEXT_FS_GROUP=1001
- SECURITY_CONTEXT_GROUP_ID=0
x-omnistrate-api-params:
- key: masterInstanceType
description: Master Instance Type
name: Master Instance Type
type: String
modifiable: true
required: true
export: true
- key: parameterGroupId
description: Replica configuration
name: Parameter Group Id
dependentResourceKey: ParameterGroup
type: Resource
modifiable: true
required: true
export: true
x-omnistrate-actionhooks:
- scope: CLUSTER
type: INIT
commandTemplate: >
PGPASSWORD={{ $parameterGroup.var.postgresqlRootPassword }} psql -U postgres
-h master {{ $parameterGroup.var.postgresqlDatabase }} -c "create extension vector"
Replica:
image: 'omnistrate/pgvector:c227409'
ports:
- 5433:5432
volumes:
- ./data:/var/lib/postgresql/data
x-omnistrate-compute:
replicaCountAPIParam: numReadReplicas
instanceTypes:
- cloudProvider: aws
apiParam: replicaInstanceType
- cloudProvider: gcp
apiParam: replicaInstanceType
x-omnistrate-capabilities:
enableMultiZone: true
endpointPerReplica: true
autoscaling:
maxReplicas: 5
minReplicas: 1
idleMinutesBeforeScalingDown: 2
idleThreshold: 20
overUtilizedMinutesBeforeScalingUp: 3
overUtilizedThreshold: 80
serverlessConfiguration:
targetPort: 5432
enableAutoStop: true
minimumNodesInPool: 5
environment:
- POSTGRESQL_PASSWORD=$parameterGroup.var.postgresqlPassword
- POSTGRESQL_MASTER_HOST=$master.sys.network.externalClusterServerlessEndpoint.endpointName
- POSTGRESQL_PGAUDIT_LOG=READ,WRITE
- POSTGRESQL_LOG_HOSTNAME=true
- POSTGRESQL_REPLICATION_MODE=slave
- POSTGRESQL_REPLICATION_USER=$parameterGroup.var.replUsername
- POSTGRESQL_REPLICATION_PASSWORD=$parameterGroup.var.replPassword
- POSTGRESQL_MASTER_PORT_NUMBER=$master.sys.network.externalClusterServerlessEndpoint.openPorts[0]
- POSTGRESQL_DATA_DIR=/var/lib/postgresql/data/dbdata
- SECURITY_CONTEXT_USER_ID=1001
- SECURITY_CONTEXT_FS_GROUP=1001
- SECURITY_CONTEXT_GROUP_ID=0
x-omnistrate-api-params:
- key: replicaInstanceType
description: Replica Instance Type
name: Replica Instance Type
type: String
modifiable: true
required: true
export: true
- key: masterInstanceId
description: Instance Id of the Master to connect
name: Master Instance Id
dependentResourceKey: Master
type: Resource
modifiable: true
required: true
export: true
- key: parameterGroupId
description: Replica configuration
name: Parameter Group Id
dependentResourceKey: ParameterGroup
type: Resource
modifiable: true
required: true
export: true
- key: numReadReplicas
description: Number of Read Replicas
name: Number of Read Replicas
type: Float64
modifiable: true
required: false
export: true
defaultValue: "1"
limits:
min: 1
max: 10
ParameterGroup:
image: omnistrate/noop
x-omnistrate-api-params:
- key: postgresqlUsername
description: Default DB Username
name: Username
type: String
modifiable: false
required: true
export: true
- key: postgresqlPassword
description: Default DB Password
name: Password
type: String
modifiable: false
required: true
export: false
- key: postgresqlDatabase
description: Default DB Name
name: Default Database
type: String
modifiable: false
required: true
export: true
- key: postgresqlRootPassword
description: Root Password
name: Root DB Password
type: String
modifiable: false
required: true
export: false
- key: replUsername
description: Username
name: Replication Username
type: String
modifiable: false
required: true
export: true
- key: replPassword
description: Replication Password
name: Replication Password
type: String
modifiable: false
required: true
export: false
See it in action¶
Here is how it looks like:
Hopefully, this gave some idea on how one can take their data store technology and go from simply having a software to a fully-fledged working SaaS.
Of course, you can continue to evolve after building your first SaaS. For more on this, see here