PostgreSQL on Kube360

This guide documents how to run PostgreSQL instances on the Kube360 cluster using the CloudNativePG operator.

Note: Kube360 currently designates two specific worker nodes for PostgreSQL workloads. Consequently, deploying clusters with more than two instances is not recommended without prior discussion with the Kube360 team.

Prerequisites: Object Store Configuration

Before creating a Postgres cluster, you must configure an Object Store for base backups and Write-Ahead Log (WAL) archiving. We use Cloudflare R2 for this purpose.

1. Infrastructure Setup (Terraform)

Ensure the corresponding Cloudflare R2 bucket exists with appropriate lifecycle rules.

  • Retention Policy: We configure a 35-day retention policy on the bucket. This acts as a safety net; if the backup tool (Barman) fails to prune old backups (configured for 30 days), Cloudflare will automatically delete objects older than 35 days.
  • Object Locking: Objects are locked for 29 days, preventing any deletion (accidental or otherwise) during that window. Because the lock expires before Barman's 30-day retention, routine pruning of expired backups is never blocked.
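
These windows intentionally nest: the 29-day lock is shorter than Barman's 30-day retention, which is shorter than the 35-day lifecycle expiry. The seconds values used in the Terraform below follow directly:

```shell
# 35-day lifecycle expiry (safety net if Barman fails to prune)
lifecycle_max_age=$((35 * 24 * 60 * 60))
echo "$lifecycle_max_age"   # 3024000

# 29-day object lock (protects backups Barman still needs)
lock_max_age=$((29 * 24 * 60 * 60))
echo "$lock_max_age"        # 2505600
```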

Example Terraform Configuration:

resource "cloudflare_r2_bucket" "satoshi_staging" {
  account_id    = local.account_id
  name          = "satoshi-staging-pg"
  location      = "ENAM"
  storage_class = "Standard"
}

resource "cloudflare_r2_bucket_lifecycle" "satoshi_staging" {
  account_id  = local.account_id
  bucket_name = cloudflare_r2_bucket.satoshi_staging.name
  rules = [{
    id = "Expire all objects older than 35 days"
    conditions = {
      prefix = ""
    }
    delete_objects_transition = {
      condition = {
        max_age = 3024000 # 35 days in seconds
        type    = "Age"
      }
    }
    enabled = true
  }]
}

# Protect backups from deletion for the first 29 days
resource "cloudflare_r2_bucket_lock" "satoshi_staging" {
  account_id  = local.account_id
  bucket_name = cloudflare_r2_bucket.satoshi_staging.name
  rules = [{
    id = "Lock all objects for 29 days"
    condition = {
      max_age_seconds = 2505600 # 29 days in seconds
      type = "Age"
    }
    enabled = true
    prefix = ""
  }]
}

Credentials Generation:

Generate an API token with write access to the specific bucket.

data "cloudflare_api_token_permission_groups_list" "bucket_write" {
  name = "Workers%20R2%20Storage%20Bucket%20Item%20Write"
}

resource "cloudflare_api_token" "pg_staging_api" {
  name = "pg_staging_api"
  policies = [
    {
      effect = "allow"
      permission_groups = [
        {
          id = data.cloudflare_api_token_permission_groups_list.bucket_write.result[0].id
        },
      ]
      resources = {
        "com.cloudflare.edge.r2.bucket.${local.account_id}_default_satoshi-staging-pg" = "*",
      },
    },
  ]
}

output "satoshi_staging_pg_api_secret_token" {
  description = "R2 S3 secret access key (SHA-256 of the token value) for Postgres Staging"
  value       = sha256(cloudflare_api_token.pg_staging_api.value)
  sensitive   = true
}

output "satoshi_staging_pg_access_token_id" {
  description = "API token key ID"
  value       = cloudflare_api_token.pg_staging_api.id
}
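
The sha256() in the output above is not obfuscation: R2's S3-compatible Secret Access Key is defined as the SHA-256 hex digest of the API token's value. If you only have the raw token, you can derive the secret key yourself; the token value below is a placeholder, not a real credential:

```shell
# Placeholder token value -- substitute your real one
RAW_TOKEN="v1.0-example-token-value"

# R2's S3 secret access key is the SHA-256 hex digest of the token value
AWS_SECRET_ACCESS_KEY="$(printf '%s' "$RAW_TOKEN" | sha256sum | cut -d' ' -f1)"
echo "${#AWS_SECRET_ACCESS_KEY}"  # 64 hex characters
```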

Verify the credentials using the AWS CLI:

export AWS_ACCESS_KEY_ID="REDACTED"
export AWS_SECRET_ACCESS_KEY="REDACTED"
export AWS_DEFAULT_REGION="auto"

aws s3api list-objects --bucket satoshi-staging-pg --endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com

2. Kubernetes ObjectStore Manifest

Define the ObjectStore resource in Kubernetes. This configuration tells the operator where to store backups and provides the credentials.

apiVersion: barmancloud.cnpg.io/v1
kind: ObjectStore
metadata:
  name: satoshi-staging
spec:
  retentionPolicy: "30d"
  instanceSidecarConfiguration:
    resources:
      requests:
        memory: "512Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
  configuration:
    destinationPath: "s3://satoshi-staging-pg/store/"
    endpointURL: "https://b9b297472a828eda3e40c09e5b11a604.r2.cloudflarestorage.com"
    wal:
      compression: zstd
    data:
      compression: gzip
    s3Credentials:
      accessKeyId:
        name: cloudflare-r2-staging-pg
        key: AWS_ACCESS_KEY_ID
      secretAccessKey:
        name: cloudflare-r2-staging-pg
        key: AWS_SECRET_ACCESS_KEY
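
The s3Credentials section above references a Secret named cloudflare-r2-staging-pg. A minimal sketch of that Secret is shown here (values are placeholders; in practice, manage it via Sealed Secrets rather than committing plaintext):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-r2-staging-pg
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "<token ID from the Terraform output>"
  AWS_SECRET_ACCESS_KEY: "<SHA-256 of the token value>"
```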

Postgres Cluster Configuration

With the Object Store configured, you can define the PostgreSQL Cluster.

Cluster Manifest

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: satoshi-staging
spec:
  instances: 2
  imageName: ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie

  postgresql:
    parameters:
      shared_buffers: "256MB"
      effective_cache_size: "768MB"
      work_mem: "32MB"
      maintenance_work_mem: "512MB"
      archive_timeout: "300"

  bootstrap:
    initdb:
      database: indexer
      owner: indexer
      secret:
        name: indexer-postgres

  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
    podAntiAffinityType: required
    nodeSelector:
      node.kubernetes.io/postgres: "true"
    tolerations:
    - key: node-role.kubernetes.io/postgres
      operator: Exists
      effect: NoSchedule

  resources:
    requests:
      memory: "1Gi"
      cpu: "1"
    limits:
      memory: "1Gi"
      cpu: "1"

  storage:
    size: 20Gi
    storageClass: longhorn-postgres-whole
  plugins:
  - name: barman-cloud.cloudnative-pg.io
    isWALArchiver: true
    parameters:
      barmanObjectName: satoshi-staging
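
As a quick sanity check, the memory parameters in the manifest above track the 1Gi limit. This is just back-of-envelope arithmetic, not CNPG tooling:

```shell
LIMIT_MB=1024                      # 1Gi memory limit from the manifest

echo $((LIMIT_MB / 4))             # 256 -> shared_buffers (~25% of RAM)
echo $((LIMIT_MB * 3 / 4))         # 768 -> effective_cache_size (~75% of RAM)
```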

Configuration Parameters Explained

Properly tuning PostgreSQL parameters based on your resource limits is crucial for performance and stability.

  • shared_buffers — Memory dedicated to PostgreSQL for caching data. Guideline: ~25% of RAM; for a 1Gi limit, 256MB is appropriate.
  • effective_cache_size — An estimate of the memory available for disk caching by the operating system and within the database itself; it helps the query planner estimate costs. Guideline: ~50% to 75% of RAM; for a 1Gi limit, 768MB is appropriate.
  • work_mem — The amount of memory used by internal sort operations and hash tables before writing to temporary disk files. Note: this applies per sort/hash operation, so a single complex query may use several multiples of it. Guideline: start low (e.g., 32MB-64MB); setting this too high can cause Out-Of-Memory (OOM) errors if many connections run complex queries simultaneously.
  • maintenance_work_mem — The maximum amount of memory used by maintenance operations such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. Guideline: ~10% to 25% of RAM, up to 1GB; larger values improve maintenance speed.
  • archive_timeout — Forces the server to switch to a new WAL segment file periodically. Guideline: 300 (5 minutes) is a standard default, ensuring WAL files are pushed to object storage frequently even during low activity.
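
The work_mem warning is worth quantifying. Because the setting applies per operation, concurrent activity multiplies it; a rough worst-case sketch against the example cluster's 1Gi limit (the connection count is an assumption for illustration):

```shell
WORK_MEM_MB=32
CONCURRENT_SORTS=20          # assumed: 20 connections each running one sort
SHARED_BUFFERS_MB=256

PEAK_MB=$((SHARED_BUFFERS_MB + WORK_MEM_MB * CONCURRENT_SORTS))
echo "$PEAK_MB"              # 896 -- already close to the 1024MB (1Gi) limit
```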

Important Implementation Notes

  1. Guaranteed QoS: Always set resources.requests equal to resources.limits. This ensures the Pod gets Guaranteed Quality of Service (QoS) and is less likely to be evicted.
  2. Scheduling: The nodeSelector and tolerations in the example ensure the pods are scheduled on the dedicated Postgres worker nodes.
  3. Secrets: Use Sealed Secrets to manage the database username and password referenced in the bootstrap.initdb.secret.

Database Management

Declarative Schema Management

Manage your database schema declaratively using the Database resource. We recommend creating a dedicated schema for your application rather than using the public schema.

apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: indexer
spec:
  name: indexer
  owner: indexer
  schemas:
    - name: indexer
      ensure: present
      owner: indexer
  cluster:
    name: satoshi-staging

Backup & Recovery

Scheduled Base Backups

Configure a ScheduledBackup resource to perform periodic base backups (e.g., weekly).

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: satoshi-staging-base-backup
spec:
  immediate: true
  schedule: "0 0 0 * * 0" # Every Sunday at midnight (CNPG uses a six-field cron format with a leading seconds field)
  backupOwnerReference: self
  method: plugin
  target: prefer-standby
  cluster:
    name: satoshi-staging
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io

On-Demand Base Backups

To trigger a backup manually (e.g., after initial data load), use the Backup resource:

apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: staging-pg-on-demand-one
spec:
  method: plugin
  pluginConfiguration:
    name: barman-cloud.cloudnative-pg.io
  cluster:
    name: satoshi-staging

You can verify the backup status by checking this resource. While the cnpg CLI supports imperative backups, the declarative approach is recommended for audit trails and GitOps workflows.

Monitoring

To enable Prometheus metrics collection, create a PodMonitor resource. This allows you to visualize cluster health in Grafana.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: satoshi-staging-monitor
  labels:
    release: kube360
spec:
  selector:
    matchLabels:
      cnpg.io/cluster: satoshi-staging
  podMetricsEndpoints:
  - port: metrics