# GCP

## Deployment on GKE

### **⚠️ Reference Only**

(If you are using Terraform) The [`highflame-iac`](https://github.com/highflame-ai/highflame-iac/blob/main/terraform/README.md) repository contains reference Terraform configurations for managing Highflame cloud resources. **Do not** copy these directly into your IaC code or run them against your cloud environment without first reviewing and patching them to match your account, networking, and security requirements.

### Prerequisites

Ensure that the following tools and resources are installed and available:

* Access to the GCP console
* A set of domain names for Highflame services (the Highflame team will share the list of services that require ingress)
* A GKE cluster with at least 6 worker nodes
* A global static IP, used in the GCE ingress
* Logging enabled for GKE
* GKE add-ons installed:
  * Horizontal Pod Autoscaling
  * HTTP Load Balancing
  * GCE Persistent Disk CSI driver
  * Network Policy Config
* A custom GCP service account, assigned to the GKE node pool, with the following permissions:
  * `logging.logEntries.create`
  * `logging.logEntries.route`
  * `monitoring.metricDescriptors.create`
  * `monitoring.metricDescriptors.get`
  * `monitoring.metricDescriptors.list`
  * `monitoring.monitoredResourceDescriptors.get`
  * `monitoring.monitoredResourceDescriptors.list`
  * `monitoring.timeSeries.create`
* A default node pool for system nodes in the GKE cluster
* Postgres server from Cloud SQL
* Memorystore for Redis
* Helm v3
* `kubectl`
* All cloud resources (managed services such as Postgres and Redis) must be in the same VPC as the Kubernetes cluster, or otherwise reachable from it

### Cloud Resources and Sizing

| Highflame components | Cloud Resources | Size |
| --- | --- | --- |
| Highflame services | Deployed into your GKE cluster using the Helm charts | CPU nodes: set up at least 3 nodes; the worker node type can be `c2d-highcpu-8`<br>GPU nodes: set up at least 4 nodes; the worker node type can be `c2d-highcpu-8` with GPU `nvidia-tesla-t4` enabled |
| Transactional Database | Cloud SQL Postgres server | The Postgres server can be a single node or a primary-secondary cluster; the DB tier can be `db-custom-4-8192` |
| Analytical Database | ClickHouse database server | Deployed into the GKE cluster |
| Cache | Memorystore for Redis | Redis memory size can be `4` GB |
| Object Store | GCS bucket | Highflame services need an object store for files and data, for example the ClickHouse backups kept in GCS for DR, since ClickHouse runs inside the GKE cluster |
| Logging | Cloud Logging | GKE service logs can be pushed to Cloud Logging with the help of GKE add-ons |
| Authentication | Clerk | External - Managed by Highflame |

Confirm GPU support for the chosen machine type before provisioning the GPU node pool: on Compute Engine, T4 GPUs attach to N1 machine types.

### Highflame GCP Service Account

#### Prerequisites

Ensure that the following tools and resources are installed and available:

* Access to the GKE cluster set up above
* gcloud CLI

#### Service Account

1. Set up the environment variables

   ```bash
   export K8S_NAMESPACE="highflame-poc"     ## Highflame k8s namespace
   export GCP_PROJECT_ID=""                 ## GCP project ID (must be filled in)
   export GCP_SA_NAME="highflame-svc-sa"    ## Name for the GCP service account
   export K8S_SA_NAME="highflame-k8s-sa"    ## Name for the K8s service account
   ```

2. Create a GCP service account

   ```bash
   gcloud iam service-accounts \
     create "${GCP_SA_NAME}" \
     --project "${GCP_PROJECT_ID}" \
     --display-name="Highflame Service Account"
   ```

3. Bind the Kubernetes service account to the GCP service account through Workload Identity

   ```bash
   gcloud iam service-accounts \
     add-iam-policy-binding "${GCP_SA_NAME}@${GCP_PROJECT_ID}.iam.gserviceaccount.com" \
     --role roles/iam.workloadIdentityUser \
     --member "serviceAccount:${GCP_PROJECT_ID}.svc.id.goog[${K8S_NAMESPACE}/${K8S_SA_NAME}]"
   ```

#### Resource Access - GCP GCS

Highflame requires a few GCS buckets; they are listed in the [highflame service variables](https://github.com/highflame-ai/highflame-iac/blob/main/docs/service-vars.md) list.

1. Grant the GCP service account access to GCS

   ```bash
   gcloud projects add-iam-policy-binding "${GCP_PROJECT_ID}" \
     --member="serviceAccount:${GCP_SA_NAME}@${GCP_PROJECT_ID}.iam.gserviceaccount.com" \
     --role="roles/storage.objectAdmin"
   ```

### Analytical Database setup - ClickHouse on GKE

This section is the long-form runbook for installing ClickHouse on a GKE cluster using the [Altinity ClickHouse Operator](https://github.com/Altinity/clickhouse-operator) and a GCS-backed [clickhouse-backup](https://github.com/Altinity/clickhouse-backup) sidecar.

#### What gets deployed

The full stack lives entirely inside a dedicated `clickhouse` namespace and consists of six logical components:

| Component | Kind | Source | Notes |
| --- | --- | --- | --- |
| Altinity operator | Helm release `ch-operator` | Upstream chart | Watches `ClickHouseInstallation` (CHI) and `ClickHouseKeeperInstallation` (CHK) CRDs |
| Service account, ConfigMap, Secret | k8s resources | `highflame-clickhouse-deps.yml` | Workload Identity-annotated `clickhouse-sa`, plus `clickhouse-cm` and `clickhouse-secrets` |
| ClickHouse Keeper | `ClickHouseKeeperInstallation/ch` | `highflame-clickhouse-values.yml` | 1 replica, 20 Gi PVC. Replaces ZooKeeper |
| ClickHouse cluster | `ClickHouseInstallation/ch` | `highflame-clickhouse-values.yml` | 1 shard × 1 replica, 4 CPU / 12 Gi RAM, 512 Gi PVC, `clickhouse-backup` sidecar |
| Backup config | k8s resources | `highflame-clickhouse-backup-config.yml` | Backup config for GCS bucket integration |
| CronJob + Job | k8s resources | `highflame-clickhouse-addons-values.yml` | 12-hourly remote backups, 60-backup retention; setup Job applies 3-day TTL on `system.*_log` tables |

Pinned versions (verify before installing):

* Operator chart: `0.25.5`
* `clickhouse/clickhouse-server`: `25.10.2`
* `clickhouse/clickhouse-keeper`: `25.10.2`
* `altinity/clickhouse-backup`: `2.6.39`

After installation, the in-cluster service endpoints are:

* HTTP: `clickhouse-ch.clickhouse.svc.cluster.local:8123`
* Native TCP: `clickhouse-ch.clickhouse.svc.cluster.local:9000`
* Keeper: `keeper-ch.clickhouse.svc.cluster.local:2181`

These match the `CLICKHOUSE_HOST` ConfigMap value and the `zookeeper.nodes` block in the CHI spec.

#### Prerequisites

Before you start, you need:

1. **A GKE cluster** with the `kubectl` context pointing at it (`kubectl config current-context`).
2. **A storage class** that supports `ReadWriteOnce` PVCs; `pd-ssd` is recommended. Both the Keeper and the ClickHouse server use the cluster default unless you uncomment `storageClassName` in the CHI/CHK templates.
3. **A GCS bucket** for remote backups, in the same GCP project as the cluster. Versioning and lifecycle rules are recommended but not required.
4. **A GCP service account for backups (Workload Identity)** with `roles/storage.objectViewer` on the GCS bucket and `roles/iam.serviceAccountTokenCreator`. Uploading and pruning backups also needs object create and delete permissions, which `roles/storage.objectViewer` alone does not grant; a role such as `roles/storage.objectAdmin` on the bucket covers them. The service account must grant `roles/iam.workloadIdentityUser` to the Kubernetes service account `clickhouse-sa` in the `clickhouse` namespace, that is, to the member `serviceAccount:<PROJECT_ID>.svc.id.goog[clickhouse/clickhouse-sa]`.
5. **`helm` ≥ 3.10**

#### Config files

{% hint style="info" %}
highflame-clickhouse-deps.yml
{% endhint %}

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: ""
  name: clickhouse-sa
  namespace: clickhouse
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: clickhouse-cm
  namespace: clickhouse
data:
  CLICKHOUSE_HOST: "clickhouse-ch.clickhouse.svc.cluster.local"
  BACKUP_KEEP: "60"
  CREATE_DB_LIST: ""
  BACKUP_DB_LIST: "highflame"
---
apiVersion: v1
kind: Secret
metadata:
  name: clickhouse-secrets
  namespace: clickhouse
type: Opaque
stringData:
  CH_ADMIN_USERNAME: "highflame_admin"
  CH_ADMIN_PASSWORD: ""
  CH_READONLY_USERNAME: "highflame_readonly"
  CH_READONLY_PASSWORD: ""
  CH_BACKUP_USERNAME: "highflame_backup"
  CH_BACKUP_PASSWORD: ""
```

Defines:

* `ServiceAccount/clickhouse-sa`: the annotation `iam.gke.io/gcp-service-account` must be set to the email of the service account created in the prerequisites.
* `ConfigMap/clickhouse-cm`: `CLICKHOUSE_HOST`, `BACKUP_KEEP=60`, `CREATE_DB_LIST` (comma-separated databases to create on first boot), `BACKUP_DB_LIST=highflame` (comma-separated databases to back up).
* `Secret/clickhouse-secrets`: admin, readonly, and backup usernames and passwords.

You must edit:

| Field | Required value |
| --- | --- |
| `iam.gke.io/gcp-service-account` | Email of the GCP service account bound through Workload Identity |
| `CH_ADMIN_PASSWORD` | Strong password for admin user `highflame_admin` |
| `CH_READONLY_PASSWORD` | Strong password for readonly user `highflame_readonly` |
| `CH_BACKUP_PASSWORD` | Strong password for backup user `highflame_backup` |

{% hint style="info" %}
highflame-clickhouse-values.yml
{% endhint %}

Defines the `ClickHouseKeeperInstallation/ch` and `ClickHouseInstallation/ch` custom resources.

```yaml
apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: "ch"
  namespace: clickhouse
spec:
  configuration:
    clusters:
      - name: cluster
        layout:
          replicasCount: 1
    settings:
      keeper_server/tcp_port: "2181"
      keeper_server/raft_port: "9444"
  defaults:
    templates:
      podTemplate: keeper
      volumeClaimTemplate: keeper
  templates:
    podTemplates:
      - name: keeper
        metadata:
          labels:
            app: clickhouse-keeper
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: "app"
                        operator: In
                        values:
                          - clickhouse-keeper
                  topologyKey: "kubernetes.io/hostname"
          containers:
            - name: keeper
              image: "clickhouse/clickhouse-keeper:25.10.2"
              command: [ "/usr/bin/clickhouse-keeper", "start" ]
              resources:
                requests:
                  memory: "512Mi"
                  cpu: "500m"
                limits:
                  memory: "1024Mi"
                  cpu: "1000m"
          securityContext:
            fsGroup: 101
    volumeClaimTemplates:
      - name: keeper
        spec:
          # storageClassName: storage_classname
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 20Gi
---
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "ch"
  namespace: clickhouse
spec:
  configuration:
    zookeeper:
      nodes:
        - host: "keeper-ch.clickhouse.svc.cluster.local"
          port: 2181
    clusters:
      - name: cluster
        layout:
          shardsCount: 1
          replicasCount: 1
    users:
      highflame_admin/password:
        valueFrom:
          secretKeyRef:
            name: clickhouse-secrets
            key: CH_ADMIN_PASSWORD
      highflame_admin/networks/ip:
        - "0.0.0.0/0"
        - "::/0"
      highflame_admin/profile: default
      highflame_admin/quota: default
      highflame_readonly/password:
        valueFrom:
          secretKeyRef:
            name: clickhouse-secrets
            key: CH_READONLY_PASSWORD
      highflame_readonly/networks/ip:
        - "0.0.0.0/0"
        - "::/0"
      highflame_readonly/profile: readonly
      highflame_readonly/quota: readonly
      highflame_backup/password:
        valueFrom:
          secretKeyRef:
            name: clickhouse-secrets
            key: CH_BACKUP_PASSWORD
      highflame_backup/networks/ip:
        - "0.0.0.0/0"
        - "::/0"
      highflame_backup/profile: backup
      highflame_backup/quota: backup
    profiles:
      readonly/readonly: 2
      readonly/max_execution_time: 120
      readonly/max_memory_usage: 1000000000
      readonly/max_rows_to_read: 100000000
      readonly/max_result_rows: 10000
      readonly/read_overflow_mode: throw
      readonly/result_overflow_mode: throw
      readonly/timeout_overflow_mode: throw
      backup/readonly: 0
    quotas:
      readonly/interval/duration: 3600
      backup/interval/duration: 3600
  defaults:
    templates:
      podTemplate: clickhouse
      dataVolumeClaimTemplate: clickhouse-data
      # logVolumeClaimTemplate: clickhouse-log
  templates:
    podTemplates:
      - name: clickhouse
        spec:
          serviceAccountName: clickhouse-sa
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:25.10.2
              resources:
                requests:
                  cpu: "4"
                  memory: "12Gi"
                limits:
                  cpu: "4"
                  memory: "12Gi"
              volumeMounts:
                - name: clickhouse-data
                  mountPath: /var/lib/clickhouse
                # - name: clickhouse-log
                #   mountPath: /var/log/clickhouse-server
            - name: clickhouse-backup
              image: altinity/clickhouse-backup:2.6.39
              command:
                - /bin/bash
                - -c
                - "/bin/clickhouse-backup -c /opt/backup-config.yml server"
              volumeMounts:
                - name: clickhouse-data
                  mountPath: /var/lib/clickhouse
                - name: clickhouse-backup-config
                  mountPath: /opt/backup-config.yml
                  subPath: backup-config.yml
                  readOnly: true
          volumes:
            - name: clickhouse-backup-config
              secret:
                secretName: clickhouse-backup-config
                optional: false
          # nodeSelector:
          #   kube/nodegroup: "general"
    volumeClaimTemplates:
      - name: clickhouse-data
        spec:
          # storageClassName: storage_classname
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 512Gi
      # - name: clickhouse-log
      #   spec:
      #     storageClassName: storage_classname
      #     accessModes:
      #       - ReadWriteOnce
      #     resources:
      #       requests:
      #         storage: 100Gi
```

Other useful knobs in this file (edit if your environment differs):

* `templates.podTemplates[clickhouse].spec.containers[clickhouse].resources`: default request and limit are both `cpu: 4`, `memory: 12Gi`.
* `templates.volumeClaimTemplates[clickhouse-data].resources.requests.storage`: default `512Gi`.
* `templates.podTemplates[clickhouse].spec.nodeSelector`: uncomment to pin the pod to a specific node group.

{% hint style="info" %}
highflame-clickhouse-backup-config.yml
{% endhint %}

```yaml
general:
  remote_storage: gcs
  disable_progress_bar: false
  backups_to_keep_local: 1
  backups_to_keep_remote: 10
  log_level: info
  allow_empty_backups: false
clickhouse:
  username: highflame_backup
  password: ""
  host: clickhouse-ch.clickhouse.svc.cluster.local
  port: 9000
  disk_mapping: {}
  skip_tables:
    - system.*
  timeout: 5m
  freeze_by_part: false
  secure: false
  skip_verify: false
  sync_replicated_tables: true
  log_sql_queries: false
gcs:
  sa_email: "SA_EMAIL"
  bucket: "GCS_BUCKET"
  path: highflame-clickhouse
  compression_level: 1
  compression_format: tar
  storage_class: STANDARD
```

Renders into `Secret/clickhouse-backup-config` (mounted into the sidecar at `/opt/backup-config.yml`). Update:

* `SA_EMAIL`: the GCP service account email.
* `GCS_BUCKET`: the GCS bucket name.
* `clickhouse.username` / `clickhouse.password`: must match `CH_BACKUP_USERNAME` / `CH_BACKUP_PASSWORD` from the previous step.
* `path`: the default `highflame-clickhouse` is a key prefix inside the GCS bucket. Change it per environment if several environments share one bucket.
* `backups_to_keep_remote`: with the default of `10`, `clickhouse-backup` itself trims older remote backups after each upload, independently of `BACKUP_KEEP=60` in the ConfigMap. Align the two values if you expect 60 backups to be retained.
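The values to update in the backup config are easy to miss, so a small check before loading the file into the Secret can help. The sketch below is illustrative only: the function name `unfilled_backup_placeholders` is hypothetical, and the `kubectl create secret` call in the trailing comment is an assumption about how the Secret gets created (this guide does not show that command); its key name `backup-config.yml` is taken from the `subPath` used by the sidecar mount.

```shell
#!/usr/bin/env bash
# Sketch: print the config keys that still hold template placeholders.
# Function name and checks are illustrative, not part of the Highflame tooling.
unfilled_backup_placeholders() {
  local file="$1"
  # Placeholder strings exactly as they appear in the template.
  grep -q '"SA_EMAIL"' "$file" && echo "gcs.sa_email"
  grep -q '"GCS_BUCKET"' "$file" && echo "gcs.bucket"
  # An empty quoted password means the backup user's password was not set.
  grep -Eq '^[[:space:]]*password:[[:space:]]*""[[:space:]]*$' "$file" \
    && echo "clickhouse.password"
  return 0
}

# Possible usage (assumed command, not taken from this guide):
#   if [ -z "$(unfilled_backup_placeholders highflame-clickhouse-backup-config.yml)" ]; then
#     kubectl -n clickhouse create secret generic clickhouse-backup-config \
#       --from-file=backup-config.yml=highflame-clickhouse-backup-config.yml
#   fi
```

The function prints one line per unfilled value and nothing when the file looks complete, so an empty result can gate the Secret creation.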
{% hint style="info" %}
highflame-clickhouse-addons-values.yml
{% endhint %}

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: clickhouse-backup
  namespace: clickhouse
spec:
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  schedule: "0 */12 * * *"
  jobTemplate:
    spec:
      activeDeadlineSeconds: 14400
      backoffLimit: 1
      template:
        spec:
          serviceAccountName: clickhouse-sa
          restartPolicy: Never
          containers:
            - name: clickhouse-backup
              image: altinity/clickhouse-backup:2.6.39
              envFrom:
                - configMapRef:
                    name: clickhouse-cm
              command:
                - /bin/bash
                - -c
                - |
                  set -e
                  BACKUP_CFG="/opt/backup-config.yml"
                  BACKUP_TIME="$(date +%F-%H-%M-%S)"
                  if [[ "$${BACKUP_DB_LIST}" != "" ]] ; then
                    IFS=',' read -r -a DBS <<< "$${BACKUP_DB_LIST}"
                    for db in "$${DBS[@]}" ; do
                      BACKUP_NAME="$${db}--$${BACKUP_TIME}"
                      echo "Creating backup $${BACKUP_NAME}"
                      /bin/clickhouse-backup -c $${BACKUP_CFG} create_remote --tables="$${db}.*" $${BACKUP_NAME}
                      echo "Deleting all local backups"
                      rm -rf /var/lib/clickhouse/backup/$${BACKUP_NAME}
                      echo "Rotating old remote backups - keeping last $${BACKUP_KEEP}"
                      BACKUPS=$$(clickhouse-backup -c $${BACKUP_CFG} list remote | grep -i "$${db}--" | awk '{print $$1}')
                      BKP_COUNT=$$(echo "$${BACKUPS}" | wc -l)
                      if [[ "$${BKP_COUNT}" -le "$${BACKUP_KEEP}" ]] ; then
                        echo "Nothing to delete...($${BKP_COUNT} backups, keep $${BACKUP_KEEP})"
                      else
                        DELETE_COUNT=$$((BKP_COUNT - BACKUP_KEEP))
                        DELETE_LIST=$$(echo "$${BACKUPS}" | head -n $${DELETE_COUNT})
                        for bkp in $${DELETE_LIST} ; do
                          echo "Deleting remote backup: $${bkp}"
                          clickhouse-backup -c $${BACKUP_CFG} delete remote "$${bkp}"
                        done
                      fi
                      echo "Backup job completed for database $${db}"
                    done
                  else
                    echo "Backup is disabled. Current BACKUP_DB_LIST is empty"
                  fi
              volumeMounts:
                - name: clickhouse-data
                  mountPath: /var/lib/clickhouse
                - name: clickhouse-backup-config
                  mountPath: /opt/backup-config.yml
                  subPath: backup-config.yml
                  readOnly: true
          volumes:
            - name: clickhouse-data
              persistentVolumeClaim:
                claimName: clickhouse-data
            - name: clickhouse-backup-config
              secret:
                secretName: clickhouse-backup-config
                optional: false
          affinity:
            podAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: clickhouse.altinity.com/chi
                        operator: In
                        values:
                          - ch
                  topologyKey: "kubernetes.io/hostname"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: clickhouse-setup
  namespace: clickhouse
spec:
  completions: 1
  parallelism: 1
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: wait
          image: curlimages/curl:8.17.0
          command:
            - /bin/sh
            - -c
            - |
              echo "Waiting for ClickHouse to be ready..."
              until curl -s "http://$${CLICKHOUSE_HOST}:8123/ping" | grep -q "Ok" ; do
                echo "ClickHouse not ready yet. Waiting..."
                sleep 5
              done
              echo "ClickHouse is ready!"
          envFrom:
            - configMapRef:
                name: clickhouse-cm
      containers:
        - name: setup
          image: clickhouse/clickhouse-server:25.10.2
          command:
            - /bin/bash
            - -c
            - |
              set -e
              export LOG_TABLES=(
                "system.trace_log"
                "system.text_log"
                "system.processors_profile_log"
                "system.query_log"
                "system.metric_log"
                "system.part_log"
                "system.asynchronous_metric_log"
              )
              if [[ "$${CREATE_DB_LIST}" != "" ]] ; then
                IFS=',' read -r -a DBS <<< "$${CREATE_DB_LIST}"
                for db in "$${DBS[@]}" ; do
                  echo "Creating DB: $${db}"
                  clickhouse-client \
                    --host="$${CLICKHOUSE_HOST}" \
                    --user="$${CLICKHOUSE_USER}" \
                    --password="$${CLICKHOUSE_PASSWORD}" \
                    --query="CREATE DATABASE IF NOT EXISTS $${db}"
                done
              else
                echo "Database creation is disabled. Current CREATE_DB_LIST is empty"
              fi
              echo "DB creation is completed...!"
              for log_tbl in $${LOG_TABLES[@]} ; do
                echo "Applying system log TTL to $${log_tbl}"
                db_name="$${log_tbl%%.*}"
                table_name="$${log_tbl##*.}"
                exists=$$(clickhouse-client \
                  --host="$${CLICKHOUSE_HOST}" \
                  --user="$${CLICKHOUSE_USER}" \
                  --password="$${CLICKHOUSE_PASSWORD}" \
                  --query="SELECT count() FROM $${db_name}.tables WHERE database='$${db_name}' AND name='$${table_name}'")
                if [ "$${exists}" -eq 1 ]; then
                  clickhouse-client \
                    --host="$${CLICKHOUSE_HOST}" \
                    --user="$${CLICKHOUSE_USER}" \
                    --password="$${CLICKHOUSE_PASSWORD}" \
                    --query "
                      SET alter_sync = 0;
                      ALTER TABLE $${log_tbl} MODIFY TTL event_date + INTERVAL 3 DAY;
                    "
                else
                  echo "Table $${log_tbl} does not exist yet. Skipping."
                fi
              done
              for log_tbl in $${LOG_TABLES[@]} ; do
                echo "Validating the table size of the table $${log_tbl}"
                db_name="$${log_tbl%%.*}"
                table_name="$${log_tbl##*.}"
                clickhouse-client \
                  --host="$${CLICKHOUSE_HOST}" \
                  --user="$${CLICKHOUSE_USER}" \
                  --password="$${CLICKHOUSE_PASSWORD}" \
                  --query "
                    SELECT name, engine_full
                    FROM $${db_name}.tables
                    WHERE database='$${db_name}' AND name IN ('$${table_name}');
                  "
              done
          envFrom:
            - configMapRef:
                name: clickhouse-cm
          env:
            - name: CLICKHOUSE_USER
              valueFrom:
                secretKeyRef:
                  name: clickhouse-secrets
                  key: CH_ADMIN_USERNAME
            - name: CLICKHOUSE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: clickhouse-secrets
                  key: CH_ADMIN_PASSWORD
```

Defines:

* `CronJob/clickhouse-backup`: runs every 12h (`0 */12 * * *`), iterates over `BACKUP_DB_LIST` from the ConfigMap, calls `clickhouse-backup create_remote` per database, and prunes the oldest remote backups once their count exceeds `BACKUP_KEEP`.
* `Job/clickhouse-setup`: a one-shot Job that waits for ClickHouse to answer `/ping`, creates the databases listed in `CREATE_DB_LIST`, and applies `MODIFY TTL event_date + INTERVAL 3 DAY` to each system log table listed in `LOG_TABLES`, skipping tables that do not exist yet.

The scripts write `$$` wherever the shell should see `$`. Kubernetes reduces `$$` to `$` in a container's `command` and `args`, which keeps the kubelet from treating `$(...)` as one of its own variable references.

#### Install: step by step

Run from the repo root unless noted otherwise.
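Before running the steps, it can be worth confirming that the working directory holds all four manifests and that the CLIs are installed. This is a minimal preflight sketch, not part of the Highflame tooling: the function names `missing_manifests` and `missing_tools` are hypothetical, and the file names are the ones used in this guide.

```shell
#!/usr/bin/env bash
# Sketch: list which of the four manifests from this guide are absent
# from a directory (defaults to the current directory).
missing_manifests() {
  local dir="${1:-.}" f
  for f in \
    highflame-clickhouse-deps.yml \
    highflame-clickhouse-values.yml \
    highflame-clickhouse-backup-config.yml \
    highflame-clickhouse-addons-values.yml
  do
    [ -f "${dir}/${f}" ] || echo "${f}"
  done
  return 0
}

# Sketch: list the given CLI names that are not found on PATH.
missing_tools() {
  local t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || echo "$t"
  done
  return 0
}

# Possible usage from the directory that holds the manifests:
#   missing_manifests .
#   missing_tools kubectl helm gcloud
```

Both functions print nothing when everything is in place, so empty output means the install steps can proceed.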
Namespace

```bash
kubectl create ns clickhouse
```

Operator

```bash
helm repo add altinity https://helm.altinity.com
helm repo update
helm install ch-operator altinity/altinity-clickhouse-operator \
  --namespace clickhouse \
  --version 0.25.5
```

Verify that the operator pod is running and the CRDs are installed:

```bash
kubectl -n clickhouse get pods -l app=clickhouse-operator
kubectl get crds | grep -E 'clickhouseinstallation|clickhousekeeper'
```

You should see:

* `clickhouseinstallations.clickhouse.altinity.com`
* `clickhouseinstallationtemplates.clickhouse.altinity.com`
* `clickhousekeeperinstallations.clickhouse-keeper.altinity.com`

Dependencies (SA + ConfigMap + Secret)

```bash
kubectl apply -f highflame-clickhouse-deps.yml
```

Keeper + ClickHouse cluster

The ClickHouse pod mounts `Secret/clickhouse-backup-config` with `optional: false`, so that Secret must already exist in the `clickhouse` namespace or the pod will not start.

```bash
kubectl apply -f highflame-clickhouse-values.yml
```

Verify the deployment:

```bash
kubectl -n clickhouse get pods
```

Add-ons (backup CronJob + setup Job)

```bash
kubectl apply -f highflame-clickhouse-addons-values.yml
```

The setup Job runs once and exits `Completed`. The CronJob fires every 12h.

#### ClickHouse GCS bucket backup management (from the backup container)

List the remote backups

```bash
# Open a shell in the clickhouse-backup sidecar of the ClickHouse pod
kubectl -n clickhouse exec -it chi-ch-cluster-0-0-0 -c clickhouse-backup -- /bin/bash

# Inside the container
/bin/clickhouse-backup -c /opt/backup-config.yml list remote
```

Restore from a remote backup (for DR), run inside the same container

```bash
/bin/clickhouse-backup -c /opt/backup-config.yml restore_remote BACKUP_NAME
```

### Highflame service deployment

Follow this documentation to deploy Highflame services to your Kubernetes cluster using Helm charts. It covers both standing up a fresh environment and upgrading an existing one.
{% content-ref url="/pages/SOHIdzebu16rS55b9pZg" %}
[Highflame Services](/deployment-guides/highflame-services.md)
{% endcontent-ref %}