Version: 4.5

Katonic MLOps Platform on GCP

This guide describes how to install, operate, administer, and configure the Katonic Platform in your own GCP Kubernetes cluster. This content applies to Katonic users with self-installation licenses.

Hardware Configurations

This configuration is designed to offer high availability (HA) or performance testing. It is designed to achieve superior performance that enables real-time execution of analytics, machine learning (ML), and artificial intelligence (AI) applications in a production pipeline.

Katonic on GKE

Katonic can run on a Kubernetes cluster provided by Google Kubernetes Engine (GKE) on GCP. When running on GKE, the Katonic architecture uses GCP resources to fulfil the Katonic MLOps platform requirements as follows:

[Figure: Architecture 1]

  • Kubernetes control moves to the GKE control plane with managed Kubernetes masters.

  • A GCS bucket is used to store full platform backups.

  • The pd.csi.storage.gke.io provisioner is used to create persistent volumes for Katonic executions.

  • Katonic cannot be installed on GKE Autopilot.

  • Using GKE node pools, the Katonic platform separates the compute and platform workloads onto different sets of machines.

Your annual Katonic license fee does not include any charges incurred from using GCP services. You can find detailed pricing information for GCP services in the Google Cloud Pricing Calculator.

Setting up a GKE cluster for Katonic Platform

This section describes how to configure a GCP GKE cluster for use with Katonic. When configuring a GKE cluster for Katonic, you must be familiar with the following GCP services:

  • Google Kubernetes Engine (GKE)

  • Identity and Access Management (IAM)

  • Virtual Private Cloud (VPC) Networking

  • Disks

  • GCP Filestore

  • Google Cloud Storage (GCS)

Additionally, a basic understanding of Kubernetes concepts like node pools, network CNI, storage classes, autoscaling, and Docker will be useful when deploying the cluster.

Service Account and Permissions

  • When creating a GKE cluster, make sure that the default service account is present and has the correct role assigned.

  • If this service account is missing or misconfigured, cluster creation and operation can fail.

  • This service account is automatically created and managed by Google Cloud when you enable the Kubernetes Engine API.

  • The service account has the following format: serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com

  • This service account should have the role: Kubernetes Engine Service Agent

  • During the GKE cluster installation process, create a dedicated service account for the node pools.

  • Attach the "Kubernetes Engine Node Service Agent" role to the service account. This role provides the minimal set of permissions required by a GKE node to support standard capabilities, including logging and monitoring export, as well as image pulls.

  • Note down the identifier (ID) of the created service account.

  • Add the service account's ID to the configuration file katonic.yml as part of the setup process.

IAM Permissions for User

To complete the installation, the IAM user must have the following GCP permissions. These permissions include both GCP Managed Roles and Custom Managed Roles that need to be created and attached to the IAM user.

GCP Managed Roles:

  • Compute Instance Admin (v1)
  • Editor
  • Kubernetes Engine Admin
  • Project IAM Admin
  • Storage Object Admin
  • Role Administrator

Service quotas

GCP maintains default service quotas for each of the services listed previously. You can check the default service quotas and manage your quotas on the GCP Quotas page.

Create Google Kubernetes Engine (GKE)

By default, the Katonic installer creates the GKE cluster. If you create the GKE cluster yourself, first create a new, separate VPC with one subnet and two zones, and then create the GKE cluster in that VPC.

Dynamic block storage

The GKE cluster must have a volume-backed storage class that Katonic uses to provision ephemeral volumes for user executions. The Katonic installer creates this storage class by default. If you create the cluster yourself, you need to create the kfs storage class. The following YAML is an example storage class specification:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    components.gke.io/layer: addon
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    k8s-app: gcp-compute-persistent-disk-csi-driver
  name: kfs
parameters:
  # Balanced persistent disk type
  type: pd-balanced
provisioner: pd.csi.storage.gke.io
# Delete the underlying disk when the claim is deleted
reclaimPolicy: Delete
# Delay provisioning until a pod that uses the claim is scheduled
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
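
To check that the kfs storage class provisions volumes, a claim along the following lines can be applied. This is a minimal sketch, not part of the Katonic installer: the claim name kfs-test-claim and the 10Gi size are hypothetical values chosen for illustration.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Hypothetical name, used only for this illustration
  name: kfs-test-claim
spec:
  accessModes:
    # A persistent disk is mounted read-write by a single node
    - ReadWriteOnce
  # Must match the name of the storage class defined above
  storageClassName: kfs
  resources:
    requests:
      # Illustrative size; adjust to your workload
      storage: 10Gi
```

Because the storage class uses volumeBindingMode: WaitForFirstConsumer, the claim stays in the Pending state until a pod that references it is scheduled. A pending claim with no consuming pod does not by itself indicate a problem.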

Dynamic shared storage

The GCP Filestore service must be provisioned, and an access point must be configured to allow access from the GKE cluster. The Katonic installer has an optional parameter, shared_storage.create, that creates a GCP Filestore-based storage class. If you create the cluster yourself, you can create the dynamic shared storage class using the following YAML:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    components.gke.io/component-name: filestorecsi
    components.gke.io/component-version: 0.4.30
    components.gke.io/layer: addon
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    k8s-app: gcp-filestore-csi-driver
  name: kfs-shared
parameters:
  # Filestore service tier
  tier: premium
provisioner: filestore.csi.storage.gke.io
# Delete the Filestore instance when the claim is deleted
reclaimPolicy: Delete
# Provision the volume as soon as the claim is created
volumeBindingMode: Immediate
allowVolumeExpansion: true
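
A shared volume from the kfs-shared storage class could be requested with a claim like the one below. This is a sketch under assumptions: the claim name kfs-shared-test-claim is hypothetical, and the requested size of 2560Gi (2.5 TiB) assumes that this is the minimum capacity of the premium tier, which you should confirm in the Filestore documentation before applying.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Hypothetical name, used only for this illustration
  name: kfs-shared-test-claim
spec:
  accessModes:
    # Filestore is NFS-backed, so many pods can mount it read-write
    - ReadWriteMany
  # Must match the name of the shared storage class defined above
  storageClassName: kfs-shared
  resources:
    requests:
      # Assumed minimum capacity for the premium tier; verify before use
      storage: 2560Gi
```

Because the storage class uses volumeBindingMode: Immediate, provisioning starts when the claim is created, without waiting for a pod to use it.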

GCS Bucket

The GCS bucket stores backups of the GKE cluster. The Katonic installer has an optional parameter, backup_enabled, that creates a GCS bucket and takes backups. By default, a backup is scheduled every 24 hours and expires after 30 days. You can configure these settings in the katonic.yml template file.

GCP GKE Cluster Autoscaler

The Katonic installer has an optional parameter, autoscaler.enabled, that enables the cluster autoscaler. If you create the GKE cluster yourself, enable the GKE cluster autoscaler, set the location policy to "Balanced", and set the size limit type to "Total limits".

[Figure: Architecture 2]

Note: If the cluster is deployed in 2 zones, specify the node locations.

Domain

Katonic must be configured to serve from a specific FQDN. To serve Katonic securely over HTTPS, you will also need an SSL certificate that covers the chosen name. Record the FQDN for use when installing Katonic.

Katonic offers the default option to use the .katonic.ai domain in all versions of the Katonic Platform. However, if you have your own domain, you can also utilize it across all versions provided by the Katonic Platform.

Resources Provisioned Post-Installation

When the platform is installed, it creates the following resources. Take this into account when selecting your installation configuration.

| Sr no. | Type | Amount | When | Notes |
|---|---|---|---|---|
| 1 | Load Balancer | 1 | Always | Only 1 is required. Automatically gets created by GKE when required. |
| 2 | Network interface | 1 per node | Always | |
| 3 | OS boot disk | 1 per node | Always | |
| 4 | Public IP address | 1 per node | The platform has public IP addresses. | |
| 5 | VPC | 1 | The platform is deployed to a new VPC. | |
| 6 | Filestore | 1 | For shared storage | |
| 7 | GKE Cluster | 1 | GKE is used as the application cluster | Version 1.28.3-gke.1118000 |

Kubernetes (GKE) version

Version 4.5 of the Katonic MLOps platform has been validated with Kubernetes (GKE) version 1.28 and above.

GCP GKE High Availability

When deploying a GKE (Google Kubernetes Engine) cluster on Google Cloud Platform, you have the flexibility to choose the level of high availability (HA) that suits your needs. The configuration you select will determine the number of nodes provisioned for your GKE cluster. By utilizing the min_count flag during the cluster setup, you can specify the desired number of nodes per zone.

  • High Availability (With HA) Deployment Option: When you select the High Availability (With HA) deployment option, your GKE cluster will be set up with two zones. GKE will distribute the nodes across these zones within your chosen region. This configuration ensures exceptional availability and fault tolerance for your cluster by guaranteeing that each zone contains at least one node.

    GKE will create a total of four nodes, with two nodes allocated to each of the two zones, if you set the minimum count to two.

  • Without High Availability (Without HA) Deployment Option: In the Without High Availability (Without HA) deployment option, your GKE cluster will utilize a single zone. All the nodes will be created within this zone.

    For instance, if you set the minimum count to two in a Without High Availability deployment, GKE will create a total of two nodes within the same zone.

By offering both High Availability and Without High Availability deployment options, GKE enables you to choose the level of fault tolerance and complexity that aligns with your specific requirements.

Data Visualisation

  • Katonic MLOps platform 4.5 includes Superset version 2.0.1 for data visualisation.

  • You require an additional DNS if you're installing Superset.

    Example:

Connectors

  • Katonic MLOps platform 4.5 includes Airbyte version 0.40.32 for connectors.

  • You require an additional DNS if you're installing Airbyte.

    Example:

Katonic Platform Installation

Installation of the Katonic platform has been segmented based on product. When you click the link, you will be redirected to the installation process documentation.