Katonic MLOps Platform on GCP
This guide describes how to install, operate, administer, and configure the Katonic Platform in your own GCP Kubernetes cluster. This content applies to Katonic users with self-installation licenses.
Hardware Configurations
This configuration is designed to offer high availability (HA) or performance testing. It is designed to achieve superior performance that enables real-time execution of analytics, machine learning (ML), and artificial intelligence (AI) applications in a production pipeline.
Katonic on GKE
Katonic can run on a Kubernetes cluster provided by GCP Google Kubernetes Engine. When running on GKE, the Katonic architecture uses GCP resources to fulfil the Katonic MLOps platform requirements as follows:
Kubernetes control moves to the GKE control plane with managed Kubernetes masters
GCP GCS bucket is used to store entire platform backups.
The pd.csi.storage.gke.io provisioner is used to create persistent volumes for Katonic executions
Katonic cannot be installed on GCP GKE Autopilot.
Using GKE node pools, the Katonic platform separates compute and platform workloads onto different sets of machines.
Your annual Katonic license fee does not include any charges incurred from using GCP services. You can find detailed pricing information for GCP services in the Google Cloud Pricing Calculator.
Setting up a GKE cluster for Katonic Platform
This section describes how to configure a GCP GKE cluster for use with Katonic. When configuring a GKE cluster for Katonic, you must be familiar with the following GCP services:
Google Kubernetes Engine (GKE)
Identity and Access Management (IAM)
Virtual Private Cloud (VPC) Networking
Disks
GCP Filestore
Google Cloud Storage (GCS)
Additionally, a basic understanding of Kubernetes concepts like node pools, network CNI, storage classes, autoscaling, and Docker will be useful when deploying the cluster.
Service Account and Permissions
When creating a GKE cluster, it is imperative to ensure that the default service account is present and has the correct role assigned.
The absence or misconfiguration of this service account can lead to failures in cluster creation and operation.
This service account is automatically created and managed by Google Cloud when you enable the Kubernetes Engine API.
The service account has the following format: serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com
This service account should have the role: Kubernetes Engine Service Agent
During the GKE cluster installation process, create a dedicated service account for the node pools.
Attach the "Kubernetes Engine Node Service Agent" role to the service account. This role provides the minimal set of permissions required by a GKE node to support standard capabilities, including logging and monitoring export, as well as image pulls.
Note down the identifier (ID) of the created service account.
Incorporate the service account's ID into the configuration file katonic.yml as part of the setup process.
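As a minimal sketch, the default service account member string in the format given above can be assembled from a project number. The function name and the project number `123456789012` are hypothetical placeholders for illustration and are not part of the Katonic installer.

```python
def default_service_account_member(project_number: str) -> str:
    """Build the IAM member string in the format described in this guide."""
    if not project_number.isdigit():
        raise ValueError("project_number must contain digits only")
    return f"serviceAccount:{project_number}-compute@developer.gserviceaccount.com"


# Hypothetical project number, used only for illustration.
print(default_service_account_member("123456789012"))
```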
IAM Permissions for User
In order to complete the installation, the IAM user must have the following GCP permissions. These permissions include both GCP Managed Roles and Custom Managed Roles that need to be created and attached to the IAM user.
GCP Managed Roles:
- Compute Instance Admin (v1)
- Editor
- Kubernetes Engine Admin
- Project IAM Admin
- Storage Object Admin
- Role Administrator
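Before starting the installation, it can help to compare the roles actually granted to the IAM user against the list above. The following is a rough sketch of that comparison. It assumes you have already collected the granted role display names yourself (for example, from the IAM console). The `missing_roles` helper and the sample `granted` list are hypothetical.

```python
# Role display names exactly as listed in this guide.
REQUIRED_ROLES = {
    "Compute Instance Admin (v1)",
    "Editor",
    "Kubernetes Engine Admin",
    "Project IAM Admin",
    "Storage Object Admin",
    "Role Administrator",
}


def missing_roles(granted):
    """Return the required roles that are absent from `granted`, sorted by name."""
    return sorted(REQUIRED_ROLES - set(granted))


# Hypothetical example: a user who has only two of the required roles.
granted = ["Editor", "Kubernetes Engine Admin"]
for role in missing_roles(granted):
    print(f"Missing role: {role}")
```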
Service quotas
GCP maintains default service quotas for each of the services listed previously. You can check the default service quotas and manage your quotas on the GCP Quotas page.
Create Google Kubernetes Engine (GKE)
By default, the Katonic installer creates the GKE cluster. If you create the GKE cluster yourself, first create a new, separate VPC with 1 subnet and 2 zones, and then create the GKE cluster in that VPC.
Dynamic block storage
The GKE cluster must be equipped with a volume-backed storage class that Katonic will use to provision ephemeral volumes for user executions. The Katonic installer creates this storage class by default. If you create the cluster yourself, you need to create the kfs storage class. The following YAML is an example storage class specification:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    components.gke.io/layer: addon
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    k8s-app: gcp-compute-persistent-disk-csi-driver
  name: kfs
parameters:
  type: pd-balanced
provisioner: pd.csi.storage.gke.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
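To check that the kfs storage class can provision volumes, one option is to create a small test claim against it. The sketch below builds a standard Kubernetes PersistentVolumeClaim manifest as JSON, which `kubectl apply -f` accepts alongside YAML. The claim name `kfs-test-claim` and the `10Gi` size are hypothetical values chosen for illustration.

```python
import json


def pvc_manifest(name, storage_class="kfs", size="10Gi"):
    """Return a PersistentVolumeClaim manifest that requests a volume from `storage_class`."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Persistent disks are attached to one node at a time for read-write use.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical claim name. Save the output to a file to apply it with kubectl.
print(json.dumps(pvc_manifest("kfs-test-claim"), indent=2))
```

Because the storage class uses `volumeBindingMode: WaitForFirstConsumer`, a claim like this stays in the Pending state until a pod that uses it is scheduled.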
Dynamic shared storage
The GCP Filestore service must be provisioned and an access point must be configured to allow access from the GKE cluster. The Katonic installer has an optional parameter, shared_storage.create, that creates a GCP Filestore-based storage class. If you create the cluster yourself, you can create the dynamic shared storage class using the following YAML:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    components.gke.io/component-name: filestorecsi
    components.gke.io/component-version: 0.4.30
    components.gke.io/layer: addon
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    k8s-app: gcp-filestore-csi-driver
  name: kfs-shared
parameters:
  tier: premium
provisioner: filestore.csi.storage.gke.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
GCS Bucket
A GCS bucket is used to store backups of the GKE cluster. The Katonic installer has an optional parameter, backup_enabled, that creates a GCS bucket and enables backups. By default, a backup is scheduled every 24 hours and each backup expires after 30 days. You can configure these settings in the katonic.yml template file.
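To get a feel for the default schedule, the sketch below works out when a given backup expires and roughly how many backups are held in the bucket at steady state. The constants mirror the defaults stated above. The helper names and the sample date are hypothetical, and the count is an approximation that ignores boundary effects.

```python
from datetime import datetime, timedelta, timezone

BACKUP_INTERVAL = timedelta(hours=24)  # default schedule from this guide
BACKUP_TTL = timedelta(days=30)        # default expiry from this guide


def backup_expiry(created_at):
    """Return the time at which a backup taken at `created_at` expires."""
    return created_at + BACKUP_TTL


def retained_backups(interval=BACKUP_INTERVAL, ttl=BACKUP_TTL):
    """Approximate number of unexpired backups once the schedule has run for a full TTL."""
    return ttl // interval


# Hypothetical backup timestamp, used only for illustration.
created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print("Expires on:", backup_expiry(created).date())
print("Backups retained at steady state:", retained_backups())
```

If you change the interval or expiry in katonic.yml, the same arithmetic gives an estimate of how much backup data accumulates in the bucket.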
GCP GKE Cluster Autoscaler
The Katonic installer has an optional parameter, autoscaler.enabled, that enables the cluster autoscaler. If you create the GKE cluster yourself, enable the GKE cluster autoscaler, set the location policy to "Balanced", and set the size limit type to "Total limits".
Note: If the cluster is deployed in 2 zones, specify the node locations.
Domain
Katonic must be configured to serve from a specific FQDN. To serve Katonic securely over HTTPS, you will also need an SSL certificate that covers the chosen name. Record the FQDN for use when installing Katonic.
Katonic offers the default option to use the .katonic.ai domain in all versions of the Katonic Platform. However, if you have your own domain, you can also utilize it across all versions provided by the Katonic Platform.
Resources Provisioned Post-Installation
When the platform is installed, it creates the following resources. Take this into account when selecting your installation configuration.
Sr no. | Type | Amount | When | Notes |
---|---|---|---|---|
1 | Load Balancer | 1 | Always | Only 1 is required. Automatically gets created by GKE when required. |
2 | Network interface | 1 per node | Always | |
3 | OS boot disk | 1 per node | Always | |
4 | Public IP address | 1 per node | The platform has public IP addresses. | |
5 | VPC | 1 | The platform is deployed to a new VPC. | |
6 | Filestore | 1 | For shared storage | |
7 | GKE Cluster | 1 | GKE is used as the application cluster | Version 1.29.3-gke.1118000 |
Kubernetes (GKE) version
Katonic MLOps platform version 5.0.9 has been validated with Kubernetes (GKE) version 1.29 and above.
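If you create the cluster yourself, a quick check like the following can confirm that a GKE version string meets the validated minimum. This is a minimal sketch that compares only the major and minor components. The function names are hypothetical, and the sample string is the version listed in the resources table above.

```python
import re


def gke_major_minor(version):
    """Extract (major, minor) from a GKE version string such as '1.29.3-gke.1118000'."""
    match = re.match(r"^(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_validated(version, minimum=(1, 29)):
    """Return True if `version` is at or above the validated minimum (1.29 per this guide)."""
    return gke_major_minor(version) >= minimum


print(is_validated("1.29.3-gke.1118000"))
```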
GCP GKE High Availability
When deploying a GKE (Google Kubernetes Engine) cluster on Google Cloud Platform, you have the flexibility to choose the level of high availability (HA) that suits your needs. The configuration you select will determine the number of nodes provisioned for your GKE cluster. By utilizing the min_count flag during the cluster setup, you can specify the desired number of nodes per zone.
High Availability (With HA) Deployment Option: When you select the High Availability (With HA) deployment option, your GKE cluster will be set up with two zones. GKE will distribute the nodes across these zones within your chosen region. This configuration ensures exceptional availability and fault tolerance for your cluster by guaranteeing that each zone contains at least one node.
GKE will create a total of four nodes, with two nodes allocated to each of the two zones, if you set the minimum count to two.
Without High Availability (Without HA) Deployment Option: In the Without High Availability (Without HA) deployment option, your GKE cluster will utilize a single zone. All the nodes will be created within this zone.
For instance, if you set the minimum count to two in a Without High Availability deployment, GKE will create a total of two nodes within the same zone.
By offering both High Availability and Without High Availability deployment options, GKE enables you to choose the level of fault tolerance and complexity that aligns with your specific requirements.
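The node counts in the two examples above follow from multiplying the per-zone minimum by the number of zones. A minimal sketch of that arithmetic follows, assuming two zones with HA and one zone without, as described in this section. The function name is hypothetical.

```python
def total_nodes(min_count, high_availability):
    """Return the number of nodes created for a given per-zone minimum count."""
    zones = 2 if high_availability else 1  # With HA uses two zones, Without HA uses one
    return min_count * zones


print("With HA, min_count=2:", total_nodes(2, high_availability=True))
print("Without HA, min_count=2:", total_nodes(2, high_availability=False))
```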
Data Visualisation
Katonic MLOps platform 5.0.9 includes Superset version 2.0.1 for data visualisation.
You require an additional DNS record if you're installing Superset.
Example:
- If your domain name to access platform is katonic.tesla.com.
- Then, the domain for data visualisation would look like dash-katonic.tesla.com.
Connectors
Katonic MLOps platform 5.0.9 includes Airbyte version 0.40.32 for connectors.
You require an additional DNS record if you're installing Airbyte.
Example:
- If your domain name to access platform is katonic.tesla.com.
- Then, the domain for connectors would look like connectors-katonic.tesla.com.
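The examples for data visualisation and connectors both prepend a prefix to the platform domain. The sketch below derives both names from a platform FQDN, assuming the `dash-` and `connectors-` prefixes shown in those examples apply generally. The function name is hypothetical.

```python
def addon_domains(platform_fqdn):
    """Derive the add-on domain names from the platform FQDN by prefixing it."""
    return {
        "data_visualisation": f"dash-{platform_fqdn}",
        "connectors": f"connectors-{platform_fqdn}",
    }


for component, domain in addon_domains("katonic.tesla.com").items():
    print(f"{component}: {domain}")
```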
Katonic Platform Installation
Installation of the Katonic platform has been segmented based on product. When you click the link, you will be redirected to the installation process documentation.