Katonic Companion
Node pool requirements
The GKE cluster can be configured as either a single-node cluster or a multi-node cluster, depending on the user's needs:
A. Single-Node GKE Cluster:
The GKE cluster must have the following node pools, which produce worker nodes with the specifications and distinct node labels listed below.
SR NO. | POOL | MIN-MAX | INSTANCE | LABELS | TAINTS |
---|---|---|---|---|---|
1 | Compute | 1-10 | c2-standard-8 | katonic.ai/node-pool=compute | |
2 | Vectordb | 1-4 | c2-standard-4 | katonic.ai/node-pool=vectordb | katonic.ai/node-pool=vectordb:NoSchedule |
B. Multi-Node GKE Cluster:
The GKE cluster must have at least three node pools with the following specifications and distinct node labels:
SR NO. | POOL | MIN-MAX | VM | LABELS | TAINTS |
---|---|---|---|---|---|
1 | Platform | 1-4 (With HA) 2-4 (Without HA) | c2-standard-4 | katonic.ai/node-pool=platform | katonic.ai/node-pool=platform:NoSchedule |
2 | Compute | 1-10 | c2-standard-8 | katonic.ai/node-pool=compute | |
3 | Deployment | 1-10 | c2-standard-8 | katonic.ai/node-pool=deployment | katonic.ai/node-pool=deployment:NoSchedule |
4 | Vectordb | 1-4 | c2-standard-4 | katonic.ai/node-pool=vectordb | katonic.ai/node-pool=vectordb:NoSchedule |
5 | GPU (Optional) | 0-5 | Required VM type | katonic.ai/node-pool=gpu-{GPU-type} | nvidia.com/gpu=present:NoSchedule |
Note: Example GPU types are v100, A30, and A100.
Note: When backup_enabled = True, then compute_nodes.min_count should be set to 2.
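If you provision node pools yourself (for example, before installing on an existing GKE cluster), the specifications above map onto flags of `gcloud container node-pools create`. The sketch below only prints the command for review instead of running it. The function name `print_pool_cmd`, the cluster name `my-cluster`, and the region are placeholders for illustration, not values produced by the Katonic installer.

```shell
#!/bin/sh
# Print (not run) a gcloud command that creates one labelled, autoscaled node pool.
# Usage: print_pool_cmd POOL MACHINE_TYPE MIN MAX TAINTED(yes|no)
# CLUSTER and REGION are placeholders; override them via the environment.
print_pool_cmd() {
  pool=$1; machine=$2; min=$3; max=$4; tainted=$5
  label="katonic.ai/node-pool=$pool"
  cmd="gcloud container node-pools create $pool"
  cmd="$cmd --cluster=${CLUSTER:-my-cluster} --region=${REGION:-us-east1}"
  cmd="$cmd --machine-type=$machine --disk-size=128"
  cmd="$cmd --enable-autoscaling --min-nodes=$min --max-nodes=$max"
  cmd="$cmd --node-labels=$label"
  # Tainted pools repel pods that do not tolerate the pool's taint.
  if [ "$tainted" = "yes" ]; then
    cmd="$cmd --node-taints=$label:NoSchedule"
  fi
  printf '%s\n' "$cmd"
}

print_pool_cmd compute  c2-standard-8 1 10 no
print_pool_cmd vectordb c2-standard-4 1 4  yes
```

In a regional cluster, `--min-nodes` and `--max-nodes` apply per zone, so check the resulting totals against the MIN-MAX column before running the printed command.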
GCP Platform-Node Specifications
Platform nodes in GCP cloud deployments must fulfil the following hardware specification requirements according to the deployment type:
Component | Specification |
---|---|
Node count | min 2 |
Instance type | c2-standard-4 |
vCPUs | 4 |
Memory | 16 GB |
Boot disk size | 128 GB |
GCP Compute-Node Specifications
Compute nodes in GCP cloud installations of the Katonic platform must use one of the supported instance types listed below. Choose the type that best fits your requirements. GKE (Google Kubernetes Engine) is also supported for application nodes, using the same instance types. For specification details for each type, refer to the GCP documentation.
Note: Supported compute node configurations
- c2-standard-8
- c2-standard-16
- c2-standard-32
- Boot Disk: Min 128GB
GCP Deployment-Node Specifications
Deployment nodes in GCP cloud installations of the Katonic platform must use one of the supported instance types listed below. Choose the type that best fits your requirements. GKE (Google Kubernetes Engine) is also supported for application nodes, using the same instance types. The Katonic platform requires at least 1 deployment node for the community version. For specification details for each type, refer to the GCP documentation.
Note: Supported deployment node configurations
- c2-standard-8
- c2-standard-16
- c2-standard-32
- Boot Disk: Min 128GB
GCP Vectordb-Node Specifications
Vectordb nodes in GCP cloud deployments must fulfil the following hardware specification requirements:
COMPONENT | SPECIFICATION |
---|---|
Instance type | c2-standard-4 |
GCP GPU-Node Specifications
Currently, the GPU node pool is supported by Katonic-installer version 5.0.
Choose the instance type that best fits your requirements. Google Kubernetes Engine (GKE) is also supported for application nodes, using the instance types provided by Google Cloud. For specification details for each type, refer to the GCP documentation.
Note: Supported GPU node configurations
- Boot disk size = Min 512GB
- Label = katonic.ai/node-pool=gpu-{gpu-type}
- Taints = nvidia.com/gpu=present:NoSchedule
Note: Example GPU types are v100, A30, and A100.
Additional node pools with distinct katonic.ai/node-pool labels can be added to make other instance types available for Katonic executions.
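To make the `gpu-{gpu-type}` placeholder concrete, the following sketch expands it into the label, taint, and disk-size flags listed above, and shows one way to confirm which pool each node belongs to. The helper names `gpu_pool_flags` and `show_pools` are hypothetical; the GPU type is passed through unchanged because the source does not say whether it must be lowercased.

```shell
#!/bin/sh
# Expand the gpu-{gpu-type} placeholder into the label and taint flags for a GPU pool.
# The GPU type is passed through unchanged (e.g. v100, A30, A100).
gpu_pool_flags() {
  gpu_type=$1
  printf '%s\n' "--node-labels=katonic.ai/node-pool=gpu-$gpu_type --node-taints=nvidia.com/gpu=present:NoSchedule --disk-size=512"
}

# List every node with its katonic.ai/node-pool label as an extra column.
# Requires kubectl access to the cluster, so it is defined here but not called.
show_pools() {
  kubectl get nodes -L katonic.ai/node-pool
}

gpu_pool_flags v100
```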
Katonic Platform Installation
General completion time: 45 minutes
Installation process
The Katonic platform runs on Kubernetes. To simplify the deployment and configuration of Katonic services, Katonic provides an installation automation tool called the Katonic-installer, which deploys Katonic into your compatible cluster. The Katonic-installer is an Ansible role delivered in a Docker container and can be run locally.
Prerequisites
To install and configure Katonic in your GCP account, you must have:
quay.io credentials from Katonic.
A GCP project with enough quota to create:
- At least 2 c2-standard-4 machines for platform nodes and at least 1 c2-standard type machine for compute nodes
A Linux (Ubuntu/Debian) machine, set up with the following steps:
a. The machine needs 4 GB RAM, 2 vCPUs, and a 50 GB boot disk.
b. While creating the VM, select the service account (Katonic) in the Identity and API access section. Skip to step c if you already have a machine with the given specifications.
Note: After the platform is deployed successfully, the VM can be deleted.
c. Switch to the root user inside the machine.
d. The gcloud CLI must be installed and logged in to your GCP project and service account using the gcloud init command.
Commands for installing gcloud CLI:
apt-get install snapd -y
snap install google-cloud-cli --classic
Commands to log in using gcloud CLI:
gcloud init
gcloud auth application-default login
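Before continuing, it can help to confirm that the installer machine meets the prerequisites above. The following is a minimal preflight sketch, not part of the Katonic-installer; the function names are hypothetical, and the memory threshold is set slightly below 4 GB on the assumption that a "4 GB" VM reports a little less usable memory. It does not check the 50 GB boot disk.

```shell
#!/bin/sh
# Preflight sketch for the installer VM: report missing tools and undersized hardware.
# Thresholds follow the prerequisites above: 2 vCPUs and 4 GB RAM.

# Succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds if the first integer is at least the second.
at_least() {
  [ "$1" -ge "$2" ]
}

preflight() {
  status=0
  for tool in docker gcloud; do
    have_cmd "$tool" || { echo "missing: $tool"; status=1; }
  done
  cpus=$(nproc)
  at_least "$cpus" 2 || { echo "need 2 vCPUs, found $cpus"; status=1; }
  # MemTotal is reported in kB; allow 3.5 GB to account for reserved memory.
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  at_least "$mem_kb" 3500000 || { echo "need 4 GB RAM, found ${mem_kb} kB"; status=1; }
  return "$status"
}

# Run on the installer VM:
#   preflight && echo "preflight passed"
```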
To install the Katonic Platform Companion version, follow the steps below:
1. Log in to Quay with the credentials described in the requirements section above.
docker login quay.io
2. Retrieve the Katonic installer image from Quay.
docker pull quay.io/katonic/katonic-installer:v5.0.9
3. Create a directory.
mkdir katonic
cd katonic
4. Adding PEM Encoded Public Key Certificate and Private Key to Directory
Put the PEM encoded public key certificate (having extension .crt) for your domain and private key associated with the given certificate (having extension .key) inside the current directory (katonic).
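A mismatched certificate and key is easier to catch before installation than after. One way to check, assuming `openssl` is installed and the key is not passphrase-protected, is to compare the public key embedded in the certificate with the one derived from the private key. The function name and file names below are placeholders.

```shell
#!/bin/sh
# Succeeds when the certificate and private key share the same public key.
# Usage: cert_matches_key CERT.crt KEY.key
cert_matches_key() {
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
  key_pub=$(openssl pkey -in "$2" -pubout) || return 1
  [ "$cert_pub" = "$key_pub" ]
}

# Example, run inside the katonic directory (file names are placeholders):
#   cert_matches_key example.com.crt example.com.key && echo "certificate and key match"
```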
5. The Katonic Installer can deploy the Katonic Platform Companion version in two ways:
- Creating GKE and deploying the Katonic Platform Companion version.
- Install Katonic Platform Companion version on existing GKE.
1. Creating GKE and deploying the Katonic Platform Companion version
A. Single-Node GKE Cluster
Initialize the installer application to generate a template configuration file named katonic.yml.
docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init gcp katonic_companion single_node deploy_kubernetes public
Edit the configuration file with all necessary details about the target cluster, storage systems, and hosting domain. Refer to the following configuration reference:
PARAMETER | DESCRIPTION | VALUE |
---|---|---|
katonic_platform_version | It has the value by default regarding the Katonic Platform Version. | katonic_companion |
deploy_on | Cluster to be deployed on | GCP |
create_k8s_cluster | Must be set to True | True |
single_node_cluster | Set "True" if opting for a single-node cluster | True or False |
gke_k8s_version | GKE version | eg. 1.29 (versions 1.27 and above are supported) |
cluster_name | Name of the cluster to be created | eg. katonic-companion-platform-v5-0 |
gcp_region | GCP region name | eg. us-east1 |
gcp_project_id | Set your GCP project ID | eg. ardent-timm-1000678 |
service_account_id | Set created service account email ID | eg. katonic-main@ardent-timm-1000678.iam.gserviceaccount.com |
zone_1 | First GCP zone within the region | eg. us-east1-b |
zone_2 | Second GCP zone within the region | eg. us-east1-c |
compute_nodes.instance_type | Compute node VM size | eg. c2-standard-8 |
compute_nodes.min_count | Minimum number of compute nodes should not be less than 1 | eg. 1 |
compute_nodes.max_count | Maximum number of compute nodes; should be greater than the compute nodes min count | eg. 3 |
compute_nodes.os_disk_size | Compute Nodes OS Disk Size | eg. 128 GB |
vectordb_nodes.instance_type | Vectordb Node VM size | eg. c2-standard-4 |
vectordb_nodes.min_count | Minimum number of Vectordb nodes should be 1 | eg. 1 |
vectordb_nodes.max_count | Maximum number of Vectordb nodes; should be greater than the Vectordb nodes min count | eg. 4 |
vectordb_nodes.os_disk_size | Vectordb Nodes OS Disk Size | eg. 128 GB |
gpu_enabled | add GPU nodepool | True or False |
gpu_nodes.instance_type | GPU node VM size | eg n1-standard-1 |
gpu_nodes.gpu_machine_type | Type of machine you need | eg nvidia-tesla-p4 |
gpu_nodes.gpu_type | Enter the GPU type available on the machine | eg. v100,k80 |
gpu_nodes.gpu_count | GPU count | eg. 2 |
gpu_nodes.min_count | Minimum number of GPU nodes | eg. 1 |
gpu_nodes.max_count | Maximum number of GPU nodes | eg. 2 |
gpu_nodes.os_disk_size | Enter GPU nodes OS disk size | eg 512 GB |
gpu_nodes.gpu_vRAM | Enter GPU node RAM size | |
gpu_nodes.gpus_per_node | Enter the number of GPUs per node | |
enable_gpu_workspace | Set it true if you want to use GPU Workspace | True or False |
shared_storage_create | Set it True if you want to have shared storage | True or False |
genai_nfs_size | nfs storage size for genai | 100Gi |
workspace_timeout_interval | Set the Timeout Interval in Hours | eg. "12" |
backup_enabled | Enable backup | True or False |
backup_schedule | Backup schedule | 0 0 1 * * |
backup_expiration | Backup expiration | 2160h0m0s |
use_custom_domain | Set this to True if you want to host the Katonic platform on your custom domain. Skip if use_katonic_domain: True | True or False |
custom_domain_name | Expected a valid domain | eg. katonic.tesla.com |
use_katonic_domain | Set this to True if you want to host the Katonic platform on the Katonic Deploy Platform domain. Skip if use_custom_domain: True | True or False |
katonic_domain_prefix | One word expected, with no special characters and all lowercase letters | eg. tesla |
enable_pre_checks | Set this to True if you want to perform the Pre-checks | True / False |
AD_Group_Management | Set "True" to enable sign-in using Azure AD | False |
AD_CLIENT_ID | Client ID of App registered for SSO in client's Azure or Identity Provider | |
AD_CLIENT_SECRET | Client Secret of App registered for SSO in client's Azure or any other Identity Provider | |
AD_AUTH_URL | Authorization URL endpoint of app registered for SSO. | |
AD_TOKEN_URL | Token URL endpoint of app registered for SSO. | |
quay_username | Username for quay registry | |
quay_password | Password for quay registry | |
adminUsername | Email for admin user | eg. john@katonic.ai |
adminPassword | Password for admin user | At least 1 special character, at least 1 uppercase letter, at least 1 lowercase letter, minimum 8 characters |
adminFirstName | Admin first name | eg. john |
adminLastName | Admin last name | eg. musk |
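Two of the values in the table above have format rules that are easy to get wrong: katonic_domain_prefix and adminPassword. The sketch below checks both before you run the installer. It is an illustration only; the function names are hypothetical, and it assumes that "no special characters and all lowercase letters" also excludes digits and that "special character" means any non-alphanumeric character.

```shell
#!/bin/sh
# Pre-validate two katonic.yml values against the rules in the table above.
# Use the C locale so that a-z and A-Z mean exactly the ASCII letters.
LC_ALL=C
export LC_ALL

# katonic_domain_prefix: one word, lowercase letters only (assumes no digits).
valid_prefix() {
  case "$1" in
    ''|*[!a-z]*) return 1 ;;
    *) return 0 ;;
  esac
}

# adminPassword: at least 8 characters with an uppercase letter, a lowercase
# letter, and a special (non-alphanumeric) character.
valid_password() {
  [ "${#1}" -ge 8 ] || return 1
  case "$1" in *[A-Z]*) ;; *) return 1 ;; esac
  case "$1" in *[a-z]*) ;; *) return 1 ;; esac
  case "$1" in *[!A-Za-z0-9]*) ;; *) return 1 ;; esac
  return 0
}

valid_prefix tesla && echo "prefix ok"
valid_password 'Str0ng#Pass' && echo "password ok"
```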
B. Multi-Node GKE Cluster
Initialize the installer application to generate a template configuration file named katonic.yml.
docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init gcp katonic_companion multi_node deploy_kubernetes public
Edit the configuration file with all necessary details about the target cluster, storage systems, and hosting domain. Refer to the following configuration reference:
PARAMETER | DESCRIPTION | VALUE |
---|---|---|
katonic_platform_version | It has the value by default regarding the Katonic Platform Version. | katonic_companion |
deploy_on | Cluster to be deployed on | GCP |
create_k8s_cluster | Must be set to True | True |
gke_k8s_version | GKE version | eg. 1.29 (versions 1.27 and above are supported) |
cluster_name | Name of the cluster to be created | eg. katonic-companion-platform-v5-0 |
gcp_region | GCP region name | eg. us-east1 |
gcp_project_id | Set your GCP project ID | eg. ardent-timm-1000678 |
service_account_id | Set created service account email ID | eg. katonic-main@ardent-timm-1000678.iam.gserviceaccount.com |
zone_1 | First GCP zone within the region | eg. us-east1-b |
zone_2 | Second GCP zone within the region | eg. us-east1-c |
platform_nodes.instance_type | Platform node VM size | eg. c2-standard-4 |
platform_nodes.min_count | Minimum number of platform nodes should be 2 | eg. 2 |
platform_nodes.max_count | Maximum number of platform nodes; should be greater than the platform nodes min count | eg. 3 |
platform_nodes.os_disk_size | Platform Nodes OS Disk Size | eg. 128 GB |
compute_nodes.instance_type | Compute node VM size | eg. c2-standard-8 |
compute_nodes.min_count | Minimum number of compute nodes should not be less than 1 | eg. 1 |
compute_nodes.max_count | Maximum number of compute nodes; should be greater than the compute nodes min count | eg. 3 |
compute_nodes.os_disk_size | Compute Nodes OS Disk Size | eg. 128 GB |
vectordb_nodes.instance_type | Vectordb Node VM size | eg. c2-standard-4 |
vectordb_nodes.min_count | Minimum number of Vectordb nodes should be 1 | eg. 1 |
vectordb_nodes.max_count | Maximum number of Vectordb nodes; should be greater than the Vectordb nodes min count | eg. 4 |
vectordb_nodes.os_disk_size | Vectordb Nodes OS Disk Size | eg. 128 GB |
deployment_nodes.instance_type | Deployment Node VM size | eg. c2-standard-8 |
deployment_nodes.min_count | Minimum number of Deployment nodes should be 1 | eg. 1 |
deployment_nodes.max_count | Maximum number of Deployment nodes; should be greater than the Deployment nodes min count | eg. 4 |
deployment_nodes.os_disk_size | Deployment Nodes OS Disk Size | eg. 128 GB |
gpu_enabled | add GPU nodepool | True or False |
gpu_nodes.instance_type | GPU node VM size | eg n1-standard-1 |
gpu_nodes.gpu_machine_type | Type of machine you need | eg nvidia-tesla-p4 |
gpu_nodes.gpu_type | Enter the GPU type available on the machine | eg. v100,k80 |
gpu_nodes.gpu_count | GPU count | eg. 2 |
gpu_nodes.min_count | Minimum number of GPU nodes | eg. 1 |
gpu_nodes.max_count | Maximum number of GPU nodes | eg. 2 |
gpu_nodes.os_disk_size | Enter GPU nodes OS disk size | eg 512 GB |
gpu_nodes.gpu_vRAM | Enter GPU node RAM size | |
gpu_nodes.gpus_per_node | Enter the number of GPUs per node | |
enable_gpu_workspace | Set it true if you want to use GPU Workspace | True or False |
shared_storage_create | Set it True if you want to have shared storage | True or False |
genai_nfs_size | nfs storage size for genai | 100Gi |
workspace_timeout_interval | Set the Timeout Interval in Hours | eg. "12" |
backup_enabled | Enable backup | True or False |
backup_schedule | Backup schedule | 0 0 1 * * |
backup_expiration | Backup expiration | 2160h0m0s |
use_custom_domain | Set this to True if you want to host the Katonic platform on your custom domain. Skip if use_katonic_domain: True | True or False |
custom_domain_name | Expected a valid domain | eg. katonic.tesla.com |
use_katonic_domain | Set this to True if you want to host the Katonic platform on the Katonic Deploy Platform domain. Skip if use_custom_domain: True | True or False |
katonic_domain_prefix | One word expected, with no special characters and all lowercase letters | eg. tesla |
enable_pre_checks | Set this to True if you want to perform the Pre-checks | True / False |
AD_Group_Management | Set "True" to enable sign-in using Azure AD | False |
AD_CLIENT_ID | Client ID of App registered for SSO in client's Azure or Identity Provider | |
AD_CLIENT_SECRET | Client Secret of App registered for SSO in client's Azure or any other Identity Provider | |
AD_AUTH_URL | Authorization URL endpoint of app registered for SSO. | |
AD_TOKEN_URL | Token URL endpoint of app registered for SSO. | |
quay_username | Username for quay registry | |
quay_password | Password for quay registry | |
adminUsername | Email for admin user | eg. john@katonic.ai |
adminPassword | Password for admin user | At least 1 special character, at least 1 uppercase letter, at least 1 lowercase letter, minimum 8 characters |
adminFirstName | Admin first name | eg. john |
adminLastName | Admin last name | eg. musk |
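The count rules in the table above (a required minimum per pool, and a maximum greater than the minimum) can be checked before running the installer. The following is a small sketch under those rules, not something the installer provides; `check_counts` is a hypothetical helper, and the values passed to it are the example values from the table.

```shell
#!/bin/sh
# Sanity-check node pool counts before running the installer.
# Usage: check_counts NAME MIN MAX REQUIRED_MIN
check_counts() {
  name=$1; min=$2; max=$3; required=$4
  if [ "$min" -lt "$required" ]; then
    echo "$name: min_count $min is below the required $required"
    return 1
  fi
  if [ "$max" -le "$min" ]; then
    echo "$name: max_count $max must be greater than min_count $min"
    return 1
  fi
  echo "$name: ok"
}

# Values mirror the examples in the table above.
backup_enabled=False
compute_required=1
# Per the node pool notes, enabling backups requires at least 2 compute nodes.
if [ "$backup_enabled" = "True" ]; then
  compute_required=2
fi

check_counts platform_nodes 2 3 2
check_counts compute_nodes 1 3 "$compute_required"
check_counts vectordb_nodes 1 4 1
check_counts deployment_nodes 1 4 1
```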
Installing the Katonic Platform Companion version
After configuring the katonic.yml file, run the following command to install the Katonic Platform Companion version:
docker run -it --rm --name install-katonic -v /root/.config:/root/.config -v $(pwd):/inventory quay.io/katonic/katonic-installer:v5.0.9
2. Deploying Katonic Platform Companion version on existing GKE
The steps are similar to Installing the Katonic Platform with GCP Google Kubernetes Engine. Edit the configuration file with all the details about the target cluster, storage systems, and hosting domain. The configuration reference below lists the only parameters required when installing the Katonic Platform Companion version on an existing GKE cluster.
Prerequisites
You will need to create a storage class named kfs. Please refer to the main documentation, GCP - Dynamic Block Storage, for instructions on how to create the storage class.
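The referenced documentation is the authority on how kfs should be configured. Purely as an illustration of what a StorageClass named kfs looks like, the sketch below prints a manifest for review. The provisioner (the GKE Persistent Disk CSI driver), the pd-balanced disk type, and the binding and expansion settings are assumptions made for this example, not values confirmed by Katonic.

```shell
#!/bin/sh
# Print a StorageClass manifest named kfs.
# Assumptions (not from the Katonic docs): the GKE Persistent Disk CSI driver
# as provisioner and pd-balanced as the disk type.
print_kfs_storage_class() {
  cat <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kfs
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF
}

# Review the output first, then apply it with cluster access:
#   print_kfs_storage_class | kubectl apply -f -
print_kfs_storage_class
```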
A. Single-Node GKE Cluster
Initialize the installer application to generate a template configuration file named katonic.yml.
docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init gcp katonic_companion single_node kubernetes_already_exists public
B. Multi-Node GKE Cluster
Initialize the installer application to generate a template configuration file named katonic.yml.
docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init gcp katonic_companion multi_node kubernetes_already_exists public
For both single node and multi-node clusters, the configuration template includes the following parameters:
PARAMETER | DESCRIPTION | VALUE |
---|---|---|
katonic_platform_version | It has the value by default regarding the Katonic Platform Version. | katonic_companion |
deploy_on | Cluster to be deployed on | GCP |
single_node_cluster | Set "True" if opting for a single-node cluster | True or False |
cluster_name | Name of the existing cluster to deploy on | eg. katonic-companion-platform-v5-0 |
gcp_region | GCP region name | eg. us-east1 |
gcp_project_id | Set your GCP project ID | eg. ardent-timm-1000678 |
genai_nfs_size | nfs storage size for genai | 100Gi |
workspace_timeout_interval | Set the Timeout Interval in Hours | eg. "12" |
use_custom_domain | Set this to True if you want to host katonic platform on your custom domain. Skip if use_katonic_domain: True | True or False |
custom_domain_name | Expected a valid domain. | eg. katonic.tesla.com |
use_katonic_domain | Set this to True if you want to host katonic platform on Katonic Deploy Platform domain. Skip if use_custom_domain: True | True or False |
katonic_domain_prefix | One word expected, with no special characters and all lowercase letters | eg. tesla |
enable_pre_checks | Set this to True if you want to perform the Pre-checks | True / False |
AD_Group_Management | Set "True" to enable sign-in using Azure AD | False |
AD_CLIENT_ID | Client ID of App registered for SSO in client's Azure or Identity Provider | |
AD_CLIENT_SECRET | Client Secret of App registered for SSO in client's Azure or any other Identity Provider | |
AD_AUTH_URL | Authorization URL endpoint of app registered for SSO. | |
AD_TOKEN_URL | Token URL endpoint of app registered for SSO. | |
quay_username | Username for quay registry | |
quay_password | Password for quay registry | |
adminUsername | Email for admin user | eg. john@katonic.ai |
adminPassword | Password for admin user | At least 1 special character, at least 1 uppercase letter, at least 1 lowercase letter, minimum 8 characters |
adminFirstName | Admin first name | eg. john |
adminLastName | Admin last name | eg. musk |
Note: In the katonic.yml template, the single_node_cluster parameter will be set to True for single-node clusters and will be omitted for multi-node clusters.
Installing Katonic Platform Companion version
After configuring the katonic.yml file, run the following command to install the Katonic Platform Companion version:
docker run -it --rm --name install-katonic -v /root/.config:/root/.config -v $(pwd):/inventory quay.io/katonic/katonic-installer:v5.0.9
Installation Verification
The installation process can take up to one hour to complete. The installer outputs verbose logs, including the commands for obtaining kubectl access to the deployed cluster, and surfaces any errors it encounters. After installation, use the following command to check whether all applications are running.
kubectl get pods --all-namespaces
This will show the status of all pods being created by the installation process. If you see any pods enter a crash loop or hang in a non-ready state, you can get logs from that pod by running:
kubectl logs $POD_NAME --namespace $NAMESPACE_NAME
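On a cluster with many pods, scanning the full listing by eye is error-prone. The following sketch filters the listing down to pods that are neither completed nor fully ready. It assumes the default column layout of `kubectl get pods --all-namespaces --no-headers` (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); `unhealthy_pods` is a hypothetical helper name.

```shell
#!/bin/sh
# Filter `kubectl get pods --all-namespaces --no-headers` output down to pods
# that are not healthy. Columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk '
    $4 == "Completed" { next }                 # finished jobs are fine
    {
      split($3, ready, "/")                    # READY looks like "1/2"
      if ($4 != "Running" || ready[1] != ready[2])
        print $1 "/" $2 " " $4 " " $3
    }'
}

# On a machine with cluster access:
#   kubectl get pods --all-namespaces --no-headers | unhealthy_pods
```

Each output line has the form `namespace/pod STATUS READY`, which gives you the values to substitute for $POD_NAME and $NAMESPACE_NAME in the logs command above.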
If the installation completes successfully, you should see a message that says:
TASK [platform-deployment : Credentials to access Katonic Companion Platform] *******************************
ok: [localhost] => {
"msg": [
"Platform Domain: $domain_name",
"Username: $adminUsername",
"Password: $adminPassword"
]
}
However, the application will only be accessible via HTTPS at that FQDN if you have configured DNS for the name to point to an ingress load balancer with the appropriate SSL certificate that forwards traffic to your platform nodes.
Test and troubleshoot
To verify the successful installation of Katonic, perform the following tests:
If you encounter a 500 or 502 error, connect to your cluster and run the following command:
kubectl rollout restart deploy nodelog-deploy -n application
Log in to the Katonic application and ensure that all the navigation panel options are operational. If this test fails, please verify that Keycloak was set up properly.
Create a new project and launch a Jupyter/JupyterLab workspace. If this test fails, please check that the default environment images have been loaded in the cluster.
Publish an app with Flask or Shiny. If this test fails, please verify that the environment images have Flask and Shiny installed.
Deleting the Katonic platform from GCP
When you start the installation, a platform deletion script is placed in your current directory. To delete the platform, run the script:
./gcp-cluster-delete.sh