Katonic Ace

Node pool requirements

The GKE cluster can be configured as either a single-node cluster or a multi-node cluster, depending on the user's needs:

A. Single-Node GKE Cluster:

The GKE cluster must have one node pool that produces worker nodes with the following specifications and distinct node labels.

| SR NO. | POOL | MIN-MAX | INSTANCE | LABELS | TAINTS |
| --- | --- | --- | --- | --- | --- |
| 1 | Compute | 1-10 | c2-standard-8 | katonic.ai/node-pool=compute | |

B. Multi-Node GKE Cluster:

The GKE cluster must have at least four node pools with the following specifications and distinct node labels:

| SR NO. | POOL | MIN-MAX | VM | LABELS | TAINTS |
| --- | --- | --- | --- | --- | --- |
| 1 | Platform | 1-4 (With HA), 2-4 (Without HA) | c2-standard-4 | katonic.ai/node-pool=platform | katonic.ai/node-pool=platform:NoSchedule |
| 2 | Compute | 1-10 | c2-standard-8 | katonic.ai/node-pool=compute | |
| 3 | Deployment | 1-10 | c2-standard-8 | katonic.ai/node-pool=deployment | katonic.ai/node-pool=deployment:NoSchedule |
| 4 | Vectordb | 1-4 | c2-standard-4 | katonic.ai/node-pool=vectordb | katonic.ai/node-pool=vectordb:NoSchedule |
| 5 | GPU (Optional) | 0-5 | Required VM type | katonic.ai/gpu={GPU-type} | katonic.ai/gpu={GPU-type}:NoSchedule |

Note: For example, the GPU type can be v100, A30, or A100.

Note: When backup_enabled = True, set compute_nodes.min_count to 2.
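
The backup note above can be checked before running the installer. The following is a minimal sketch, not part of the Katonic installer: check_backup_min_count is a hypothetical helper, and it assumes the rule means "at least 2" rather than "exactly 2".

```shell
# Hypothetical helper (not shipped with the Katonic installer).
# Arguments: $1 = backup_enabled (True/False), $2 = compute_nodes.min_count
# Assumption: the documented rule is read as "min_count of at least 2".
check_backup_min_count() {
  backup_enabled=$1
  min_count=$2
  if [ "$backup_enabled" = "True" ] && [ "$min_count" -lt 2 ]; then
    echo "backup_enabled is True, so compute_nodes.min_count must be at least 2 (got $min_count)" >&2
    return 1
  fi
  return 0
}

# Example: backup enabled with two compute nodes passes the check.
check_backup_min_count True 2 && echo "compute_nodes.min_count is compatible with backup_enabled"
```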

GCP Platform-Node Specifications

Platform nodes in GCP cloud deployments must fulfil the following hardware specification requirements according to the deployment type:

| COMPONENT | SPECIFICATION |
| --- | --- |
| Node count | min 2 |
| Instance type | c2-standard-4 |
| vCPUs | 4 |
| Memory | 16 GB |
| Boot disk size | 128 GB |

GCP Compute-Node Specifications

Compute nodes in GCP cloud installations of the Katonic platform must use one of the supported instance types listed below.

Choose the type that best fits your requirements. GCP GKE (Google Kubernetes Engine) is also supported for application nodes, using the instance types listed below. For specification details for each type, refer to the GCP documentation.

Note: Supported compute node configurations

  • c2-standard-8
  • c2-standard-16
  • c2-standard-32
  • Boot disk: min 128 GB

GCP Deployment-Node Specifications

Deployment nodes in GCP cloud installations of the Katonic platform must use one of the supported instance types listed below.

Choose the type that best fits your requirements. GCP GKE (Google Kubernetes Engine) is also supported for application nodes, using the instance types listed below. The Katonic platform requires at least 1 deployment node for the community version. For specification details for each type, refer to the GCP documentation.

Note: Supported deployment node configurations

  • c2-standard-8
  • c2-standard-16
  • c2-standard-32
  • Boot disk: min 128 GB

GCP Vectordb-Node Specifications

Vectordb nodes in GCP cloud deployments must fulfil the following hardware specification requirements:

| COMPONENT | SPECIFICATION |
| --- | --- |
| Instance type | c2-standard-4 |

GCP GPU-Node Specifications

As of now, the GPU node pool is supported by Katonic-installer version 5.0.9.

Choose the instance type that best fits your requirements. Google Kubernetes Engine (GKE) is also supported for application nodes, using the instance types provided by Google Cloud. For specification details for each type, refer to the GCP documentation.

Note: Supported GPU node configurations

  • Boot disk size = min 512 GB
  • Label = katonic.ai/gpu={GPU-type}
  • Taints = katonic.ai/gpu={GPU-type}:NoSchedule

Note: For example, the GPU type can be v100, A30, or A100.

Additional node pools with distinct katonic.ai/node-pool labels can be added to make other instance types available for Katonic executions.
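
As a rough illustration of adding such a pool, the sketch below prints (but does not run) a gcloud command that creates an autoscaled node pool carrying a katonic.ai/node-pool label. The node_pool_cmd helper, the cluster name my-cluster, and the pool name highmem are hypothetical. The flags are standard gcloud container node-pools create flags, and the 128 GB disk size is an assumption borrowed from the compute-node minimum above.

```shell
# Hypothetical helper: prints a gcloud command for an extra node pool.
# It only prints the command so you can review it before running it.
# Arguments: $1 = cluster name, $2 = region, $3 = pool name (also used as label value),
#            $4 = machine type, $5 = min nodes, $6 = max nodes
node_pool_cmd() {
  printf 'gcloud container node-pools create %s --cluster=%s --region=%s --machine-type=%s --disk-size=128 --enable-autoscaling --min-nodes=%s --max-nodes=%s --node-labels=katonic.ai/node-pool=%s\n' \
    "$3" "$1" "$2" "$4" "$5" "$6" "$3"
}

# Example with placeholder names; review the printed command before executing it.
node_pool_cmd my-cluster us-east1 highmem c2-standard-16 1 4
```

In a regional cluster, gcloud applies --min-nodes and --max-nodes per zone, so the total node count can be higher than the values passed here.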

Katonic Platform Installation

General completion time: 45 minutes

Installation process

The Katonic platform runs on Kubernetes. To simplify the deployment and configuration of Katonic services, Katonic provides an install automation tool called the Katonic-installer that deploys Katonic into your compatible cluster. The Katonic-installer is an Ansible role delivered in a Docker container and can be run locally.

Prerequisites

To install and configure Katonic in your GCP account you must have:

  • quay.io credentials from Katonic

  • A GCP project with enough quota to create:

    • At least 2 c2-standard-4 machines for platform nodes and at least 1 c2-standard type machine for compute nodes
  • A Linux (Ubuntu/Debian) machine, set up with the following steps:

    a. The machine needs 4 GB RAM, 2 vCPUs, and a 50 GB boot disk.

    b. While creating the VM, select the service account (Katonic) in the Identity and API access section. Skip to step c if you already have a machine with the given specifications.

    Note: After the platform is deployed successfully, the VM can be deleted.

    c. Switch to the root user inside the machine.

    d. The gcloud CLI must be installed and logged in to your GCP project and service account using the gcloud init command.

    Commands for installing the gcloud CLI:

    apt-get install snapd -y
    snap install google-cloud-cli --classic

    Commands to log in using the gcloud CLI:

    gcloud init
    gcloud auth application-default login
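
Before moving on, it can help to confirm that the tools used in the following steps are on the PATH. This is a minimal sketch; missing_tools is a hypothetical helper, and the tool list (docker and gcloud) is taken from the commands used in this guide.

```shell
# Hypothetical helper: prints the name of each given command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the two tools this guide relies on.
missing=$(missing_tools docker gcloud)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "docker and gcloud are both installed"
fi
```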

To install the Katonic Platform Ace version, follow the steps below:

1. Log in to Quay with the credentials described in the prerequisites section above.

docker login quay.io

2. Retrieve the Katonic installer image from Quay.

docker pull quay.io/katonic/katonic-installer:v5.0.9

3. Create a directory.

mkdir katonic
cd katonic

4. Add the PEM-encoded public key certificate and private key to the directory.

Put the PEM-encoded public key certificate for your domain (with the extension .crt) and the private key associated with that certificate (with the extension .key) inside the current directory (katonic).
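
A quick way to confirm that both files are in place is sketched below. has_cert_pair is a hypothetical helper that only checks the file extensions described above; it does not verify that the key actually matches the certificate.

```shell
# Hypothetical helper: returns 0 when the directory holds at least one .crt
# file and at least one .key file. It checks file names only, not contents.
has_cert_pair() {
  dir=$1
  crt_found=no
  key_found=no
  for f in "$dir"/*.crt; do
    if [ -f "$f" ]; then crt_found=yes; fi
  done
  for f in "$dir"/*.key; do
    if [ -f "$f" ]; then key_found=yes; fi
  done
  [ "$crt_found" = yes ] && [ "$key_found" = yes ]
}

# Run from inside the katonic directory.
if has_cert_pair .; then
  echo "certificate and key found"
else
  echo "missing .crt or .key file in $(pwd)"
fi
```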

5. The Katonic Installer can deploy the Katonic Platform Ace version in two ways:

  1. Creating GKE and deploying the Katonic Platform Ace version.
  2. Installing the Katonic Platform Ace version on an existing GKE cluster.

1. Creating Private GKE and deploying the Katonic Platform Ace version

A. Single-Node GKE Cluster

Initialize the installer application to generate a template configuration file named katonic.yml.

docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init gcp katonic_ace single_node deploy_kubernetes private

Edit the configuration file with all necessary details about the target cluster, storage systems, and hosting domain. Read the following configuration reference:

| PARAMETER | DESCRIPTION | VALUE |
| --- | --- | --- |
| katonic_platform_version | Katonic Platform version; set by default. | katonic_ace |
| deploy_on | Cluster to be deployed on | GCP |
| create_k8s_cluster | Must be set to True | True |
| single_node_cluster | Set "True" if opting for single node cluster | True or False |
| private_cluster | Set "True" when opting for private cluster | False |
| control_plane_authorized_networks | List of allowed IP ranges (CIDR) for control plane access. | |
| enable_exposing_genai_applications_to_internet | Set "True" if opting for exposing genai applications to internet | False |
| public_domain_for_genai_applications | Public FQDN of domain for genai applications that will be exposed to the internet | eg. public-chatbots.google.com |
| vpc_name | Enter the name of the VPC created for the private cluster | |
| subnet_name | Enter the name of the subnet created for the private cluster | |
| internal_loadbalancer | Set "True" when opting for internal loadbalancer | False |
| gke_k8s_version | GKE version | eg. 1.29 (1.27 and above versions supported) |
| cluster_name | Name of the cluster to create | eg. katonic-ace-platform-v5-0 |
| gcp_region | GCP region name | eg. us-east1 |
| gcp_project_id | Set your GCP project ID | eg. ardent-timm-1000678 |
| service_account_id | Set created service account email ID | eg. katonic-main@ardent-timm-1000678.iam.gserviceaccount.com |
| zone_1 | | eg. us-east1-b |
| zone_2 | | eg. us-east1-c |
| compute_nodes.instance_type | Compute node VM size | eg. c2-standard-8 |
| compute_nodes.min_count | Minimum number of compute nodes; should not be less than 1 | eg. 1 |
| compute_nodes.max_count | Maximum number of compute nodes; should be greater than compute_nodes.min_count | eg. 3 |
| compute_nodes.os_disk_size | Compute nodes OS disk size | eg. 128 GB |
| vectordb_nodes.instance_type | Vectordb node VM size | eg. c2-standard-4 |
| vectordb_nodes.min_count | Minimum number of Vectordb nodes; should be 1 | eg. 1 |
| vectordb_nodes.max_count | Maximum number of Vectordb nodes; should be greater than vectordb_nodes.min_count | eg. 4 |
| vectordb_nodes.os_disk_size | Vectordb nodes OS disk size | eg. 128 GB |
| gpu_enabled | Add GPU node pool | True or False |
| gpu_nodes.instance_type | GPU node VM size | eg. n1-standard-1 |
| gpu_nodes.gpu_machine_type | Type of machine you need | eg. nvidia-tesla-p4 |
| gpu_nodes.gpu_type | Enter the GPU type available on the machine | eg. v100, k80 |
| gpu_nodes.gpu_count | | eg. 2 |
| gpu_nodes.min_count | Minimum number of GPU nodes | eg. 1 |
| gpu_nodes.max_count | Maximum number of GPU nodes | eg. 2 |
| gpu_nodes.os_disk_size | Enter GPU nodes OS disk size | eg. 512 GB |
| gpu_nodes.gpu_vRAM | Enter GPU node RAM size | |
| gpu_nodes.gpus_per_node | Enter the number of GPUs per node | |
| enable_gpu_workspace | Set it True if you want to use GPU Workspace | True or False |
| shared_storage_create | Set it True if you want to have shared storage | True or False |
| genai_nfs_size | NFS storage size for genai | 100Gi |
| workspace_timeout_interval | Set the timeout interval in hours | eg. "12" |
| backup_enabled | Enable backup | True or False |
| backup_schedule | Backup schedule | 0 0 1 * * |
| backup_expiration | Backup expiration | 2160h0m0s |
| use_custom_domain | Set this to True if you want to host the Katonic platform on your custom domain. Skip if use_katonic_domain: True | True or False |
| custom_domain_name | Expected a valid domain | eg. katonic.tesla.com |
| use_katonic_domain | Set this to True if you want to host the Katonic platform on the Katonic Deploy Platform domain. Skip if use_custom_domain: True | True or False |
| katonic_domain_prefix | One word expected, with no special characters and all lowercase letters | eg. tesla |
| enable_pre_checks | Set this to True if you want to perform the pre-checks | True / False |
| AD_Group_Management | Set "True" to enable sign-in using Azure AD | False |
| AD_CLIENT_ID | Client ID of the app registered for SSO in the client's Azure or Identity Provider | |
| AD_CLIENT_SECRET | Client Secret of the app registered for SSO in the client's Azure or any other Identity Provider | |
| AD_AUTH_URL | Authorization URL endpoint of the app registered for SSO. | |
| AD_TOKEN_URL | Token URL endpoint of the app registered for SSO. | |
| quay_username | Username for quay | |
| quay_password | Password for quay | |
| adminUsername | Email for admin user | eg. john@katonic.ai |
| adminPassword | Password for admin user | At least 1 special character, at least 1 uppercase letter, at least 1 lowercase letter, minimum 8 characters |
| adminFirstName | Admin first name | eg. john |
| adminLastName | Admin last name | eg. musk |

B. Multi-Node GKE Cluster

Initialize the installer application to generate a template configuration file named katonic.yml.

docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init gcp katonic_ace multi_node deploy_kubernetes private

Edit the configuration file with all necessary details about the target cluster, storage systems, and hosting domain. Read the following configuration reference:

| PARAMETER | DESCRIPTION | VALUE |
| --- | --- | --- |
| katonic_platform_version | Katonic Platform version; set by default. | katonic_ace |
| deploy_on | Cluster to be deployed on | GCP |
| create_k8s_cluster | Must be set to True | True |
| private_cluster | Set "True" when opting for private cluster | False |
| control_plane_authorized_networks | List of allowed IP ranges (CIDR) for control plane access. | |
| enable_exposing_genai_applications_to_internet | Set "True" if opting for exposing genai applications to internet | False |
| public_domain_for_genai_applications | Public FQDN of domain for genai applications that will be exposed to the internet | eg. public-chatbots.google.com |
| vpc_name | Enter the name of the VPC created for the private cluster | |
| subnet_name | Enter the name of the subnet created for the private cluster | |
| internal_loadbalancer | Set "True" when opting for internal loadbalancer | False |
| gke_k8s_version | GKE version | eg. 1.29 (1.27 and above versions supported) |
| cluster_name | Name of the cluster to create | eg. katonic-ace-platform-v5-0 |
| gcp_region | GCP region name | eg. us-east1 |
| gcp_project_id | Set your GCP project ID | eg. ardent-timm-1000678 |
| service_account_id | Set created service account email ID | eg. katonic-main@ardent-timm-1000678.iam.gserviceaccount.com |
| zone_1 | | eg. us-east1-b |
| zone_2 | | eg. us-east1-c |
| platform_nodes.instance_type | Platform node VM size | eg. c2-standard-4 |
| platform_nodes.min_count | Minimum number of platform nodes; should be 2 | eg. 2 |
| platform_nodes.max_count | Maximum number of platform nodes; should be greater than platform_nodes.min_count | eg. 3 |
| platform_nodes.os_disk_size | Platform nodes OS disk size | eg. 128 GB |
| compute_nodes.instance_type | Compute node VM size | eg. c2-standard-8 |
| compute_nodes.min_count | Minimum number of compute nodes; should not be less than 1 | eg. 1 |
| compute_nodes.max_count | Maximum number of compute nodes; should be greater than compute_nodes.min_count | eg. 3 |
| compute_nodes.os_disk_size | Compute nodes OS disk size | eg. 128 GB |
| vectordb_nodes.instance_type | Vectordb node VM size | eg. c2-standard-4 |
| vectordb_nodes.min_count | Minimum number of Vectordb nodes; should be 1 | eg. 1 |
| vectordb_nodes.max_count | Maximum number of Vectordb nodes; should be greater than vectordb_nodes.min_count | eg. 4 |
| vectordb_nodes.os_disk_size | Vectordb nodes OS disk size | eg. 128 GB |
| deployment_nodes.instance_type | Deployment node VM size | eg. c2-standard-8 |
| deployment_nodes.min_count | Minimum number of deployment nodes; should be 1 | eg. 1 |
| deployment_nodes.max_count | Maximum number of deployment nodes; should be greater than deployment_nodes.min_count | eg. 4 |
| deployment_nodes.os_disk_size | Deployment nodes OS disk size | eg. 128 GB |
| gpu_enabled | Add GPU node pool | True or False |
| gpu_nodes.instance_type | GPU node VM size | eg. n1-standard-1 |
| gpu_nodes.gpu_machine_type | Type of machine you need | eg. nvidia-tesla-p4 |
| gpu_nodes.gpu_type | Enter the GPU type available on the machine | eg. v100, k80 |
| gpu_nodes.gpu_count | | eg. 2 |
| gpu_nodes.min_count | Minimum number of GPU nodes | eg. 1 |
| gpu_nodes.max_count | Maximum number of GPU nodes | eg. 2 |
| gpu_nodes.os_disk_size | Enter GPU nodes OS disk size | eg. 512 GB |
| gpu_nodes.gpu_vRAM | Enter GPU node RAM size | |
| gpu_nodes.gpus_per_node | Enter the number of GPUs per node | |
| enable_gpu_workspace | Set it True if you want to use GPU Workspace | True or False |
| shared_storage_create | Set it True if you want to have shared storage | True or False |
| genai_nfs_size | NFS storage size for genai | 100Gi |
| workspace_timeout_interval | Set the timeout interval in hours | eg. "12" |
| backup_enabled | Enable backup | True or False |
| backup_schedule | Backup schedule | 0 0 1 * * |
| backup_expiration | Backup expiration | 2160h0m0s |
| use_custom_domain | Set this to True if you want to host the Katonic platform on your custom domain. Skip if use_katonic_domain: True | True or False |
| custom_domain_name | Expected a valid domain | eg. katonic.tesla.com |
| use_katonic_domain | Set this to True if you want to host the Katonic platform on the Katonic Deploy Platform domain. Skip if use_custom_domain: True | True or False |
| katonic_domain_prefix | One word expected, with no special characters and all lowercase letters | eg. tesla |
| enable_pre_checks | Set this to True if you want to perform the pre-checks | True / False |
| AD_Group_Management | Set "True" to enable sign-in using Azure AD | False |
| AD_CLIENT_ID | Client ID of the app registered for SSO in the client's Azure or Identity Provider | |
| AD_CLIENT_SECRET | Client Secret of the app registered for SSO in the client's Azure or any other Identity Provider | |
| AD_AUTH_URL | Authorization URL endpoint of the app registered for SSO. | |
| AD_TOKEN_URL | Token URL endpoint of the app registered for SSO. | |
| quay_username | Username for quay | |
| quay_password | Password for quay | |
| adminUsername | Email for admin user | eg. john@katonic.ai |
| adminPassword | Password for admin user | At least 1 special character, at least 1 uppercase letter, at least 1 lowercase letter, minimum 8 characters |
| adminFirstName | Admin first name | eg. john |
| adminLastName | Admin last name | eg. musk |
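
The min_count and max_count rules in the configuration reference can be sanity-checked before running the installer. The sketch below is not part of the Katonic installer: check_pool_counts is a hypothetical helper, and it assumes that "greater than" means max_count must be strictly larger than min_count.

```shell
# Hypothetical helper: validates one node pool's counts.
# Arguments: $1 = pool name, $2 = min_count, $3 = max_count,
#            $4 = lowest allowed min_count for that pool
# Assumption: max_count must be strictly greater than min_count.
check_pool_counts() {
  if [ "$2" -lt "$4" ]; then
    echo "$1: min_count $2 is below the required minimum $4" >&2
    return 1
  fi
  if [ "$3" -le "$2" ]; then
    echo "$1: max_count $3 must be greater than min_count $2" >&2
    return 1
  fi
}

# Example using the values shown in the reference for platform nodes.
check_pool_counts platform_nodes 2 3 2 && echo "platform_nodes counts look valid"
```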

Installing the Katonic Platform Ace version

After configuring the katonic.yml file, run the following command to install the Katonic Platform Ace version:

docker run -it --rm --name install-katonic -v /root/.config:/root/.config -v $(pwd):/inventory quay.io/katonic/katonic-installer:v5.0.9

2. Deploying Katonic Platform Ace version on existing Private GKE

The steps are similar to those for installing the Katonic Platform with GCP Google Kubernetes Engine. Edit the configuration file with all the details about the target cluster, storage systems, and hosting domain. Read the following configuration reference; these are the only parameters required when installing the Katonic MLOps platform on an existing GKE cluster.

Prerequisites

You need to create a storage class named kfs. Please refer to the main GCP documentation → Dynamic Block Storage for instructions on how to create the storage class.

A. Single-Node GKE Cluster

Initialize the installer application to generate a template configuration file named katonic.yml.

docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init gcp katonic_ace single_node kubernetes_already_exists private

B. Multi-Node GKE Cluster

Initialize the installer application to generate a template configuration file named katonic.yml.

docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init gcp katonic_ace multi_node kubernetes_already_exists private

For both single node and multi-node clusters, the configuration template includes the following parameters:

| PARAMETER | DESCRIPTION | VALUE |
| --- | --- | --- |
| katonic_platform_version | Katonic Platform version; set by default. | katonic_ace |
| deploy_on | Cluster to be deployed on | GCP |
| single_node_cluster | Set "True" if opting for single node cluster | True or False |
| private_cluster | Set "True" when opting for private cluster | False |
| control_plane_authorized_networks | List of allowed IP ranges (CIDR) for control plane access. | |
| enable_exposing_genai_applications_to_internet | Set "True" if opting for exposing genai applications to internet | False |
| public_domain_for_genai_applications | Public FQDN of domain for genai applications that will be exposed to the internet | eg. public-chatbots.google.com |
| internal_loadbalancer | Set "True" when opting for internal loadbalancer | False |
| cluster_name | Enter the name of the cluster that you deployed | eg. katonic-ace-platform-v5-0 |
| gcp_region | GCP region name | eg. us-east1 |
| gcp_project_id | Set your GCP project ID | eg. ardent-timm-1000678 |
| genai_nfs_size | NFS storage size for genai | 100Gi |
| workspace_timeout_interval | Set the timeout interval in hours | eg. "12" |
| use_custom_domain | Set this to True if you want to host the Katonic platform on your custom domain. Skip if use_katonic_domain: True | True or False |
| custom_domain_name | Expected a valid domain | eg. katonic.tesla.com |
| use_katonic_domain | Set this to True if you want to host the Katonic platform on the Katonic Deploy Platform domain. Skip if use_custom_domain: True | True or False |
| katonic_domain_prefix | One word expected, with no special characters and all lowercase letters | eg. tesla |
| enable_pre_checks | Set this to True if you want to perform the pre-checks | True / False |
| AD_Group_Management | Set "True" to enable sign-in using Azure AD | False |
| AD_CLIENT_ID | Client ID of the app registered for SSO in the client's Azure or Identity Provider | |
| AD_CLIENT_SECRET | Client Secret of the app registered for SSO in the client's Azure or any other Identity Provider | |
| AD_AUTH_URL | Authorization URL endpoint of the app registered for SSO. | |
| AD_TOKEN_URL | Token URL endpoint of the app registered for SSO. | |
| quay_username | Username for quay | |
| quay_password | Password for quay | |
| adminUsername | Email for admin user | eg. john@katonic.ai |
| adminPassword | Password for admin user | At least 1 special character, at least 1 uppercase letter, at least 1 lowercase letter, minimum 8 characters |
| adminFirstName | Admin first name | eg. john |
| adminLastName | Admin last name | eg. musk |

Note: In the katonic.yml template, the single_node_cluster parameter is set to True for single-node clusters and is omitted for multi-node clusters.
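
The adminPassword rule in the configuration reference can be checked locally before the installer runs. This is a minimal sketch: valid_admin_password is a hypothetical helper, and it assumes that "special character" means any character that is not a letter or a digit.

```shell
# Hypothetical helper: returns 0 when the password has at least 8 characters,
# one uppercase letter, one lowercase letter, and one special character.
# Assumption: "special" is treated as any non-alphanumeric character.
valid_admin_password() {
  pw=$1
  [ "${#pw}" -ge 8 ] || return 1
  case $pw in *[[:upper:]]*) ;; *) return 1 ;; esac
  case $pw in *[[:lower:]]*) ;; *) return 1 ;; esac
  case $pw in *[![:alnum:]]*) ;; *) return 1 ;; esac
  return 0
}

# Example with a placeholder password.
if valid_admin_password 'Example@123'; then
  echo "password satisfies the documented rule"
fi
```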

Installing the Katonic Platform Ace version

After configuring the katonic.yml file, run the following command to install the Katonic Platform Ace version:

docker run -it --rm --name install-katonic -v /root/.config:/root/.config -v $(pwd):/inventory quay.io/katonic/katonic-installer:v5.0.9

Installation Verification

The installation process can take up to one hour to complete fully. The installer outputs verbose logs, including the commands needed to get kubectl access to the deployed cluster, and surfaces any errors it encounters. After installation, use the following command to check whether all applications are running.

kubectl get pods --all-namespaces

This will show the status of all pods being created by the installation process. If you see any pods enter a crash loop or hang in a non-ready state, you can get logs from that pod by running:

kubectl logs $POD_NAME --namespace $NAMESPACE_NAME
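
On a cluster with many pods, it can be easier to list only the pods that need attention. The sketch below filters the pod listing with awk; unhealthy_pods is a hypothetical helper, and it assumes the default column order of kubectl get pods --all-namespaces --no-headers (namespace, name, ready, status).

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces --no-headers`
# output on stdin and prints "namespace/pod status" for every pod whose
# status is neither Running nor Completed.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Typical use against a live cluster:
#   kubectl get pods --all-namespaces --no-headers | unhealthy_pods
```

A pod in the Running state is not necessarily ready, so it is still worth checking the READY column for pods that this filter does not report.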

If the installation completes successfully, you should see a message that says:

TASK [platform-deployment : Credentials to access Katonic Ace Platform] *******************************
ok: [localhost] => {
"msg": [
"Platform Domain: $domain_name",
"Username: $adminUsername",
"Password: $adminPassword"
]
}

However, the application will only be accessible via HTTPS at that FQDN if you have configured DNS for the name to point to an ingress load balancer with the appropriate SSL certificate that forwards traffic to your platform nodes.

Test and troubleshoot

To verify the successful installation of Katonic, perform the following tests:

  • If you encounter a 500 or 502 error, connect to your cluster and run the following command:

    kubectl rollout restart deploy nodelog-deploy -n application

  • Log in to the Katonic application and ensure that all the navigation panel options are operational. If this test fails, please verify that Keycloak was set up properly.

  • Create a new project and launch a Jupyter/JupyterLab workspace. If this test fails, please check that the default environment images have been loaded in the cluster.

  • Publish an app with Flask or Shiny. If this test fails, please verify that the environment images have Flask and Shiny installed.

Deleting the Katonic platform from GCP

When you start the installation, the installer places a platform deletion script in your current directory. To delete the platform, run the script:

./gcp-cluster-delete.sh