Katonic Companion
Node pool requirements
The AKS cluster can be configured as either a single-node cluster or a multi-node cluster, depending on the user's needs:
A. Single-Node AKS Cluster:
The AKS cluster must have one node pool that produces worker nodes with the following specifications and distinct node labels.
SR NO. | POOL | MIN-MAX | INSTANCE | LABELS | TAINTS |
---|---|---|---|---|---|
1 | Compute | 1-10 | Standard_D8s_v3 or Standard_D8ads_v5 | katonic.ai/node-pool=compute | |
B. Multi-Node AKS Cluster:
The AKS cluster must have at least four node pools that produce worker nodes with the following specifications and distinct node labels, and it may also include an optional GPU pool:
SR NO. | POOL | MIN-MAX | VM | LABELS | TAINTS |
---|---|---|---|---|---|
1 | Platform | 2-4 | Standard_D2s_v3 or Standard_D8ads_v5 | katonic.ai/node-pool=platform | katonic.ai/node-pool=platform:NoSchedule |
2 | Compute | 1-10 | Standard_D8s_v3 or Standard_D8ads_v5 | katonic.ai/node-pool=compute | |
3 | Deployment | 1-10 | Standard_D8s_v3 or Standard_D8ads_v5 | katonic.ai/node-pool=deployment | katonic.ai/node-pool=deployment:NoSchedule |
4 | Vectordb | 1-4 | Standard_D2s_v3 | katonic.ai/node-pool=vectordb | katonic.ai/node-pool=vectordb:NoSchedule |
5 | GPU (Optional) | 0-5 | Standard_NC6s_v3 | katonic.ai/gpu={GPU-type} | katonic.ai/gpu={GPU-type}:NoSchedule |
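If you prepare node pools on an AKS cluster yourself rather than letting the installer create them, each row of the table above corresponds to one `az aks nodepool add` call. The sketch below only prints such a command so it can be reviewed before running. `RESOURCE_GROUP` and `CLUSTER_NAME` are shell variables you set yourself, starting each pool at its minimum size with the cluster autoscaler bounded by MIN-MAX is an assumption, the function name is illustrative, and the GPU pool (which uses the katonic.ai/gpu label instead) is not covered.

```shell
#!/usr/bin/env bash
# Print (not run) an `az aks nodepool add` command for one Katonic node pool.
# Usage: katonic_nodepool_cmd <pool> <vm-size> <min> <max> <taint|notaint>
# RESOURCE_GROUP and CLUSTER_NAME must be set in the calling shell.
katonic_nodepool_cmd() {
  local pool="$1" vm_size="$2" min="$3" max="$4" taint="$5"
  local cmd="az aks nodepool add"
  cmd+=" --resource-group ${RESOURCE_GROUP:?set RESOURCE_GROUP}"
  cmd+=" --cluster-name ${CLUSTER_NAME:?set CLUSTER_NAME}"
  cmd+=" --name $pool --node-vm-size $vm_size"
  # Start at the pool minimum and let the autoscaler move within MIN-MAX.
  cmd+=" --node-count $min --enable-cluster-autoscaler"
  cmd+=" --min-count $min --max-count $max"
  # Label every pool; taint only pools that have a TAINTS entry in the table.
  cmd+=" --labels katonic.ai/node-pool=$pool"
  if [[ "$taint" == "taint" ]]; then
    cmd+=" --node-taints katonic.ai/node-pool=$pool:NoSchedule"
  fi
  printf '%s\n' "$cmd"
}
```

For example, `katonic_nodepool_cmd platform Standard_D2s_v3 2 4 taint` prints the command for the tainted platform pool, and `katonic_nodepool_cmd compute Standard_D8s_v3 1 10 notaint` prints the one for the untainted compute pool.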
Note: Instance Type and Region Considerations
The cost of virtual machines varies according to the chosen instance type and region.
You are encouraged to select from the instance types listed above based on your specific requirements and budget. Note that the following regions are not supported for deployment:
- Brazil South
- Germany Central
- Germany Northeast
- Austria East
- Denmark East
- Italy South
- Italy Central
- East India
- New Zealand North
- New Zealand Southeast
- Central Indonesia
- East Indonesia
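Before creating the resource group, you can screen a location against the list above. This is a minimal sketch with an illustrative function name; it assumes a programmatic location name is the display name lowercased with spaces removed (as with Central India and centralindia), so both forms are normalized the same way before comparison.

```shell
#!/usr/bin/env bash
# Return 0 if a region is NOT on the unsupported list above, 1 if it is.
# Accepts display names ("Brazil South") or names without spaces ("brazilsouth").
katonic_region_supported() {
  local region r
  # Normalize: lowercase and drop spaces.
  region="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d ' ')"
  local unsupported=(
    brazilsouth germanycentral germanynortheast austriaeast denmarkeast
    italysouth italycentral eastindia newzealandnorth newzealandsoutheast
    centralindonesia eastindonesia
  )
  for r in "${unsupported[@]}"; do
    [[ "$region" == "$r" ]] && return 1
  done
  return 0
}
```

For example, `katonic_region_supported centralindia` succeeds, while `katonic_region_supported "Brazil South"` returns 1.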
Note: Example values for {GPU-type} are v100, A30, and A100.
Note: When backup_enabled is True, compute_nodes.min_count should be set to 2.
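The backup rule in the note above can be checked before running the installer. A minimal sketch with an illustrative function name, assuming the two values have already been read from katonic.yml as strings; accepting any count of 2 or more is an assumption, since the note only says the value should be 2.

```shell
#!/usr/bin/env bash
# Check: when backups are enabled, compute_nodes.min_count must be 2 or more.
# Usage: katonic_check_backup_nodes <backup_enabled> <compute_nodes.min_count>
katonic_check_backup_nodes() {
  local backup_enabled="$1" compute_min="$2"
  if ! [[ "$compute_min" =~ ^[0-9]+$ ]]; then
    echo "compute_nodes.min_count must be a whole number (got '$compute_min')" >&2
    return 1
  fi
  case "$backup_enabled" in
    True|true|TRUE)
      # 10# forces base 10 so values such as "02" are not read as octal.
      if (( 10#$compute_min < 2 )); then
        echo "backup_enabled is True, so compute_nodes.min_count should be 2 (got $compute_min)" >&2
        return 1
      fi
      ;;
  esac
  return 0
}
```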
If you want Katonic to run some components as highly available ReplicaSets, you must use two availability zones. Every compute node pool you use must have a counterpart in each availability zone used by the other node pools; setting up an isolated node pool in a single zone can cause volume affinity issues.
To run the node pools across multiple availability zones, you need duplicate node pools in each zone with the same configuration, including the same labels, so that pods are scheduled in the zone where their required ephemeral volumes are available.
Additional node pools with distinct katonic.ai/node-pool labels can be added to make other instance types available for Katonic executions.
The Katonic installer can set up all node pool and zone configurations for the Katonic platform.
Azure Platform-Node Specifications
Platform nodes in Azure cloud deployments must meet the following hardware requirements:
SR NO. | COMPONENT | SPECIFICATION |
---|---|---|
1 | Node count | Min 2 |
2 | Instance type | Standard_D2s_v3 or Standard_D8ads_v5 |
3 | vCPUs | 4 |
4 | Memory | 14 GB |
5 | Boot disk size | 128 GB |
Azure Compute-Node Specifications
The following instance types are supported for compute nodes in Azure cloud deployments of the Katonic platform.
Choose the type that best fits your requirements. Azure Kubernetes Service (AKS) is also supported for application nodes, using the instance types listed below. The Katonic platform requires at least one compute node for the Katonic Data Science version. For the specifications of each type, refer to the Azure documentation.
Note: Supported compute node configurations
- Standard_D8ads_v5 (default configuration)
- Standard_D8s_v3
- Standard_D16s_v3
- Standard_D32s_v3
- Standard_D48s_v3
- Standard_D64s_v3
- Boot Disk: 128GB
Azure Deployment-Node Specifications
The following instance types are supported for deployment nodes in Azure cloud deployments of the Katonic platform.
Choose the type that best fits your requirements. Azure Kubernetes Service (AKS) is also supported for application nodes, using the instance types listed below. The Katonic platform requires at least one deployment node for the teams version. For the specifications of each type, refer to the Azure documentation.
Note: Supported deployment node configurations
- Standard_D8ads_v5 (default configuration)
- Standard_D8s_v3
- Standard_D16s_v3
- Standard_D32s_v3
- Standard_D48s_v3
- Standard_D64s_v3
- Boot Disk: 128GB
Azure Vectordb-Node Specifications
Vectordb nodes in Azure cloud deployments must fulfil the following hardware specification requirements:
COMPONENT | SPECIFICATION |
---|---|
Instance type | Standard_DS3_v2 |
Azure GPU-Node Specifications
GPU nodes in Azure cloud deployments must use one of the following instance types:
Choose the type that best fits your requirements. Azure Kubernetes Service (AKS) is also supported for application nodes, using the instance types listed below. For specification details for each type, refer to the Azure documentation.
Note: Supported GPU node configurations
- NCv3-series (GPU optimized)
- Boot Disk: 512 GB
Additional node pools can be added with distinct katonic.ai/node-pool labels to make other instance types available for Katonic executions.
Prerequisites
To install and configure Katonic in your Azure account you must have:
Quay credentials from Katonic.
PEM encoded public key certificate for your domain and private key associated with the given certificate.
An Azure subscription with enough quota to create:
- At least 4 Standard_D8s_v3 or Standard_D8ads_v5 VMs.
- NC6s_v3 or similar SKU VMs, if you want to use GPU.
A Linux (Ubuntu/Debian) machine, prepared with the following steps:
a. Create a resource group in Azure:
az group create --name <RESOURCE_GROUP> --location <LOCATION>
Note: You can get a list of all available locations by running the following command:
az account list-locations
You will pass this resource group name to the Katonic installer later, as resource_group_name in katonic.yml.
b. A Linux (Ubuntu/Debian) machine with at least 4 GB RAM and 2 vCPUs. Skip this step if you already have a machine with these specifications.
Note: After the platform is deployed successfully, the VM can be deleted.
c. Switch to the root user inside the machine.
d. Install Azure CLI version 2.35.0 or later, and log in to your Azure account with the az login command as a user that has the Contributor role on the subscription.
Note: On Debian-based machines, follow the Azure CLI installation instructions for Debian/Ubuntu (v2.35 or later).
e. If your Azure account has multiple tenants or subscriptions, use the following command to find your subscription ID:
az account list --output table
Save this ID; you will pass it to the Katonic installer later, as azure_subscription_id in katonic.yml.
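Steps d and e can be spot-checked with two small helpers (illustrative names). `version_ge` compares dotted version strings with GNU `sort -V`, which is available on Ubuntu/Debian; `is_subscription_id` only checks that a value has the GUID shape used for subscription IDs, not that the subscription exists or that you can access it.

```shell
#!/usr/bin/env bash
# Return 0 if version $1 is greater than or equal to version $2.
# The smaller of the two sorts first under `sort -V`; if that is $2, then $1 >= $2.
version_ge() {
  [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" == "$2" ]]
}

# Return 0 if the argument has the 8-4-4-4-12 hexadecimal GUID shape.
is_subscription_id() {
  [[ "$1" =~ ^[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}$ ]]
}
```

For example, assuming `az version --query '"azure-cli"' -o tsv` prints the installed CLI version, `version_ge "$(az version --query '"azure-cli"' -o tsv)" 2.35.0` succeeds on a suitable machine, and `is_subscription_id "$(az account show --query id -o tsv)"` checks the currently selected subscription.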
To install the Katonic Platform Companion version, follow the steps below:
1. Access the jump host and run az login.
2. Log in to Quay with the credentials described in the prerequisites section above.
docker login quay.io
3. Retrieve the Katonic installer image from Quay.
docker pull quay.io/katonic/katonic-installer:v5.0.9
4. Create a working directory.
mkdir katonic
cd katonic
5. Add the PEM-encoded public key certificate and private key to the directory
Put the PEM-encoded public key certificate for your domain (with the .crt extension) and the private key associated with that certificate (with the .key extension) inside the current directory (katonic).
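Before running the installer, it can help to confirm that the two files really belong together. The sketch below (illustrative function name) compares the public key embedded in the certificate with the public key derived from the private key using the openssl CLI. It assumes exactly one .crt and one .key file in the directory and an unencrypted private key.

```shell
#!/usr/bin/env bash
# Return 0 if the single *.crt and *.key in a directory form a matching pair.
check_cert_pair() {
  local dir="${1:-.}"
  local crts=("$dir"/*.crt)
  local keys=("$dir"/*.key)
  # An unmatched glob stays literal, so the -f tests catch missing files.
  if [[ ${#crts[@]} -ne 1 || ! -f "${crts[0]}" || ${#keys[@]} -ne 1 || ! -f "${keys[0]}" ]]; then
    echo "expected exactly one .crt and one .key file in $dir" >&2
    return 1
  fi
  local cert_pub key_pub
  cert_pub="$(openssl x509 -in "${crts[0]}" -noout -pubkey)" || return 1
  key_pub="$(openssl pkey -in "${keys[0]}" -pubout)" || return 1
  if [[ "$cert_pub" != "$key_pub" ]]; then
    echo "certificate and private key do not match" >&2
    return 1
  fi
  # Warn, without failing, if the certificate has already expired.
  openssl x509 -in "${crts[0]}" -noout -checkend 0 >/dev/null ||
    echo "warning: certificate has expired" >&2
  return 0
}
```

Run `check_cert_pair .` from inside the katonic directory.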
6. The Katonic Installer can deploy the Katonic Platform Companion version in two ways:
- Creating AKS and deploying the Katonic Platform Companion version.
- Installing the Katonic Platform Companion version on an existing Azure Kubernetes Service (AKS) cluster.
1. Creating AKS and deploying the Katonic Platform Companion version.
A. Single-Node AKS Cluster
Initialize the installer application to generate a template configuration file named katonic.yml.
docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init azure katonic_companion single_node deploy_kubernetes private
Edit the configuration file with all necessary details about the target cluster, storage systems, and hosting domain. Read the following configuration reference:
PARAMETER | DESCRIPTION | VALUE |
---|---|---|
katonic_platform_version | Katonic platform version; set by default. | katonic_companion |
deploy_on | Cloud platform on which Katonic is to be deployed. | Azure |
create_k8s_cluster | Set to False if the Kubernetes cluster already exists. If True, the installer creates a Kubernetes cluster on the specified cloud platform. | True |
kubernetes_version | AKS version | eg. 1.29 (1.27 and above supported) |
single_node_cluster | Set "True" if opting for a single-node cluster | True or False |
private_cluster | Set "True" when opting for a private cluster | False |
enable_exposing_genai_applications_to_internet | Set "True" to expose GenAI applications to the internet | False |
public_domain_for_genai_applications | Public FQDN of the domain for GenAI applications exposed to the internet | (eg. public-chatbots.google.com) |
internal_loadbalancer | Set "True" when opting for an internal load balancer | False |
cluster_name | Name of the cluster | eg. katonic-companion-platform-v5-0 |
resource_group_name | Azure resource group name | eg. my-resource-group |
resource_group_location | Azure resource group location | eg. centralindia |
azure_subscription_id | Azure Subscription ID | |
vnet_name | Name of the VNet for a private cluster | |
aks_subnet_name | Name of the AKS subnet for a private cluster | |
compute_nodes.instance_type | Compute node VM size | eg. Standard_D8ads_v5 |
compute_nodes.min_count | Minimum number of compute nodes; should be at least 1 (2 when backup_enabled is True) | eg. 1 |
compute_nodes.max_count | Maximum number of compute nodes | eg. 4 |
compute_nodes.os_disk_size | Compute Nodes OS Disk Size | eg. 128 GB |
vectordb_nodes.instance_type | Vectordb Node VM size | eg. Standard_D2s_v3 |
vectordb_nodes.min_count | Minimum number of Vectordb nodes; should be at least 1 | eg. 1 |
vectordb_nodes.max_count | Maximum number of Vectordb nodes; should be greater than vectordb_nodes.min_count | eg. 4 |
vectordb_nodes.os_disk_size | Vectordb Nodes OS Disk Size | eg. 128 GB |
gpu_enabled | Add GPU nodepool | True or False |
gpu_nodes.instance_type | GPU node VM size | eg. Standard_NC6s_v3 |
gpu_nodes.gpu_type | Type of GPU available on the node | eg. v100, k80 |
gpu_nodes.min_count | Minimum number of GPU nodes | eg. 1 |
gpu_nodes.max_count | Maximum number of GPU nodes | eg. 4 |
gpu_nodes.os_disk_size | GPU Nodes OS Disk Size | eg. 512 GB |
gpu_nodes.gpu_vRAM | GPU node vRAM size | |
gpu_nodes.gpus_per_node | Number of GPUs per node | |
enable_gpu_workspace | Set to True to use GPU workspaces | True or False |
storage_class_type.Premium_LRS | Set to True to use "Premium_LRS" as the storage class type instead of "StandardSSD_LRS" | True or False |
shared_storage_create | Set to True to create shared storage | True or False |
genai_nfs_size | NFS storage size for GenAI | 100Gi |
workspace_timeout_interval | Workspace timeout interval in hours | eg. "12" |
backup_enabled | Enable backups | True or False |
backup_schedule | Backup schedule (cron expression) | 0 0 1 * * |
backup_expiration | Backup expiration time | 2160h0m0s |
use_custom_domain | Set to True to host the Katonic platform on your custom domain. Skip if use_katonic_domain is True. | True or False |
custom_domain_name | Expected a valid domain. | eg. katonic.tesla.com |
use_katonic_domain | Set to True to host the Katonic platform on the Katonic Deploy Platform domain. Skip if use_custom_domain is True. | True or False |
katonic_domain_prefix | One word expected with no special characters and all small alphabets | eg. tesla |
enable_pre_checks | Set this to True if you want to perform the Pre-checks | True / False |
AD_Group_Management | Set "True" to enable signing in with Azure AD | False |
AD_CLIENT_ID | Client ID of App registered for SSO in client's Azure or Identity Provider | |
AD_CLIENT_SECRET | Client Secret of App registered for SSO in client's Azure or any other Identity Provider | |
AD_AUTH_URL | Authorization URL endpoint of app registered for SSO. | |
AD_TOKEN_URL | Token URL endpoint of app registered for SSO. | |
quay_username | Username for quay | |
quay_password | Password for quay | |
adminUsername | Email for admin user | eg. john@katonic.ai |
adminPassword | Password for admin user | Minimum 8 characters, with at least 1 uppercase letter, 1 lowercase letter, and 1 special character |
adminFirstName | Admin first name | eg. john |
adminLastName | Admin last name | eg. musk |
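Two of the values above have format rules that are easy to check before running the installer. A minimal sketch with illustrative function names; interpreting "special character" as any character that is not a letter or digit is an assumption.

```shell
#!/usr/bin/env bash
# adminPassword: at least 8 characters, with at least one uppercase letter,
# one lowercase letter, and one special character (non-alphanumeric, assumed).
check_admin_password() {
  local pw="$1"
  (( ${#pw} >= 8 )) || return 1
  [[ "$pw" =~ [[:upper:]] ]] || return 1
  [[ "$pw" =~ [[:lower:]] ]] || return 1
  [[ "$pw" =~ [^[:alnum:]] ]] || return 1
}

# katonic_domain_prefix: one word, lowercase letters only, no special characters.
# The letters are listed explicitly so the check does not depend on the locale.
check_domain_prefix() {
  local prefix="$1"
  [[ -n "$prefix" && -z "${prefix//[abcdefghijklmnopqrstuvwxyz]/}" ]]
}
```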
B. Multi-Node AKS Cluster
Initialize the installer application to generate a template configuration file named katonic.yml.
docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init azure katonic_companion multi_node deploy_kubernetes private
Edit the configuration file with all necessary details about the target cluster, storage systems, and hosting domain. Read the following configuration reference:
PARAMETER | DESCRIPTION | VALUE |
---|---|---|
katonic_platform_version | Katonic platform version; set by default. | katonic_companion |
deploy_on | Cloud platform on which Katonic is to be deployed. | Azure |
create_k8s_cluster | Set to False if the Kubernetes cluster already exists. If True, the installer creates a Kubernetes cluster on the specified cloud platform. | True |
kubernetes_version | AKS version | eg. 1.29 (1.27 and above supported) |
private_cluster | Set "True" when opting for a private cluster | False |
enable_exposing_genai_applications_to_internet | Set "True" to expose GenAI applications to the internet | False |
public_domain_for_genai_applications | Public FQDN of the domain for GenAI applications exposed to the internet | (eg. public-chatbots.google.com) |
internal_loadbalancer | Set "True" when opting for an internal load balancer | False |
cluster_name | Name of the cluster | eg. katonic-companion-platform-v5-0 |
resource_group_name | Azure resource group name | eg. my-resource-group |
resource_group_location | Azure resource group location | eg. centralindia |
azure_subscription_id | Azure Subscription ID | |
vnet_name | Name of the VNet for a private cluster | |
aks_subnet_name | Name of the AKS subnet for a private cluster | |
platform_nodes.instance_type | Platform node VM size | eg. Standard_D8ads_v5 |
platform_nodes.min_count | Minimum number of platform nodes; should be at least 2 (a minimum of 3 is required to install Superset or Airbyte) | eg. 2 |
platform_nodes.max_count | Maximum number of platform nodes | eg. 4 |
platform_nodes.os_disk_size | Platform Nodes OS Disk Size | eg. 128 GB |
compute_nodes.instance_type | Compute node VM size | eg. Standard_D8ads_v5 |
compute_nodes.min_count | Minimum number of compute nodes; should be at least 1 (2 when backup_enabled is True) | eg. 1 |
compute_nodes.max_count | Maximum number of compute nodes | eg. 4 |
compute_nodes.os_disk_size | Compute Nodes OS Disk Size | eg. 128 GB |
vectordb_nodes.instance_type | Vectordb Node VM size | eg. Standard_D2s_v3 |
vectordb_nodes.min_count | Minimum number of Vectordb nodes; should be at least 1 | eg. 1 |
vectordb_nodes.max_count | Maximum number of Vectordb nodes; should be greater than vectordb_nodes.min_count | eg. 4 |
vectordb_nodes.os_disk_size | Vectordb Nodes OS Disk Size | eg. 128 GB |
deployment_nodes.instance_type | Deployment Node VM size | eg. Standard_D8ads_v5 |
deployment_nodes.min_count | Minimum number of Deployment nodes; should be at least 1 | eg. 1 |
deployment_nodes.max_count | Maximum number of Deployment nodes; should be greater than deployment_nodes.min_count | eg. 4 |
deployment_nodes.os_disk_size | Deployment Nodes OS Disk Size | eg. 128 GB |
gpu_enabled | Add GPU nodepool | True or False |
gpu_nodes.instance_type | GPU node VM size | eg. Standard_NC6s_v3 |
gpu_nodes.gpu_type | Type of GPU available on the node | eg. v100, k80 |
gpu_nodes.min_count | Minimum number of GPU nodes | eg. 1 |
gpu_nodes.max_count | Maximum number of GPU nodes | eg. 4 |
gpu_nodes.os_disk_size | GPU Nodes OS Disk Size | eg. 512 GB |
gpu_nodes.gpu_vRAM | GPU node vRAM size | |
gpu_nodes.gpus_per_node | Number of GPUs per node | |
enable_gpu_workspace | Set to True to use GPU workspaces | True or False |
storage_class_type.Premium_LRS | Set to True to use "Premium_LRS" as the storage class type instead of "StandardSSD_LRS" | True or False |
shared_storage_create | Set to True to create shared storage | True or False |
genai_nfs_size | NFS storage size for GenAI | 100Gi |
workspace_timeout_interval | Workspace timeout interval in hours | eg. "12" |
backup_enabled | Enable backups | True or False |
backup_schedule | Backup schedule (cron expression) | 0 0 1 * * |
backup_expiration | Backup expiration time | 2160h0m0s |
use_custom_domain | Set to True to host the Katonic platform on your custom domain. Skip if use_katonic_domain is True. | True or False |
custom_domain_name | Expected a valid domain. | eg. katonic.tesla.com |
use_katonic_domain | Set to True to host the Katonic platform on the Katonic Deploy Platform domain. Skip if use_custom_domain is True. | True or False |
katonic_domain_prefix | One word expected with no special characters and all small alphabets | eg. tesla |
enable_pre_checks | Set this to True if you want to perform the Pre-checks | True / False |
AD_Group_Management | Set "True" to enable signing in with Azure AD | False |
AD_CLIENT_ID | Client ID of App registered for SSO in client's Azure or Identity Provider | |
AD_CLIENT_SECRET | Client Secret of App registered for SSO in client's Azure or any other Identity Provider | |
AD_AUTH_URL | Authorization URL endpoint of app registered for SSO. | |
AD_TOKEN_URL | Token URL endpoint of app registered for SSO. | |
quay_username | Username for quay | |
quay_password | Password for quay | |
adminUsername | Email for admin user | eg. john@katonic.ai |
adminPassword | Password for admin user | Minimum 8 characters, with at least 1 uppercase letter, 1 lowercase letter, and 1 special character |
adminFirstName | Admin first name | eg. john |
adminLastName | Admin last name | eg. musk |
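The node-count rules in this table can be checked per pool before installing. This sketch (illustrative function name) takes the counts as plain arguments. Applying the "max_count greater than min_count" rule, which the table states for the vectordb and deployment pools, to every pool is an assumption.

```shell
#!/usr/bin/env bash
# Usage: check_pool_counts <pool> <min_count> <max_count> <required_min>
# Required minimums from the table: platform 2 (3 for Superset or Airbyte),
# compute 1, vectordb 1, deployment 1.
check_pool_counts() {
  local pool="$1" min="$2" max="$3" required="$4"
  if ! [[ "$min" =~ ^[0-9]+$ && "$max" =~ ^[0-9]+$ ]]; then
    echo "$pool: counts must be whole numbers" >&2
    return 1
  fi
  if (( 10#$min < required )); then
    echo "$pool: min_count must be at least $required (got $min)" >&2
    return 1
  fi
  if (( 10#$max <= 10#$min )); then
    echo "$pool: max_count must be greater than min_count" >&2
    return 1
  fi
}
```

For example, `check_pool_counts platform 2 4 2` passes for the sample values in the table, and `check_pool_counts platform 2 4 3` fails, flagging that Superset or Airbyte would need one more platform node.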
Installing the Katonic Platform Companion version
After configuring the katonic.yml file, run the following command to install the Katonic Platform Companion version:
docker run -it --rm --name install-katonic -v /root/.azure:/root/.azure -v $(pwd):/inventory quay.io/katonic/katonic-installer:v5.0.9
This will start a container and deploy the entire platform.
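The install command mounts the current directory at /inventory and /root/.azure into the container, so it relies on a few files being in place. A minimal pre-flight sketch that checks for them; the function name is illustrative, and the Azure credentials path is a parameter only so it can be overridden.

```shell
#!/usr/bin/env bash
# Usage: katonic_preflight [install_dir] [azure_config_dir]
# Reports every missing item, then returns 1 if anything was missing.
katonic_preflight() {
  local dir="${1:-.}" azure_dir="${2:-/root/.azure}" status=0
  [[ -f "$dir/katonic.yml" ]] || { echo "missing $dir/katonic.yml" >&2; status=1; }
  # compgen -G succeeds only if the glob matches at least one file.
  compgen -G "$dir/*.crt" >/dev/null || { echo "no .crt file in $dir" >&2; status=1; }
  compgen -G "$dir/*.key" >/dev/null || { echo "no .key file in $dir" >&2; status=1; }
  [[ -d "$azure_dir" ]] || { echo "missing $azure_dir (run az login)" >&2; status=1; }
  return "$status"
}
```

Running `katonic_preflight` inside the katonic directory before the docker run command lists anything still missing.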
2. Deploying the Katonic Platform Companion version on an existing private AKS cluster
The steps are similar to those for creating AKS and deploying the Katonic Platform Companion version. Edit the configuration file with all the necessary details about the target cluster, storage systems, and hosting domain. The configuration reference below lists the only parameters required when installing the Katonic Companion platform on an existing private AKS cluster.
Prerequisites
You need to create a storage class named kfs. Refer to the Azure Dynamic Block Storage page of the main documentation for instructions on how to create it.
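Once the storage class is created, you can confirm it exists before initializing the installer. `kubectl get storageclass -o name` prints one `storageclass.storage.k8s.io/<name>` line per class; the helper below (an illustrative name) looks for an exact kfs match in that output, read from standard input.

```shell
#!/usr/bin/env bash
# Return 0 if standard input contains the line storageclass.storage.k8s.io/<name>.
# -F matches literally (dots are not wildcards); -x requires a whole-line match.
has_storage_class() {
  grep -qxF "storageclass.storage.k8s.io/$1"
}
```

For example: `kubectl get storageclass -o name | has_storage_class kfs || echo "kfs storage class not found"`.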
A. Single-Node AKS Cluster
Initialize the installer application to generate a template configuration file named katonic.yml.
docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init azure katonic_companion single_node kubernetes_already_exists private
B. Multi-Node AKS Cluster
Initialize the installer application to generate a template configuration file named katonic.yml.
docker run -it --rm --name generating-yaml -v $(pwd):/install quay.io/katonic/katonic-installer:v5.0.9 init azure katonic_companion multi_node kubernetes_already_exists private
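The four init commands in this guide differ only in two arguments: single_node or multi_node, and deploy_kubernetes or kubernetes_already_exists. The sketch below prints the matching command for review; it hard-codes the v5.0.9 image and the private argument used throughout this guide, and the function name is illustrative.

```shell
#!/usr/bin/env bash
# Usage: katonic_init_cmd <single_node|multi_node> <deploy_kubernetes|kubernetes_already_exists>
katonic_init_cmd() {
  local nodes="$1" k8s="$2"
  local image="quay.io/katonic/katonic-installer:v5.0.9"
  case "$nodes" in
    single_node|multi_node) ;;
    *) echo "unknown node mode: $nodes" >&2; return 1 ;;
  esac
  case "$k8s" in
    deploy_kubernetes|kubernetes_already_exists) ;;
    *) echo "unknown cluster mode: $k8s" >&2; return 1 ;;
  esac
  # "$(pwd)" is inside single quotes, so it is printed literally, not expanded.
  printf 'docker run -it --rm --name generating-yaml -v "$(pwd)":/install %s init azure katonic_companion %s %s private\n' \
    "$image" "$nodes" "$k8s"
}
```

For example, `katonic_init_cmd multi_node kubernetes_already_exists` prints the command shown above for an existing multi-node cluster.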
For both single node and multi-node clusters, the configuration template includes the following parameters:
PARAMETER | DESCRIPTION | VALUE |
---|---|---|
katonic_platform_version | Katonic platform version; set by default. | katonic_companion |
deploy_on | Cloud platform on which Katonic is to be deployed. | Azure |
single_node_cluster | Set "True" if opting for a single-node cluster | True or False |
private_cluster | Set "True" when opting for a private cluster | False |
enable_exposing_genai_applications_to_internet | Set "True" to expose GenAI applications to the internet | False |
public_domain_for_genai_applications | Public FQDN of the domain for GenAI applications exposed to the internet | (eg. public-chatbots.google.com) |
internal_loadbalancer | Set "True" to use a private IP for the load balancer | False |
cluster_name | Name of the existing AKS cluster | eg. katonic-companion-platform-v5-0 |
resource_group_name | Resource group of the existing AKS cluster | eg. my-resource-group |
resource_group_location | Location of the cluster's resource group | eg. centralindia |
azure_subscription_id | Azure Subscription ID | |
genai_nfs_size | NFS storage size for GenAI | 100Gi |
workspace_timeout_interval | Workspace timeout interval in hours | eg. "12" |
custom_domain_name | Expected a valid domain. | eg. katonic.tesla.com |
use_katonic_domain | Set to True to host the Katonic platform on the Katonic Companion Platform domain. Skip if use_custom_domain is True. | True or False |
katonic_domain_prefix | One word expected with no special characters and all small alphabets | eg. tesla |
enable_pre_checks | Set this to True if you want to perform the Pre-checks | True / False |
AD_Group_Management | Set "True" to enable signing in with Azure AD | False |
AD_CLIENT_ID | Client ID of App registered for SSO in client's Azure or Identity Provider | |
AD_CLIENT_SECRET | Client Secret of App registered for SSO in client's Azure or any other Identity Provider | |
AD_AUTH_URL | Authorization URL endpoint of app registered for SSO. | |
AD_TOKEN_URL | Token URL endpoint of app registered for SSO. | |
quay_username | Username for quay | |
quay_password | Password for quay | |
adminUsername | Email for admin user | eg. john@katonic.ai |
adminPassword | Password for admin user | Minimum 8 characters, with at least 1 uppercase letter, 1 lowercase letter, and 1 special character |
adminFirstName | Admin first name | eg. john |
adminLastName | Admin last name | eg. musk |
Note: In the katonic.yml template, the single_node_cluster parameter is set to True for single-node clusters and is omitted for multi-node clusters.
Installing the Katonic Platform Companion version
After configuring the katonic.yml file, run the following command to install the Katonic Platform Companion version:
docker run -it --rm --name install-katonic -v /root/.azure:/root/.azure -v $(pwd):/inventory quay.io/katonic/katonic-installer:v5.0.9
Installation Verification
The installation process can take up to 45 minutes to complete. The installer outputs verbose logs, prints the commands for getting kubectl access to the deployed cluster, and surfaces any errors it encounters. After installation, you can use the following commands to check whether all applications are running:
cd /root/katonic
az aks get-credentials --resource-group $(cat /root/katonic/katonic.yml | grep resource_group_name | awk '{print $2}') --name $(cat /root/katonic/katonic.yml | grep cluster_name | awk '{print $2}')
kubectl get pods --all-namespaces
This will show the status of all pods being created by the installation process. If you see any pods enter a crash loop or hang in a non-ready state, you can get logs from that pod by running:
kubectl logs $POD_NAME --namespace $NAMESPACE_NAME
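Two illustrative helpers can make these checks more robust. `yml_value` reads a top-level key from katonic.yml with the match anchored at the start of the line and surrounding quotes stripped; it assumes simple single-line values without trailing comments. `unhealthy_pods` filters `kubectl get pods --all-namespaces --no-headers` output (columns NAMESPACE, NAME, READY, STATUS, ...) down to pods that are not fully ready, skipping pods that have completed.

```shell
#!/usr/bin/env bash
# Print the value of a top-level "key: value" line in a YAML file.
yml_value() {
  local key="$1" file="$2" line value
  line="$(grep -m 1 -E "^${key}:" "$file")" || return 1
  value="${line#*:}"                              # text after the first colon
  value="${value#"${value%%[![:space:]]*}"}"      # trim leading whitespace
  value="${value%"${value##*[![:space:]]}"}"      # trim trailing whitespace
  value="${value%\"}"; value="${value#\"}"        # strip double quotes
  value="${value%\'}"; value="${value#\'}"        # strip single quotes
  printf '%s\n' "$value"
}

# Read pod listings on stdin; print "namespace/name status" for problem pods.
unhealthy_pods() {
  awk '{
    split($3, ready, "/")            # READY column, e.g. 1/2
    if ($4 == "Completed") next      # finished jobs are expected
    if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2 " " $4
  }'
}
```

For example: `az aks get-credentials --resource-group "$(yml_value resource_group_name katonic.yml)" --name "$(yml_value cluster_name katonic.yml)"`, followed by `kubectl get pods --all-namespaces --no-headers | unhealthy_pods`. For a pod in a crash loop, adding `--previous` to the kubectl logs command shows the logs of the previous container instance.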
If the installation completes successfully, you should see a message that says:
TASK [platform-deployment : Credentials to access Katonic Companion Platform] *******************************
ok: [localhost] => {
"msg": [
"Platform Domain: $domain_name",
"Username: $adminUsername",
"Password: $adminPassword"
]
}
However, the application will only be accessible via HTTPS at that FQDN if you have configured DNS for the name to point to an ingress load balancer with the appropriate SSL certificate that forwards traffic to your platform nodes.
Test and troubleshoot
To verify the successful installation of Katonic, perform the following tests:
If you encounter a 500 or 502 error, connect to your cluster and run the following command:
kubectl rollout restart deploy nodelog-deploy -n application
Log in to the Katonic application and ensure that all the navigation panel options work. If this test fails, verify that Keycloak was set up properly.
Create a new project and launch a Jupyter/JupyterLab workspace. If this test fails, please check that the default environment images have been loaded in the cluster.
Publish an app with Flask or Shiny. If this test fails, please verify that the environment images have Flask and Shiny installed.
Deleting the Katonic Platform from Azure
To delete Katonic Platform from your Azure, you must delete its resource group.
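Deleting a resource group removes every resource in it, so a confirmation step is worth adding. A minimal sketch: the helper name is illustrative, and the deletion itself uses the standard `az group delete` command.

```shell
#!/usr/bin/env bash
# Ask the operator to retype the resource group name; return 0 only on an exact match.
confirm_resource_group() {
  local expected="$1" answer
  printf 'Type the resource group name "%s" to confirm deletion: ' "$expected" >&2
  IFS= read -r answer || return 1
  [[ -n "$expected" && "$answer" == "$expected" ]]
}
```

For example: `confirm_resource_group my-resource-group && az group delete --name my-resource-group --yes`.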