Building SEP YAML

Setting the foundation for your SEP installation

In this guide, you will retrieve the default values YAML file for SEP and create override files with values that match your Kubernetes environment.

If you have not completed the steps in the helm charts guide, do so before starting this guide.

How do the configuration YAML files work?

YAML is a versatile data format used to configure software applications and to share data across different programming environments. Its design emphasizes simplicity and readability, which makes it easy for developers to define structured data.

Each Helm chart provides a values.yaml file containing its default settings. We strongly recommend leaving that file unaltered. Instead, you will create separate files whose values override the defaults.

The table below lists a set of focused configuration files. If you have more than one cluster, such as a test cluster and a production cluster, name the files for each cluster accordingly before you begin. Examples are provided in the sections that follow.

File name | Content
registry-access.yaml | Docker registry access credentials, typically used to access the Docker registry on the Starburst Harbor instance.
sep-prod-catalogs.yaml | Catalog configuration for all catalogs configured for SEP on the prod cluster. Keeping catalog configurations in a separate file lets you reuse them across clusters.
sep-prod-setup.yaml | Main configuration file for the prod cluster. It includes the configuration for all other top-level nodes, such as those that configure the coordinator and workers, and all other aspects of the cluster.
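To show how these override files combine, the following sketch collects the Helm --values flags for the three files in a fixed order. Helm merges values files from left to right, so if two files set the same key, the file listed last wins. Keys that none of the files set keep the chart defaults. The release name sep-prod and the shell variable SEP_VERSION in the comment are hypothetical placeholders, and the actual install command belongs to a later guide.

```shell
# Sketch: collect the --values flags for the override files, in order.
# Helm merges values files from left to right, so when two files set the
# same key, the file listed last wins. Keys that none of the files set keep
# the defaults from the chart's own values.yaml (see sep-default.yaml).
set --
for f in registry-access.yaml sep-prod-catalogs.yaml sep-prod-setup.yaml; do
  set -- "$@" --values "$f"
done

# Print the flags one per line to confirm the order.
printf '%s\n' "$@"

# In a later step these flags are passed to Helm, for example
# (release name sep-prod and variable SEP_VERSION are hypothetical):
#   helm upgrade --install sep-prod starburstdata/starburst-enterprise \
#     --version "$SEP_VERSION" "$@"
```

The sketch uses the positional parameters as a list so that it also runs under a plain POSIX shell, not only bash.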

Version Control

We recommend using a version control system such as Git to track every change you make to these configuration files.
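As a minimal sketch, assuming Git is installed, you can put the starburst-sep directory under version control before adding any files. The commit commands are shown as comments because the files do not exist yet at this point.

```shell
# Sketch: track the SEP configuration directory with Git (assumes git is installed).
mkdir -p starburst-sep
cd starburst-sep || exit 1
git init

# As you create each file in the steps below, stage and commit it, e.g.:
#   git add sep-default.yaml sep-prod-setup.yaml sep-prod-catalogs.yaml
#   git commit -m "Add SEP configuration"
# registry-access.yaml contains your Harbor password; avoid pushing it to a
# repository that other people can read.
```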

1

Initialize Starburst Setup

Begin by creating a dedicated directory on your machine, named starburst-sep. This directory will serve as the central hub for executing Helm commands and managing all necessary files for installing, updating, and operating Starburst.

bash
mkdir starburst-sep

Navigate to the starburst-sep directory. This is where you will run a Helm command to fetch the default values file for the Starburst Enterprise Helm chart.

bash
cd starburst-sep

The retrieved file is named sep-default.yaml. Keep this file in its original form, as it provides a comprehensive view of all default settings. Later, you will create additional, more concise files that override specific values from it.

When running the command below, replace %SEP_VERSION% with the specific version number of Starburst Enterprise you intend to install, such as 429.0.0.

bash
helm show values --version %SEP_VERSION% starburstdata/starburst-enterprise > sep-default.yaml

If there are any problems retrieving this file, check the helm prerequisites guide.
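Because each override file mirrors the structure of sep-default.yaml, it helps to know which top-level sections the chart exposes. The sketch below defines a small helper, top_level_keys (a hypothetical name), that lists keys starting in column 0. The simple pattern assumes plain, unquoted top-level keys.

```shell
# Sketch: list the top-level keys of a Helm values file. Each key is a
# section (for example coordinator or worker) that an override file can set.
# Assumes plain, unquoted keys that start in column 0.
top_level_keys() {
  grep -E '^[A-Za-z][A-Za-z0-9_]*:' "$1" | cut -d: -f1
}

# Run from the starburst-sep directory after retrieving sep-default.yaml.
if [ -f sep-default.yaml ]; then
  top_level_keys sep-default.yaml
fi
```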

2

Create registry-access.yaml

Create a new file named registry-access.yaml in the starburst-sep directory. This file contains the credentials needed to access the Docker registry on the Starburst Harbor instance and overrides the default values. You can share this file across multiple clusters.

Replace %USERNAME% and %PASSWORD% with your Harbor username and password.
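As a sketch, you can create this file from the shell with a heredoc. The example assumes the chart configures registry access through a registryCredentials section; the key names and the registry address harbor.starburstdata.net/starburstdata are assumptions, so confirm them against the registryCredentials section of sep-default.yaml. The placeholders are quoted because a plain YAML value cannot begin with the % character.

```shell
# Sketch: write registry-access.yaml from inside the starburst-sep directory.
# Assumed keys: registryCredentials with enabled, registry, username, password.
# Verify them (and the registry address) against sep-default.yaml before use.
# Note: "cat >" overwrites an existing file with the same name.
cat > registry-access.yaml <<'EOF'
registryCredentials:
  enabled: true
  registry: harbor.starburstdata.net/starburstdata
  # Quoted because a plain YAML value cannot start with "%".
  username: "%USERNAME%"
  password: "%PASSWORD%"
EOF
```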

3

Create sep-prod-setup.yaml

Create a new file named sep-prod-setup.yaml in the starburst-sep directory. This file will contain the configuration for the prod cluster and override the default values.

The YAML below contains several placeholder variables that you need to replace:

Variable | Notes | Example
%ENVIRONMENT_NAME% | Sets the environment name used to identify the cluster. | sep-prod
%SHARED_SECRET% | Sets the shared secret for internal communication. This can be any string value; it is not a Kubernetes secret. | AN0Qhhw9PsZmEgEXAMPLE
%GB_AMOUNT% | Sets the amount of memory to allocate to the coordinator and workers. | 10Gi
%CPU_AMOUNT% | Sets the amount of CPU to allocate to the coordinator and workers. | 2

Be sure to change the memory and CPU variables in both the coordinator and worker sections.

yaml
environment: %ENVIRONMENT_NAME%
sharedSecret: %SHARED_SECRET%

coordinator:
  resources:
    memory: "%GB_AMOUNT%" 
    requests:
      cpu: %CPU_AMOUNT%
  
  # This affinity will ensure the coordinator and worker are deployed in the sep node group.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: apps
            operator: In
            values: 
            - sep

worker:
  resources:
    memory: "%GB_AMOUNT%" 
    requests:
      cpu: %CPU_AMOUNT%
  
  # This affinity will ensure the coordinator and worker are deployed in the sep node group.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: apps
            operator: In
            values:
            - sep
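Rather than editing each placeholder by hand, you can substitute them with sed. The sketch below uses the example values from the variable table; replace them with your own, especially the shared secret. The helper name fill_placeholders is hypothetical, and because the values are inserted into sed expressions they must not contain the |, & or \ characters.

```shell
# Sketch: replace the placeholders in sep-prod-setup.yaml in one pass.
# Example values from the variable table; substitute your own.
ENVIRONMENT_NAME=sep-prod
SHARED_SECRET=AN0Qhhw9PsZmEgEXAMPLE
GB_AMOUNT=10Gi
CPU_AMOUNT=2

# Values must not contain "|", "&" or "\" because they are used inside sed.
fill_placeholders() {
  # Write to a temporary file, then move it back (works with GNU and BSD sed).
  sed -e "s|%ENVIRONMENT_NAME%|$ENVIRONMENT_NAME|g" \
      -e "s|%SHARED_SECRET%|$SHARED_SECRET|g" \
      -e "s|%GB_AMOUNT%|$GB_AMOUNT|g" \
      -e "s|%CPU_AMOUNT%|$CPU_AMOUNT|g" \
      "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Run from the starburst-sep directory once the file has been saved.
if [ -f sep-prod-setup.yaml ]; then
  fill_placeholders sep-prod-setup.yaml
fi
```

Because the g flag replaces every occurrence, one run updates the memory and CPU values in both the coordinator and worker sections.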
4

Create sep-prod-catalogs.yaml

Create a new file named sep-prod-catalogs.yaml in the starburst-sep directory. This file will contain the catalog configuration for the prod cluster and override the default values. This can be shared across multiple clusters.

In later guides, you will add more data sources to this file. For now, it contains only the default configuration for the tpch catalog.

yaml
catalogs: 
  tpch: |
    connector.name=tpch
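If you prefer to create the file from the shell, the heredoc below writes the same tpch configuration. Each key under catalogs names a catalog, and the text after | is a literal block holding that catalog's properties, one per line. The commented tpcds entry is a hypothetical illustration of how a later data source would be added; it assumes the TPC-DS connector is available in your SEP version.

```shell
# Sketch: write sep-prod-catalogs.yaml from inside the starburst-sep directory.
# Note: "cat >" overwrites an existing file with the same name.
cat > sep-prod-catalogs.yaml <<'EOF'
catalogs:
  tpch: |
    connector.name=tpch
  # Each further data source becomes another key under catalogs.
  # Hypothetical example using the TPC-DS connector (uncomment to use):
  # tpcds: |
  #   connector.name=tpcds
EOF
```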

Conclusion

At this point, you should have three override files in the starburst-sep directory, alongside the sep-default.yaml reference file. The override files replace the default configuration values with settings customized for your environment. The directory should look like this:

- starburst-sep
--- sep-default.yaml
--- registry-access.yaml
--- sep-prod-setup.yaml
--- sep-prod-catalogs.yaml
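Before moving on, you can sanity-check the directory. The sketch below, built around a hypothetical helper named check_sep_dir, reports any missing override file and any %PLACEHOLDER% value that was not replaced.

```shell
# Sketch: sanity-check an SEP configuration directory.
# Prints one line per problem and returns non-zero if any were found.
check_sep_dir() {
  rc=0
  for f in registry-access.yaml sep-prod-setup.yaml sep-prod-catalogs.yaml; do
    if [ ! -f "$1/$f" ]; then
      echo "missing: $f"
      rc=1
    elif grep -qE '%[A-Z_]+%' "$1/$f"; then
      # A %NAME% token means a placeholder was left unreplaced.
      echo "unreplaced placeholder in: $f"
      rc=1
    fi
  done
  return "$rc"
}

check_sep_dir starburst-sep && echo "starburst-sep looks complete"
```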

In the next guide, you will add your Starburst license to your configuration.