Building SEP YAML
Setting the foundation for your SEP installation
In this guide, you will retrieve the main YAML file needed to configure SEP and adjust its values to match your Kubernetes environment.
If you have not completed the steps in the helm charts guide, do so before starting this guide.
How does the config YAML file work
A YAML file is a versatile format utilized for configuring software applications and facilitating data sharing across different programming environments. Its design emphasizes simplicity and readability, making it an ideal choice for developers to define structured data easily.
Each Helm chart provides a values.yaml file, but it is strongly recommended that you do not alter it. Instead, you will create separate files whose values override the chart defaults.
Recommended file set
The table below describes a set of focused configuration files. If you have more than one cluster, such as a test cluster and a production cluster, name the files accordingly before you begin. Examples are provided in the sections that follow.
File Name | Content |
---|---|
registry-access.yaml | Docker registry access credentials file, typically to access the Docker registry on the Starburst Harbor instance. |
sep-prod-catalogs.yaml | Catalog configuration for all catalogs configured for SEP on the prod cluster. It is typically useful to keep catalog configurations in a separate file so they can be reused across clusters. |
sep-prod-setup.yaml | Main configuration file for the prod cluster. Include any configuration for all other top level nodes that configure the coordinator, workers, and all other aspects of the cluster. |
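To illustrate how the files in the table are layered, the sketch below builds (but does not run) a Helm command that passes each file with its own `--values` flag. Helm merges values files from left to right, so a later file takes precedence over an earlier one, and all of them take precedence over the chart's values.yaml. The function name `build_helm_command` and the release name `sep-prod` are hypothetical choices for this example; the actual install step is covered in a later guide.

```shell
# Print a Helm command that layers override files on top of the chart defaults.
# Usage: build_helm_command RELEASE VERSION FILE...
# The function only prints the command; nothing is installed.
build_helm_command() {
  release="$1"
  version="$2"
  shift 2
  cmd="helm upgrade --install $release starburstdata/starburst-enterprise --version $version"
  # Each file is added in order; later files override earlier ones.
  for f in "$@"; do
    cmd="$cmd --values $f"
  done
  printf '%s\n' "$cmd"
}

# Release name and version are placeholders for illustration.
build_helm_command sep-prod 429.0.0 \
  registry-access.yaml sep-prod-setup.yaml sep-prod-catalogs.yaml
```

Printing the command first makes it easy to review the file order before anything touches the cluster.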
Version Control
It is recommended to implement a version control system such as git to keep track of all changes made within configuration files.
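As a minimal sketch, assuming git is installed and the commands are run from inside the directory that holds your SEP YAML files, a repository could be started like this:

```shell
# Run from inside the directory that holds your SEP YAML files.
# Create an empty repository in the current directory.
git init
# Stage everything that exists so far (this is a no-op in an empty directory).
git add .
# Show what is staged before you commit.
git status --short
```

Because these files hold registry credentials and the shared secret, a private repository is the safer assumption.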
Initialize Starburst Setup
Begin by creating a dedicated directory on your machine, named starburst-sep. This directory will serve as the central hub for executing Helm commands and managing all necessary files for installing, updating, and operating Starburst.
mkdir starburst-sep
Navigate to the starburst-sep directory. Here, you'll execute a Helm command to fetch the default values of the Starburst Helm chart.
cd starburst-sep
The retrieved file is named sep-default.yaml, matching the output file in the command below. It's important to keep this file in its original form, as it provides a comprehensive view of all default settings. Later, you'll create additional, more concise files that override specific values from this YAML file.
When running the command below, replace %SEP_VERSION% with the specific version number of Starburst Enterprise you intend to install, such as 429.0.0.
helm show values --version %SEP_VERSION% starburstdata/starburst-enterprise > sep-default.yaml
If there are any problems retrieving this file, check the helm prerequisites guide.
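Once the defaults file has been retrieved, one quick way to see which top-level nodes are available to override is to list its unindented keys. The sketch below assumes the file uses conventional YAML layout with top-level keys starting in the first column; the function name `list_top_level_keys` is hypothetical.

```shell
# Print the unique top-level keys of a YAML file, one per line.
# Assumes top-level keys start in column one and comments start with '#'.
list_top_level_keys() {
  grep -E '^[A-Za-z][A-Za-z0-9_-]*:' "$1" | cut -d: -f1 | sort -u
}

# Only run against the defaults file if it has already been retrieved.
if [ -f sep-default.yaml ]; then
  list_top_level_keys sep-default.yaml
fi
```

Any key in that list, such as `coordinator` or `worker`, can appear as a top-level node in your own override files.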
Create registry-access.yaml
Create a new file named registry-access.yaml in the starburst-sep directory. This file will contain the credentials needed to access the Docker registry on the Starburst Harbor instance and override the default values. This file can be shared across multiple clusters.
Change %USERNAME% and %PASSWORD% to your Harbor username and password.
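A minimal sketch of creating the file from the command line follows. It assumes the chart exposes a `registryCredentials` node with `enabled`, `registry`, `username`, and `password` keys, and that the Harbor registry address is `harbor.starburstdata.net/starburstdata`; confirm both against the `registryCredentials` section of your retrieved defaults file before relying on them.

```shell
# Write registry-access.yaml into the current directory (starburst-sep).
# The quoted 'EOF' keeps the shell from expanding anything inside the file.
# Node names and the registry address are assumptions; verify them against
# the chart defaults for your SEP version.
cat > registry-access.yaml <<'EOF'
registryCredentials:
  enabled: true
  registry: harbor.starburstdata.net/starburstdata
  username: "%USERNAME%"
  password: "%PASSWORD%"
EOF
```

The placeholders are quoted so the file remains valid YAML until you replace them with real values.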
Create sep-prod-setup.yaml
Create a new file named sep-prod-setup.yaml in the starburst-sep directory. This file will contain the configuration for the prod cluster and override the default values.
The YAML below contains several variables that you need to change:
Variable | Notes | Example |
---|---|---|
%ENVIRONMENT_NAME% | Sets the environment name that will be used to identify the cluster. | sep-prod |
%SHARED_SECRET% | Sets the shared secret value for internal communications. This can be any string value and is not a Kubernetes secret. | AN0Qhhw9PsZmEgEXAMPLE |
%GB_AMOUNT% | Sets the amount of memory to allocate to the coordinator and workers. | 10Gi |
%CPU_AMOUNT% | Sets the amount of CPU to allocate to the coordinator and workers. | 2 |
Be sure to change the memory and CPU variables in both the coordinator and worker sections.
environment: %ENVIRONMENT_NAME%
sharedSecret: %SHARED_SECRET%
coordinator:
  resources:
    memory: "%GB_AMOUNT%"
    requests:
      cpu: %CPU_AMOUNT%
  # This affinity will ensure the coordinator and worker are deployed in the sep node group.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: apps
                operator: In
                values:
                  - sep
worker:
  resources:
    memory: "%GB_AMOUNT%"
    requests:
      cpu: %CPU_AMOUNT%
  # This affinity will ensure the coordinator and worker are deployed in the sep node group.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: apps
                operator: In
                values:
                  - sep
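If you prefer to fill in the variables from the command line rather than by hand, the following sketch uses `sed` to substitute all four placeholders. The function name `fill_setup_placeholders` and the template file name `sep-prod-setup.template.yaml` are hypothetical, and the sketch applies the same memory and CPU values to both the coordinator and the workers.

```shell
# Replace the four placeholders in a template and print the result.
# Usage: fill_setup_placeholders TEMPLATE ENV_NAME SHARED_SECRET MEMORY CPU
# Caveat: values containing '|', '&', or '\' would need escaping for sed.
fill_setup_placeholders() {
  sed \
    -e "s|%ENVIRONMENT_NAME%|$2|g" \
    -e "s|%SHARED_SECRET%|$3|g" \
    -e "s|%GB_AMOUNT%|$4|g" \
    -e "s|%CPU_AMOUNT%|$5|g" \
    "$1"
}

# Example values are taken from the table above; the template name is hypothetical.
if [ -f sep-prod-setup.template.yaml ]; then
  fill_setup_placeholders sep-prod-setup.template.yaml \
    sep-prod 'AN0Qhhw9PsZmEgEXAMPLE' 10Gi 2 > sep-prod-setup.yaml
fi
```

Keeping the template separate from the generated file means the same template can be reused for a test cluster with different values.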
Create sep-prod-catalogs.yaml
Create a new file named sep-prod-catalogs.yaml in the starburst-sep directory. This file will contain the catalog configuration for the prod cluster and override the default values. This file can be shared across multiple clusters.
Throughout this guide, you will add more data sources to this file. For now, it will contain the default configuration for the tpch catalog.
catalogs:
  tpch: |
    connector.name=tpch
Conclusion
At this point, you should have three files created in the starburst-sep directory. These files override the default configuration values and customize SEP for your environment. The directory should look like this:
- starburst-sep
--- registry-access.yaml
--- sep-prod-setup.yaml
--- sep-prod-catalogs.yaml
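To confirm the directory matches this layout, a small check along the following lines can help. The function name `check_sep_files` is hypothetical, and the file names assume you kept the prod naming used in this guide.

```shell
# Report which of the three override files exist in a directory.
# Usage: check_sep_files [DIR]   (DIR defaults to starburst-sep)
# Returns 0 when all files are present and 1 otherwise.
check_sep_files() {
  dir="${1:-starburst-sep}"
  missing=0
  for f in registry-access.yaml sep-prod-setup.yaml sep-prod-catalogs.yaml; do
    if [ -f "$dir/$f" ]; then
      echo "ok: $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Run from the parent of starburst-sep.
check_sep_files starburst-sep || echo "create the missing files before continuing"
```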
In the next guide, you will add your Starburst license to your configuration.