Getting started with Kubernetes
Starburst Enterprise with Kubernetes
This guide explains how to run Starburst in a Kubernetes environment. Whether you are new to Starburst, Kubernetes, or both, it walks you through the essential concepts and steps to get started.
What is Starburst?
Starburst is an analytics engine built on Trino (formerly known as PrestoSQL), the open-source distributed SQL query engine. It is designed for fast analytics on large datasets spread across multiple sources. Starburst extends Trino with additional security, connectivity, and performance features aimed at enterprises that run analytics at scale.
Trino's Architecture: Coordinators and Workers
Trino operates on a coordinator-worker architecture:
- Coordinator Node: The coordinator node is responsible for parsing, analyzing, and planning queries. It manages the execution of these queries by distributing the workload among the worker nodes.
- Worker Nodes: Worker nodes execute tasks and process data. They perform the actual computation and data processing as instructed by the coordinator.
This architecture allows Trino to process large volumes of data by parallelizing the workload across multiple nodes.
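The division of labor described above can be illustrated with a toy sketch. This is not Trino code; the function names (plan_splits, worker_task, run_query) are hypothetical, and threads stand in for worker nodes. It only shows the pattern: one component plans and distributes the work, several others compute partial results in parallel, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_splits(rows, num_workers):
    """Coordinator role: divide the input into one split per worker."""
    return [rows[i::num_workers] for i in range(num_workers)]


def worker_task(split):
    """Worker role: compute a partial aggregate over a single split."""
    return sum(split)


def run_query(rows, num_workers=3):
    """Plan the splits, process them in parallel, and combine the partial results."""
    splits = plan_splits(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_task, splits))
    # Final aggregation step: merge the partial sums into one result.
    return sum(partials)


if __name__ == "__main__":
    print(run_query(list(range(1, 101))))  # prints 5050
```

Adding workers in this sketch shrinks each split, which is the same reason a larger Trino cluster can process more data in parallel.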
Data Connectivity, Not Data Storage
Importantly, Starburst provides the means to connect to various data sources but does not store the data itself. It acts as a powerful bridge between client tools and the actual data, enabling users to perform analytics on data located in diverse sources.
- Data Source Integration: Starburst connects to many data sources, so users can query data where it lives without moving or replicating it first.
- Client Tool Access: Client tools can connect directly to the Starburst cluster, enabling users to access and analyze their data using their preferred tools.
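As an illustration of both points, the sketch below builds a single SQL statement that joins tables from two different data sources and sends it to the cluster with the open-source trino Python client (installed with pip install trino). The host name, port, user, and the catalog, schema, and table names are placeholders chosen for this example, not values from an actual deployment, and a secured cluster would also need authentication settings.

```python
def federated_query(orders_table, customers_table):
    """Build a join across two sources; tables are addressed as catalog.schema.table.

    The table names are interpolated directly, so pass only trusted identifiers.
    """
    return (
        "SELECT c.name, count(*) AS order_count "
        f"FROM {orders_table} AS o "
        f"JOIN {customers_table} AS c ON o.customer_id = c.id "
        "GROUP BY c.name"
    )


def run(sql, host="starburst.example.com", port=8080, user="analyst"):
    """Send the query to the cluster and return all rows (placeholder connection values)."""
    from trino.dbapi import connect  # imported here so the query builder works without the client

    conn = connect(host=host, port=port, user=user)
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


if __name__ == "__main__":
    # Hypothetical catalogs: one backed by a data lake, one by a relational database.
    sql = federated_query("hive.sales.orders", "postgresql.public.customers")
    print(sql)
    # rows = run(sql)  # requires a reachable cluster with these catalogs configured
```

The data stays in the underlying systems; the cluster reads from each source at query time and returns only the result to the client.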
How Starburst Works with Kubernetes
Kubernetes is well suited to managing Trino's distributed architecture:
- Dynamic Scaling of Worker Nodes: Kubernetes can add or remove worker nodes as the workload changes, which improves resource utilization and performance.
- Resilience and Fault Tolerance: Kubernetes automatically restarts failed worker nodes, which limits downtime.
- Simplified Deployment and Management: Deploying the coordinator and the workers as separate Kubernetes workloads lets you manage and scale each independently.
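To make the scaling point concrete, the sketch below reproduces the core replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up and kept within configured bounds. It is a simplified illustration that leaves out the autoscaler's tolerance and stabilization behavior. The function name, default bounds, and the CPU figures are assumptions for this example, and whether worker autoscaling is used at all depends on how your deployment is configured.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Return the replica count the autoscaler formula would ask for.

    desired = ceil(current_replicas * current_metric / target_metric),
    clamped to the range [min_replicas, max_replicas].
    """
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Four workers averaging 90% CPU against a 60% target: scale out.
    print(desired_replicas(4, 90, 60))   # prints 6
    # The same four workers at 30% CPU: scale in.
    print(desired_replicas(4, 30, 60))   # prints 2
    # A large spike is capped at the configured maximum.
    print(desired_replicas(4, 200, 50))  # prints 10
```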
What's to Come in This Guide
We will cover various aspects of setting up and using Starburst with Kubernetes:
- Prerequisites: Infrastructure, networking, and Helm charts necessary for Starburst deployment.
- Setting Up Starburst: Guide to deploying Starburst in Kubernetes, focusing on coordinator and worker nodes.
- User Authentication: Implementing secure access in Starburst.
- Setting Up Catalogs: Configuring catalogs for seamless data source querying.
- Security: Ensuring data safety and compliance in your Starburst deployment.
The sections that follow cover each topic in turn, giving you the knowledge and tools to use Starburst effectively in a Kubernetes environment.