ADR020: Trino catalog usage
-
Status: accepted
-
Deciders:
-
Felix Hennig
-
Malte Sander
-
Sebastian Bernauer
-
Sönke Liebau
-
Natalie Klestrup Röijezon
-
Date: 17.05.2022
Context and Problem Statement
Trino allows users to specify multiple catalogs to connect to a variety of data sources. We need to agree on a mechanism to:
-
Specify Trino catalog definitions (ADR019: Trino catalog definitions)
-
Connect a catalog definition to a Trino cluster (this ADR)
Decision Drivers
-
Catalogs must somehow be added to Trino clusters
-
Catalogs should be reusable between multiple Trino clusters, e.g. users have two identical Trino clusters: one for ad hoc queries, one for scheduled jobs.
Considered Options
-
Catalog references Clusters
-
Cluster references Catalogs
-
Mapping object between Catalog and Cluster
-
Catalog references Cluster
-
Mapping via labels and label selectors
Decision Outcome
Chosen option: "Mapping via labels and label selectors", because it is the most flexible solution and delegates implementation details to Kubernetes.
Pros and Cons of the Options
Catalog references Clusters
-
Good, because if Trino instances in different stages have different catalogs, a Trino object can be reused across the different stages
-
Bad, because if a Trino cluster has multiple catalogs and you want a similar Trino cluster, you need to modify all of its catalogs (and possibly restart the first Trino cluster multiple times) instead of simply creating a copy of the Trino cluster
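The cost described in the last point can be illustrated with a minimal sketch. It assumes a hypothetical in-memory model in which each catalog lists the clusters it belongs to; the class, field and cluster names are illustrative only and are not taken from any actual CRD.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical catalog that lists the clusters it should be added to."""
    name: str
    clusters: list[str] = field(default_factory=list)


def clone_cluster(catalogs: list[Catalog], source: str, copy: str) -> list[str]:
    """Attach every catalog of `source` to `copy`.

    Returns the names of the catalogs that had to be modified.
    """
    modified = []
    for catalog in catalogs:
        if source in catalog.clusters and copy not in catalog.clusters:
            catalog.clusters.append(copy)
            modified.append(catalog.name)
    return modified


catalogs = [
    Catalog("hive", ["trino-adhoc"]),
    Catalog("postgres", ["trino-adhoc"]),
    Catalog("tpch", ["trino-other"]),
]

# Creating a copy of "trino-adhoc" touches every catalog that references it.
touched = clone_cluster(catalogs, "trino-adhoc", "trino-scheduled")
print(touched)  # ['hive', 'postgres']
```

In this model the number of objects to edit grows with the number of catalogs attached to the cluster, which is the drawback named above.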
Cluster references Catalogs
-
Good, because it follows the usual pattern in which our product CRDs point to other objects, not the other way around
-
Bad, because people adding new catalogs need to be able to modify the TrinoCluster object. There may be companies where different people operate Trino and manage the catalogs
-
Bad, because if Trino instances in different stages have different catalogs, a Trino object cannot be reused across the different stages
Mapping object between Catalog and Cluster
-
Bad, because it is more complicated for users
-
Bad, because it requires more complicated watches
Catalog references Cluster
This is the same as "Catalog references Clusters", but instead of a list of clusters the catalog references only a single cluster.
-
Good, because if Trino instances in different stages have different catalogs, a Trino object can be reused across the different stages
-
Good compared to "Catalog references Clusters", because a TrinoCatalog is associated with a single TrinoCluster and the cluster can add additional information to it, such as the current deployment status
-
Bad, because catalogs cannot be reused between multiple Trino clusters
Mapping via labels and label selectors
This is the option "Mapping object between Catalog and Cluster" done the "Kubernetes way".
Every TrinoCatalog object provides a set of labels.
The TrinoCluster objects provide a LabelSelector which defines which catalogs should be included in the Trino instance.
-
Good, because it is flexible (Trino cluster administrators can add new catalogs, and teams can add the catalogs they need to a managed Trino instance)
-
Good, because it uses well-known Kubernetes patterns
-
Good, because it delegates implementation details to Kubernetes (e.g. easier watches)
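The selection mechanism can be sketched as follows. This is a simplified illustration, not the operator's implementation: it models only the equality-based `matchLabels` part of a Kubernetes LabelSelector, and the class names, the `catalog_match_labels` field and the label values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TrinoCatalog:
    """Simplified stand-in for a TrinoCatalog object: a name plus labels."""
    name: str
    labels: dict[str, str] = field(default_factory=dict)


@dataclass
class TrinoCluster:
    """Simplified stand-in for a TrinoCluster object with a label selector."""
    name: str
    # Hypothetical field name. As with matchLabels, every key/value pair
    # must be present on a catalog for it to be selected.
    catalog_match_labels: dict[str, str] = field(default_factory=dict)


def select_catalogs(cluster: TrinoCluster, catalogs: list[TrinoCatalog]) -> list[str]:
    """Return the sorted names of all catalogs matching the cluster's selector."""
    return sorted(
        catalog.name
        for catalog in catalogs
        if all(
            catalog.labels.get(key) == value
            for key, value in cluster.catalog_match_labels.items()
        )
    )


catalogs = [
    TrinoCatalog("hive", {"trino": "analytics"}),
    TrinoCatalog("postgres", {"trino": "analytics"}),
    TrinoCatalog("tpch", {"trino": "testing"}),
]
adhoc = TrinoCluster("trino-adhoc", {"trino": "analytics"})
scheduled = TrinoCluster("trino-scheduled", {"trino": "analytics"})

# Both clusters reuse the same catalogs without referencing them by name.
adhoc_catalogs = select_catalogs(adhoc, catalogs)
scheduled_catalogs = select_catalogs(scheduled, catalogs)
print(adhoc_catalogs)      # ['hive', 'postgres']
print(scheduled_catalogs)  # ['hive', 'postgres']

# A team adds a catalog by labelling it; no cluster object is modified.
catalogs.append(TrinoCatalog("iceberg", {"trino": "analytics"}))
after_add = select_catalogs(adhoc, catalogs)
print(after_add)  # ['hive', 'iceberg', 'postgres']
```

In this sketch a second cluster with the same selector picks up the same catalogs, and a newly labelled catalog is included without any change to the cluster objects, which matches the flexibility argument above.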