Service exposition
Data products expose interfaces to the outside world. These interfaces (whether UIs, or APIs) can be accessed by other products or by end users. Other products accessing the interfaces can run inside or outside of the same Kubernetes cluster. For example, Apache ZooKeeper is a dependency for other products, and it usually needs to be accessible only from within Kubernetes, while Apache Superset is a data analysis product for end users and therefore needs to be accessible from outside the Kubernetes cluster. Users connecting to Superset can be restricted within the local company network, or they can connect over the internet depending on the company security policies and demands. This page gives an overview over the different options for service exposition, when to choose which option and how these options are configured.
Service exposition options
The Stackable Data Platform supports three types of Kubernetes Service for exposing data product endpoints:
-
ClusterIP
-
NodePort
-
LoadBalancer
All custom resources for data products provide a resource field named spec.clusterConfig.listenerClass
which determines how the product can be accessed.
There are three ListenerClasses, named after the goal for which they are used (more on this in the next section):
-
cluster-internal
⇒ Use ClusterIP (default) -
external-unstable
⇒ Use NodePort -
external-stable
⇒ Use LoadBalancer
The cluster-internal
class exposes the interface of a product by using a ClusterIP Service.
This service is only reachable from within the Kubernetes cluster.
This setting is the most secure and was chosen as the default for that reason.
Not all operators support all classes. Consult the operator specific documentation to find out about the supported service types. |
When to choose which option
There are three options, one for internal traffic and two for external access, where internal and external refer to the Kubernetes cluster. Internal means inside of the Kuberenetes cluster, and external means access from outside of it.
Internal
cluster-internal
is the default class and the Service behind it is only reachable from within Kubernetes.
This is useful for middleware products such as Apache ZooKeeper, Apache Hive metastore, or an Apache Kafka cluster used for internal data flow.
Products using this ListenerClass are not accessible from outside Kubernetes.
External
External access is needed when a product needs to be accessed from outside of Kubernetes. This is necessary for all end user products such as Apache Superset. Some tools can expose APIs for data ingestion like Apache Kafka or Apache NiFi. If data needs to be ingested from outside of the cluster, one of the external listener classes should be chosen.
When to use stable
and when to use unstable
?
The external-unstable
setting exposes a product interface via a Kuberneres NodePort.
In this case the service’s IP address and port can change if Kubernetes needs to restart or reschedule the Pod to another node.
The external-stable
class uses a LoadBalancer.
The LoadBalancer is running at a fixed address and is therefore stable
.
Managed Kubernetes services in the cloud usually offer a LoadBalancer, but for an on premise cluster you have to configure a LoadBalancer yourself.
For a production setup, it is recommended to use a LoadBalancer and the external-stable
ListenerClass.
Outlook
For most of the Stackable operators, these listener classes are hardcoded to expose certain Service types and do not offer any additional configuration. However, some operators support specifying custom ListenerClasses with more granular configuration options, via the listener-operator. In a future release, all Stackable operators are planned to be migrated over to this system.
For more information on what is supported by any individual operator, please see that operator’s documentation.