Service exposition

Data products expose interfaces to the outside world. These interfaces (UIs or APIs) can be accessed by other products or by end users. Other products accessing the interfaces can run inside or outside of the same Kubernetes cluster. For example, Apache ZooKeeper is a dependency for other products and usually needs to be accessible only from within Kubernetes, while Apache Superset is a data analysis product for end users and therefore needs to be accessible from outside the Kubernetes cluster. Users connecting to Superset can be restricted to the local company network, or they can connect over the internet, depending on the company's security policies and requirements. This page gives an overview of the different options for service exposition, when to choose which option, and how these options are configured.

Service exposition options

The Stackable Data Platform supports three types of Kubernetes Service for exposing data product endpoints:

  • ClusterIP

  • NodePort

  • LoadBalancer

All custom resources for data products provide a field named spec.clusterConfig.listenerClass, which determines how the product can be accessed. There are three ListenerClasses, each named after its intended use (more on this in the next section):

  • cluster-internal ⇒ Use ClusterIP (default)

  • external-unstable ⇒ Use NodePort

  • external-stable ⇒ Use LoadBalancer

The cluster-internal class exposes the interface of a product by using a ClusterIP Service. This service is only reachable from within the Kubernetes cluster. This setting is the most secure and was chosen as the default for that reason.

Not all operators support all classes. Consult the operator-specific documentation to find out which service types are supported.
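As a minimal sketch of where the field sits in a custom resource, the following fragment sets the ListenerClass for a Superset cluster. Only the field path spec.clusterConfig.listenerClass and the three class names come from this page; the apiVersion, kind, and resource name are illustrative assumptions, and the other required fields are left out. Check the documentation of the respective operator for the exact schema.

```yaml
# Illustrative fragment, not a complete resource definition.
# apiVersion, kind, and metadata.name are assumptions for this example.
apiVersion: superset.stackable.tech/v1alpha1
kind: SupersetCluster
metadata:
  name: example-superset  # hypothetical name
spec:
  clusterConfig:
    # One of: cluster-internal (default), external-unstable, external-stable
    listenerClass: external-unstable
  # Other fields required by the operator are omitted here.
```

If the field is omitted, the default described above (cluster-internal) applies.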

When to choose which option

There are three options: one for internal traffic and two for external access, where internal and external refer to the Kubernetes cluster. Internal means access from inside the Kubernetes cluster, and external means access from outside of it.

Internal

cluster-internal is the default class and the Service behind it is only reachable from within Kubernetes. This is useful for middleware products such as Apache ZooKeeper, Apache Hive metastore, or an Apache Kafka cluster used for internal data flow. Products using this ListenerClass are not accessible from outside Kubernetes.

External

External access is needed when a product needs to be accessed from outside of Kubernetes. This is necessary for all end user products such as Apache Superset. Some tools, such as Apache Kafka or Apache NiFi, can also expose APIs for data ingestion. If data needs to be ingested from outside of the cluster, one of the external listener classes should be chosen.

When to use stable and when to use unstable? The external-unstable setting exposes a product interface via a Kubernetes NodePort. In this case the service’s IP address and port can change if Kubernetes needs to restart the Pod or reschedule it to another node.

The external-stable class uses a LoadBalancer. The LoadBalancer runs at a fixed address and is therefore stable. Managed Kubernetes services in the cloud usually offer a LoadBalancer, but for an on-premises cluster you have to configure a LoadBalancer yourself. For a production setup, it is recommended to use a LoadBalancer and the external-stable ListenerClass.
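To make the difference between the two external classes more concrete, the following sketch shows two plain Kubernetes Service manifests, one of type NodePort and one of type LoadBalancer. These are not the exact objects an operator generates: the names, the selector label value, and the port number are hypothetical and only illustrate the Service type behind each class.

```yaml
# Illustrative only: names, selector values, and ports are hypothetical.
# Roughly the kind of Service behind the external-unstable class.
apiVersion: v1
kind: Service
metadata:
  name: example-product-nodeport
spec:
  type: NodePort
  selector:
    app.kubernetes.io/instance: example-product
  ports:
    - name: http
      port: 8088
      targetPort: 8088
      # No nodePort is set, so Kubernetes assigns one from its NodePort range.
---
# Roughly the kind of Service behind the external-stable class.
apiVersion: v1
kind: Service
metadata:
  name: example-product-lb
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/instance: example-product
  ports:
    - name: http
      port: 8088
      targetPort: 8088
```

With the NodePort variant, clients connect to a node's address and the assigned node port. With the LoadBalancer variant, clients connect to the address provided by the load balancer, which requires a load balancer implementation to be available in the cluster.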

Outlook

These listener classes are hardcoded to expose certain Service types and do not offer any additional configuration. In a future release, the ListenerClass provided by the listener-operator will allow you to create your own listener class variants with more granular configuration options.