Service exposition

All data products expose an interface to the outside world. This interface can be accessed by other products or by end users. Other products accessing the interface can run inside or outside the same Kubernetes cluster. For example, Apache ZooKeeper is a dependency for other products and usually needs to be accessible only from within Kubernetes, while Superset is a data analysis product for end users and therefore needs to be accessible from outside the Kubernetes cluster. Depending on the company's security policies and demands, users connecting to Superset can be restricted to the local company network, or they can connect over the internet. This page gives an overview of the different options for service exposition, when to choose which option, and how these options are configured.

Service exposition options

The service offered by a data product is the utility it is used for, but Service (capitalized) also refers to the Kubernetes resource of that name. The Stackable Data Platform supports three types of Service:

  • ClusterIP

  • NodePort

  • LoadBalancer

All custom resources for data products provide a field named spec.clusterConfig.listenerClass, which determines how the product can be accessed. There are three ListenerClasses, named after the goal for which they are used (more on this in the next section):

  • cluster-internal ⇒ Use ClusterIP (default)

  • external-unstable ⇒ Use NodePort

  • external-stable ⇒ Use LoadBalancer

The cluster-internal class exposes the interface of a product by using a ClusterIP Service. This service is only reachable from within the Kubernetes cluster. This setting is the most secure and was chosen as the default for that reason.
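
As a minimal sketch of how the field is set, the following fragment writes out the default class explicitly on a ZooKeeper cluster. The apiVersion, kind and metadata.name shown here are assumptions made for illustration, and all other fields that a complete resource requires are omitted; consult the Operator-specific documentation for the exact resource definition.

```yaml
# Illustrative fragment only, not a complete resource.
# apiVersion and kind are assumed; the name is hypothetical.
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: example-zookeeper
spec:
  clusterConfig:
    # cluster-internal is the default, so this line could also be left out.
    listenerClass: cluster-internal
  # Further required fields (product image, role configuration, ...) omitted.
```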

Not all Operators support all classes. Consult the Operator-specific documentation to find out which ListenerClasses and Service types are supported.

When to choose which option

There are three options, one for internal traffic and two for external access, where internal and external are relative to the Kubernetes cluster. Internal means access from inside the Kubernetes cluster, and external means access from outside of it.

Internal

cluster-internal is the default class, and the Service behind it is only exposed within Kubernetes. This is useful for middleware products such as Apache ZooKeeper, the Apache Hive metastore or an Apache Kafka cluster used for internal data flow. Products using this ListenerClass are not accessible from outside Kubernetes.

External

External access is needed when a product has to be reached from outside of Kubernetes. This is necessary for all end user products such as Apache Superset. Some tools, like Apache Kafka or Apache NiFi, can also expose APIs for data ingestion. If data needs to be ingested from outside of the cluster, one of the external listener classes should be chosen.

When to use stable and when to use unstable? The external-unstable class exposes a product interface via a Kubernetes NodePort. In this case, the address and port under which the service is reachable can change if Kubernetes needs to restart the Pod or reschedule it to another node.

The external-stable class uses a LoadBalancer. The LoadBalancer runs at a fixed address and is therefore stable. Managed Kubernetes services in the cloud usually offer a LoadBalancer, but for an on-premises cluster you have to configure a LoadBalancer yourself. For a production setup, it is recommended to use a LoadBalancer, that is, the external-stable ListenerClass.
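
For an end user product, the choice of a stable external address could be sketched as follows. The apiVersion, kind and metadata.name are assumptions made for illustration, the remaining required fields are left out, and whether external-stable is supported depends on the Operator in question.

```yaml
# Illustrative fragment only, not a complete resource.
# apiVersion and kind are assumed; the name is hypothetical.
apiVersion: superset.stackable.tech/v1alpha1
kind: SupersetCluster
metadata:
  name: example-superset
spec:
  clusterConfig:
    # Expose the product through a LoadBalancer Service.
    # Use external-unstable instead to expose it through a NodePort.
    listenerClass: external-stable
  # Further required fields (product image, credentials, roles, ...) omitted.
```

With this setting the cluster must be able to provision a LoadBalancer, which on an on-premises cluster has to be set up beforehand.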

Outlook

These listener classes are hardcoded to expose certain Service types and do not offer any additional configuration. In a future release, the ListenerClass provided by the listener-operator will allow you to create your own listener class variants with more granular configuration options.