Connecting Apache Druid clusters

The operator can automatically connect Superset clusters that it manages to Apache Druid clusters managed by the Stackable operator for Apache Druid.

To do so, create a DruidConnection resource:

apiVersion: superset.stackable.tech/v1alpha1
kind: DruidConnection
metadata:
  name: superset-druid-connection
spec:
  superset:  # (1)
    name: superset
    namespace: default
  druid:  # (2)
    name: druid
    namespace: default
(1) The name and namespace in spec.superset refer to the Superset cluster that you want to connect. In this example, the name is superset.
(2) In spec.druid, specify the name and namespace of your Druid cluster. In this example, the name is druid.

The namespace is optional in both cases; if omitted, it defaults to the namespace of the DruidConnection.
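
As a minimal sketch of this defaulting rule, assume the DruidConnection, the Superset cluster and the Druid cluster all live in the same namespace. The namespace name `analytics` below is hypothetical and used only for illustration. Both namespace fields can then be left out:

```yaml
apiVersion: superset.stackable.tech/v1alpha1
kind: DruidConnection
metadata:
  name: superset-druid-connection
  namespace: analytics  # hypothetical namespace, for illustration only
spec:
  superset:
    name: superset  # namespace omitted: defaults to the DruidConnection's namespace
  druid:
    name: druid  # namespace omitted: defaults to the DruidConnection's namespace
```

If either cluster lives in a different namespace than the DruidConnection, set the corresponding namespace field explicitly instead.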

Once Superset has started and its database is initialized, the Superset operator creates a Job that connects to the Superset cluster and runs an import command, which adds the Druid cluster as a datasource.

The Job connects to the Superset Pods. If you restrict network traffic in your Kubernetes cluster, configure a NetworkPolicy that allows the Job to connect to Superset.
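
The following NetworkPolicy is a rough sketch of such a rule, not a definitive configuration. It rests on several assumptions that are not stated in this page and that you should verify against your cluster: the Superset Pods carry the labels `app.kubernetes.io/name: superset` and `app.kubernetes.io/instance: superset`, Superset listens on TCP port 8088, and the Job's Pod runs in the same namespace as the Superset Pods. The policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-import-job-to-superset  # hypothetical name
  namespace: default
spec:
  # Select the Superset Pods that should accept the traffic.
  podSelector:
    matchLabels:
      app.kubernetes.io/name: superset      # assumed label, verify on your Pods
      app.kubernetes.io/instance: superset  # assumed label, verify on your Pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        # An empty podSelector matches every Pod in this namespace,
        # which includes the Job's Pod under the assumption above.
        - podSelector: {}
      ports:
        - protocol: TCP
          port: 8088  # assumed Superset port, adjust if yours differs
```

Allowing every Pod in the namespace avoids guessing the labels of the Job's Pod. Once you know those labels, for example from `kubectl get pods --show-labels`, you can narrow the `from` selector accordingly.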

Once the Job has completed, the Druid cluster appears as a database in the user interface under Data > Databases:

[Figure: Superset databases showing the connected Druid cluster]

Further reading

Read the CRD reference for the DruidConnection CustomResource.