Connecting Apache Druid clusters

The Stackable operator for Apache Superset can automatically connect the Superset clusters it manages to Apache Druid clusters managed by the Stackable operator for Apache Druid.

To do so, create a DruidConnection resource:

apiVersion: superset.stackable.tech/v1alpha1
kind: DruidConnection
metadata:
  name: superset-druid-connection
spec:
  superset:  # (1)
    name: superset
    namespace: default
  druid:  # (2)
    name: druid
    namespace: default

(1) The `name` and `namespace` in `spec.superset` refer to the Superset cluster that you want to connect. In the example above, the name is `superset`.
(2) In `spec.druid`, specify the `name` and `namespace` of your Druid cluster. In the example above, the name is `druid`.

The `namespace` field is optional in both cases; if it is omitted, it defaults to the namespace of the DruidConnection.
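As a minimal sketch of that default, the resource below omits both `namespace` fields. It assumes that the DruidConnection, the Superset cluster and the Druid cluster all live in the same namespace; the `metadata.namespace` value `default` is only carried over from the example above.

```yaml
apiVersion: superset.stackable.tech/v1alpha1
kind: DruidConnection
metadata:
  name: superset-druid-connection
  namespace: default  # both clusters are looked up in this namespace
spec:
  superset:
    name: superset  # namespace omitted, defaults to the DruidConnection's namespace
  druid:
    name: druid     # namespace omitted, defaults to the DruidConnection's namespace
```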

Once Superset has started and its database is initialized, the Superset operator creates a Job that connects to the Superset cluster and runs an import command to add the Druid cluster as a data source.

The Job connects to the Superset Pods. If you restrict network traffic in your Kubernetes cluster, make sure to configure a NetworkPolicy that allows the Job to connect to Superset.
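One way such a NetworkPolicy might look is sketched below. It is an illustration under several assumptions, not a policy taken from the operator: the policy name is hypothetical, the Pod labels and the port `8088` are assumed and should be checked against your running Superset Pods, and the Job is assumed to run in the same namespace as Superset.

```yaml
# Sketch only: verify the labels and the port against your Superset Pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-druid-connection-job  # hypothetical name
  namespace: default
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: superset      # assumed label on the Superset Pods
      app.kubernetes.io/instance: superset  # assumed label, matches the cluster name
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}  # any Pod in the same namespace, including the Job's Pod
      ports:
        - protocol: TCP
          port: 8088  # assumed Superset HTTP port
```

The empty `podSelector` under `from` is deliberately broad because the labels of the Job's Pod are not stated here; once you know them, narrowing the selector to that Pod is the tighter choice.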

Once the Job has completed, the Druid cluster appears as a database in the Superset user interface under Data > Databases:

Superset databases showing the connected Druid cluster


Further reading