Deep storage configuration

HDFS

Druid can use HDFS as a backend for deep storage:

spec:
  clusterConfig:
    deepStorage:
      hdfs:
        configMapName: simple-hdfs (1)
        directory: /druid (2)
...
1 Name of the HDFS cluster discovery ConfigMap. It can be supplied manually for a cluster not provided by Stackable, and it must contain core-site.xml and hdfs-site.xml.
2 The directory in which to store the Druid data.
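
For an HDFS cluster that is not provided by Stackable, a manually supplied discovery ConfigMap could look roughly like the sketch below. Only the two file names come from the description above. The NameNode address and the contents of both files are placeholders chosen for illustration, so replace them with the actual client configuration of your cluster.

```yaml
# Illustrative sketch only: the file contents are assumptions, not values taken from this guide.
apiVersion: v1
kind: ConfigMap
metadata:
  name: simple-hdfs  # must match deepStorage.hdfs.configMapName
data:
  core-site.xml: |
    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <!-- Hypothetical NameNode address; replace with your own. -->
        <value>hdfs://namenode.example.com:8020</value>
      </property>
    </configuration>
  hdfs-site.xml: |
    <?xml version="1.0"?>
    <configuration>
      <!-- Client-side HDFS settings for your cluster go here. -->
    </configuration>
```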

S3

Druid can use S3 as a backend for deep storage:

spec:
  clusterConfig:
    deepStorage:
      s3:
        bucket:
          inline:
            bucketName: my-bucket  (1)
            connection:
              inline:
                host: test-minio  (2)
                port: 9000  (3)
                credentials:  (4)
                ...
1 Bucket name.
2 Bucket host.
3 Optional bucket port.
4 Credentials explained below.

It is also possible to configure the bucket connection details as a separate Kubernetes resource and only refer to that object from the DruidCluster like this:

spec:
  clusterConfig:
    deepStorage:
      s3:
        bucket:
          reference: my-bucket-resource (1)
1 Name of the bucket resource with connection details.

The resource named my-bucket-resource is then defined as shown below:

---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Bucket
metadata:
  name: my-bucket-resource
spec:
  bucketName: my-bucket-name
  connection:
    inline:
      host: test-minio
      port: 9000
      credentials:
        ... (explained below)

This has the advantage that the bucket configuration can be shared across DruidClusters (and other Stackable CRDs), which reduces the cost of updating these details.

You can specify a connection or bucket for ingestion only, for deep storage only, or for both, but Druid supports only a single S3 connection under the hood. If two connections are specified, they must be identical. This is easiest to guarantee with a dedicated S3 connection resource that is defined as a separate object rather than inline.
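
As a rough sketch of the dedicated-object approach, the connection could be defined once and then referenced from the bucket resource. The kind `S3Connection`, the `connection.reference` field, and the name `my-connection-resource` are assumptions made for illustration and are not confirmed by this page, so verify them against the CRDs installed in your cluster before use.

```yaml
---
# Assumed shape of a standalone connection object; verify against your installed CRDs.
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: my-connection-resource  # hypothetical name
spec:
  host: test-minio
  port: 9000
  credentials:
    secretClass: s3-credentials-class  # see the S3 Credentials section
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Bucket
metadata:
  name: my-bucket-resource
spec:
  bucketName: my-bucket-name
  connection:
    reference: my-connection-resource  # assumed field: points at the object above
```

Because every consumer then points at the same object, the requirement that both connections be identical is met by construction.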

TLS for S3 is not yet supported.

S3 Credentials

Regardless of whether a connection is specified inline or as a separate object, the credentials are always specified in the same way. You need three things: a Secret containing the access key ID and the secret access key, a SecretClass, and a reference to this SecretClass at the place where the credentials are required.

The Secret:

apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  labels:
    secrets.stackable.tech/class: s3-credentials-class  (1)
stringData:
  accessKey: YOUR_VALID_ACCESS_KEY_ID_HERE
  secretKey: YOUR_SECRET_ACCESS_KEY_THAT_BELONGS_TO_THE_KEY_ID_HERE
1 This label connects the Secret to the SecretClass.

The SecretClass:

apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: s3-credentials-class
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}

Referencing it:

...
credentials:
  secretClass: s3-credentials-class
...
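
Putting the pieces together, the bucket resource from earlier on this page with the credentials reference filled in might look like the following sketch. It only combines fragments already shown above; the host, port, and names are the same example values and should be adapted to your environment.

```yaml
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Bucket
metadata:
  name: my-bucket-resource
spec:
  bucketName: my-bucket-name
  connection:
    inline:
      host: test-minio
      port: 9000
      credentials:
        # Resolved through the SecretClass to the Secret labeled with this class name.
        secretClass: s3-credentials-class
```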