Ingestion

From S3

To ingest data from S3, you need to specify a host to connect to. The other settings are optional:

spec:
  clusterConfig:
    ingestion:
      s3connection:
        host: yourhost.com  # (1)
        port: 80 # optional (2)
        credentials: # optional (3)
        ...

1	The S3 host. Required.
2	The port. Optional, defaults to 80.
3	The credentials to use. Optional. Since credentials might be bucket-dependent, they can instead be given in the ingestion job. How to specify the credentials here is explained below.

You can specify a connection or bucket for ingestion, for deep storage, or for both, but internally Druid supports only a single S3 connection. If two connections are specified, they must be identical. The easiest way to guarantee this is to define the connection as a dedicated S3Connection resource instead of inline.
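A dedicated connection object might look like the following sketch. It assumes the S3Connection custom resource from the Stackable s3.stackable.tech/v1alpha1 API group, and that the Druid cluster points at it by name through a reference field under s3connection instead of the inline settings. The name my-s3-connection is hypothetical; check both the resource and the field against the CRD version you have installed.

```yaml
# Assumption: S3Connection resource from the s3.stackable.tech/v1alpha1 API group
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: my-s3-connection  # hypothetical name
spec:
  host: yourhost.com
  port: 80
---
# Fragment of the Druid cluster spec (not a complete resource).
# Assumption: the shared object is referenced by name via "reference".
spec:
  clusterConfig:
    ingestion:
      s3connection:
        reference: my-s3-connection
```

If deep storage refers to the same object, the two connections are identical by construction, which satisfies the single-connection restriction without having to keep two inline definitions in sync.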

The region field of the S3Connection is ignored. Druid uses the AWS SDK v1, which ignores the region whenever an endpoint is set, and since host is a required field, the endpoint is always set.

TLS for S3 is not yet supported.

S3 credentials

Whether a connection is specified inline or as a separate object, the credentials are always specified the same way. You need a Secret containing the access key ID and the secret access key, a SecretClass, and a reference to this SecretClass wherever you want to specify the credentials.

The Secret:

apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  labels:
    secrets.stackable.tech/class: s3-credentials-class  # (1)
stringData:
  accessKey: YOUR_VALID_ACCESS_KEY_ID_HERE
  secretKey: YOUR_SECRET_ACCESS_KEY_THAT_BELONGS_TO_THE_KEY_ID_HERE

1	This label connects the `Secret` to the `SecretClass`.

The SecretClass:

apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: s3-credentials-class
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}

Referencing it:

...
credentials:
  secretClass: s3-credentials-class
...
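Putting the pieces together, an inline connection that uses this SecretClass might look like the sketch below. It only combines the snippets shown earlier on this page; yourhost.com is the same placeholder host used above.

```yaml
spec:
  clusterConfig:
    ingestion:
      s3connection:
        host: yourhost.com  # placeholder host from the example above
        port: 80
        credentials:
          # SecretClass defined above; it finds the Secret via its
          # secrets.stackable.tech/class label
          secretClass: s3-credentials-class
```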

Adding external files, e.g. for ingestion

Since Druid actively runs ingestion tasks, its processes may need access to extra files.

These could, for example, be client certificates used to connect to a Kafka cluster or a keytab used to obtain a Kerberos ticket.

To make these files available, the operator lets you specify extra volumes, which are added to all pods deployed for this cluster.

spec:
  clusterConfig:
    extraVolumes:
      - name: google-service-account
        secret:
          secretName: google-service-account

All volumes specified in this section are made available under /stackable/userdata/{volumename}.
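As an illustration, the google-service-account volume above could be backed by a Secret like the one below. The key name service-account.json and its placeholder value are hypothetical. Because Kubernetes mounts each key of a Secret volume as a file, this key would appear in every pod as /stackable/userdata/google-service-account/service-account.json.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: google-service-account  # must match secretName in extraVolumes
stringData:
  # Hypothetical key: it becomes the file name inside the mounted volume
  service-account.json: YOUR_SERVICE_ACCOUNT_KEY_FILE_CONTENTS_HERE
```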