Ingestion
From S3
To ingest data from S3, you need to specify the host to connect to. Further settings are optional:
spec:
  clusterConfig:
    ingestion:
      s3connection:
        host: yourhost.com # (1)
        port: 80 # optional (2)
        credentials: # optional (3)
          ...
(1) The S3 host. Required.
(2) The port. Optional, defaults to 80.
(3) The credentials to use. Since these might be bucket-dependent, they can instead be given in the ingestion job. Specifying the credentials here is explained below.
You can specify a connection/bucket for ingestion only, for deep storage only, or for both, but Druid only supports a single S3 connection under the hood. If two connections are specified, they must be identical. This is easiest to guarantee with a dedicated S3 Connection resource, defined as a separate object rather than inline. TLS for S3 is not yet supported.
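As a rough sketch of the dedicated-object approach, the connection could be declared once and referenced by name, so that ingestion and deep storage cannot drift apart. The `S3Connection` kind, its `s3.stackable.tech/v1alpha1` API version, the `reference` field, and the name `my-s3-connection` are assumptions not taken from this page; verify them against the CRDs installed in your cluster.

```yaml
# Assumed resource kind and apiVersion; check your installed CRDs.
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: my-s3-connection # hypothetical name
spec:
  host: yourhost.com
  port: 80
---
# Fragment of the Druid cluster spec (other fields omitted).
spec:
  clusterConfig:
    ingestion:
      s3connection:
        reference: my-s3-connection # assumed field pointing at the object above
```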
S3 credentials
Whether a connection is specified inline or as a separate object, the credentials are always specified in the same way. You need a Secret containing the access key ID and the secret access key, a SecretClass, and a reference to this SecretClass wherever you want to specify the credentials.
The Secret:
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  labels:
    secrets.stackable.tech/class: s3-credentials-class # (1)
stringData:
  accessKey: YOUR_VALID_ACCESS_KEY_ID_HERE
  secretKey: YOUR_SECRET_ACCESS_KEY_THAT_BELONGS_TO_THE_KEY_ID_HERE
(1) This label connects the Secret to the SecretClass.
The SecretClass:
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: s3-credentials-class
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
Referencing it:
...
credentials:
  secretClass: s3-credentials-class
...
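Putting the pieces together, an inline ingestion connection that uses the SecretClass above might look like the following sketch. It only combines fields already shown on this page; the host is a placeholder and all unrelated fields of the cluster spec are omitted.

```yaml
# Fragment of the Druid cluster spec; fields unrelated to S3 are omitted.
spec:
  clusterConfig:
    ingestion:
      s3connection:
        host: yourhost.com # placeholder host
        port: 80
        credentials:
          secretClass: s3-credentials-class # must match the SecretClass name
```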
Adding external files, e.g. for ingestion
Since Druid actively runs ingestion tasks, extra files may need to be made available to its processes.
These could be, for example, client certificates used to connect to a Kafka cluster or a keytab to obtain a Kerberos ticket.
To make these files available, the operator allows specifying extra volumes that are added to all pods deployed for this cluster.
spec:
  clusterConfig:
    extraVolumes:
      - name: google-service-account
        secret:
          secretName: google-service-account
All volumes specified in this section are made available under /stackable/userdata/{volumename}.
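For illustration, suppose the referenced Secret holds a single key named `key.json`. Both the key name and its placeholder value are hypothetical. With standard Kubernetes Secret volumes, each key becomes one file in the mount directory, so the resulting path can be derived from the volume name:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: google-service-account # matches secretName in extraVolumes
stringData:
  # Hypothetical key; each key in a Secret volume becomes one file.
  key.json: "REPLACE_WITH_SERVICE_ACCOUNT_JSON"
# With the volume named google-service-account, the file would be expected at:
#   /stackable/userdata/google-service-account/key.json
```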