Release notes for the Stackable Data Platform

The Stackable platform consists of multiple operators that work together. Periodically a platform release is made, including all components of the platform at a specific version.

Release 23.7

This release introduces the specification of resource quotas and pod overrides and updates the product versions supported by SDP.

New / extended platform features

The following new major platform features were added:

Resource Quotas

Explicit resources are now applied to all containers, for both operators and products. This allows running the Stackable Data Platform on Kubernetes clusters with a ResourceQuota or LimitRange set. Where these are not specified directly, defaults will be used. See this issue for more information.
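
As a rough illustration of the kind of constraints this feature is meant to coexist with, the sketch below defines a ResourceQuota and a LimitRange for one namespace. The object names, the namespace and all values are assumptions chosen for this example; they are not taken from the release.

```yaml
# Hypothetical namespace-level constraints; names and values are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: sdp-quota            # hypothetical name
  namespace: stackable       # hypothetical namespace
spec:
  hard:
    requests.cpu: "8"        # sum of CPU requests allowed in the namespace
    requests.memory: 16Gi    # sum of memory requests allowed
    limits.cpu: "16"         # sum of CPU limits allowed
    limits.memory: 32Gi      # sum of memory limits allowed
---
apiVersion: v1
kind: LimitRange
metadata:
  name: sdp-limits           # hypothetical name
  namespace: stackable       # hypothetical namespace
spec:
  limits:
    - type: Container
      max:                   # upper bound for any single container
        cpu: "2"
        memory: 4Gi
```

When a quota like this covers CPU and memory, Kubernetes can reject Pods whose containers do not declare the corresponding requests or limits, which is why explicit resources on every container matter.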

Pod Overrides

It is now possible to add custom settings which specify elements of a pod template (Service, StatefulSet etc.) on roles or rolegroups; the operator merges these settings with the objects it writes before applying them. This lets users set any property that is valid on a regular Kubernetes Pod but is not directly exposed via the Stackable custom resource definition. Have a look at the documentation for more details.

For example, with HDFS:

    roleGroups:
      default:
        replicas: 1
        podOverrides:
          spec:
            containers:
              - name: journalnode
                resources:
                  requests:
                    cpu: 110m
                  limits:
                    cpu: 410m

OpenShift certification

OLM bundles, a prerequisite for the OpenShift certification process, have been created for each operator. All 15 SDP operators in release 23.4.1 are now OpenShift-certified and deployable directly from within an OpenShift cluster.

Signed SDP operator images

As of this release all Stackable operator images are signed (this feature will be added to product images in a subsequent release). More information about this, including how to verify the image signatures, can be found in this tutorial.

New Versions

The following new product versions are now supported:

Deprecated Versions

The following product versions are deprecated and will be removed in a later release:

  • Airflow: 2.2.3, 2.2.4, 2.2.5, 2.4.1

  • Druid: 0.23.0, 24.0.0

  • HBase: 2.4.6, 2.4.8, 2.4.9, 2.4.11

  • HDFS: 3.2.2, 3.3.1, 3.3.3

  • Hive: 2.3.9

  • Kafka: 2.7.1, 2.8.1, 3.1.0, 3.2.0, 3.3.1

  • NiFi: 1.15.0, 1.15.1, 1.15.2, 1.15.3, 1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.18.0

  • OPA: 0.27.1, 0.28.0, 0.37.2, 0.41.0, 0.45.0

  • Spark: 3.2.1, 3.3.0

  • Superset: 1.3.2, 1.4.1, 1.5.1

  • Trino: 377, 387, 395, 396, 403

  • ZooKeeper: 3.5.8, 3.6.3, 3.7.0, 3.8.0

Removed Versions

No product versions have been removed.

Product features

Additionally, there are some individual product features that are noteworthy:

stackablectl

There are no new demos in this platform release.

Supported Kubernetes versions

This release supports the following Kubernetes versions:

  • 1.26

  • 1.25

  • 1.24

This Kubernetes version is no longer supported:

  • 1.23

Supported OpenShift versions

This release supports the following OpenShift versions:

  • 4.11

  • 4.10

Breaking changes

The re-structuring of configuration definitions in certain operators will require you to adapt your existing custom resources as shown below.

Stackable Operator for Apache Airflow

Custom resources should be changed from e.g.

spec:
  ...
  executor: CeleryExecutor
  loadExamples: true
  exposeConfig: false
  credentialsSecret: test-airflow-credentials
  ...

to:

spec:
  ...
  clusterConfig:
    executor: CeleryExecutor
    loadExamples: true
    exposeConfig: false
    credentialsSecret: test-airflow-credentials
    ...

Stackable Operator for Apache Superset

Custom resources should be changed from e.g.

spec:
  ...
  credentialsSecret: superset-credentials
  loadExamplesOnInit: false
  vectorAggregatorConfigMapName: vector-aggregator-discovery
  ...

to:

spec:
  ...
  clusterConfig:
    credentialsSecret: superset-credentials
    loadExamplesOnInit: false
    vectorAggregatorConfigMapName: vector-aggregator-discovery
    ...

Stackable Operator for Trino

Custom resources should be changed from e.g.

spec:
  ...
  clusterConfig:
    authentication:
      method:
        multiUser:
          userCredentialsSecret:
            name: trino-users
  ...

referencing a Secret with bcrypt-hashed passwords:

---
apiVersion: v1
kind: Secret
metadata:
  name: trino-users
type: kubernetes.io/opaque
stringData:
  # admin:admin
  admin: $2y$10$89xReovvDLacVzRGpjOyAOONnayOgDAyIS2nW9bs5DJT98q17Dy5i
  # alice:alice
  alice: $2y$10$HcCa4k9v2DRrD/g7e5vEz.Bk.1xg00YTEHOZjPX7oK3KqMSt2xT8W
  # bob:bob
  bob: $2y$10$xVRXtYZnYuQu66SmruijPO8WHFM/UK5QPHTr.Nzf4JMcZSqt3W.2.

to:

spec:
  ...
  clusterConfig:
    authentication:
      - authenticationClass: trino-users-auth
    ...

referencing an AuthenticationClass (which references a Secret with plain data):

---
apiVersion: authentication.stackable.tech/v1alpha1
kind: AuthenticationClass
metadata:
  name: trino-users-auth
spec:
  provider:
    static:
      userCredentialsSecret:
        name: trino-users
---
apiVersion: v1
kind: Secret
metadata:
  name: trino-users
type: kubernetes.io/opaque
stringData:
  admin: admin
  alice: alice
  bob: bob

Upgrade from 23.4

Using stackablectl

To uninstall the 23.4 release run

$ stackablectl release uninstall 23.4
[INFO ] Uninstalling release 23.4
[INFO ] Uninstalling airflow operator
[INFO ] Uninstalling commons operator
# ...

Afterwards you will need to upgrade the CustomResourceDefinitions (CRDs) installed by the Stackable Platform, because helm uninstalls the operators but leaves the CRDs in place. This can be done using kubectl replace:

kubectl replace -f https://raw.githubusercontent.com/stackabletech/airflow-operator/23.7.0/deploy/helm/airflow-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/23.7.0/deploy/helm/commons-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/druid-operator/23.7.0/deploy/helm/druid-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hbase-operator/23.7.0/deploy/helm/hbase-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hdfs-operator/23.7.0/deploy/helm/hdfs-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hive-operator/23.7.0/deploy/helm/hive-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/kafka-operator/23.7.0/deploy/helm/kafka-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/listener-operator/23.7.0/deploy/helm/listener-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/nifi-operator/23.7.0/deploy/helm/nifi-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/opa-operator/23.7.0/deploy/helm/opa-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/secret-operator/23.7.0/deploy/helm/secret-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/23.7.0/deploy/helm/spark-k8s-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/superset-operator/23.7.0/deploy/helm/superset-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/trino-operator/23.7.0/deploy/helm/trino-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/23.7.0/deploy/helm/zookeeper-operator/crds/crds.yaml
customresourcedefinition.apiextensions.k8s.io "airflowclusters.airflow.stackable.tech" replaced
customresourcedefinition.apiextensions.k8s.io "airflowdbs.airflow.stackable.tech" replaced
customresourcedefinition.apiextensions.k8s.io "authenticationclasses.authentication.stackable.tech" replaced
customresourcedefinition.apiextensions.k8s.io "s3connections.s3.stackable.tech" replaced
...

To install the 23.7 release run

$ stackablectl release install 23.7
[INFO ] Installing release 23.7
[INFO ] Installing airflow operator in version 23.7.0
[INFO ] Installing commons operator in version 23.7.0
[INFO ] Installing druid operator in version 23.7.0
[INFO ] Installing hbase operator in version 23.7.0
[INFO ] Installing hdfs operator in version 23.7.0
[INFO ] Installing hive operator in version 23.7.0
[INFO ] Installing kafka operator in version 23.7.0
[INFO ] Installing listener operator in version 23.7.0
[INFO ] Installing nifi operator in version 23.7.0
[INFO ] Installing opa operator in version 23.7.0
[INFO ] Installing secret operator in version 23.7.0
[INFO ] Installing spark-k8s operator in version 23.7.0
[INFO ] Installing superset operator in version 23.7.0
[INFO ] Installing trino operator in version 23.7.0
[INFO ] Installing zookeeper operator in version 23.7.0

Using helm

Use helm list to list the currently installed operators.

You can use the following command to uninstall all operators that are part of the 23.4 release:

$ helm uninstall airflow-operator commons-operator druid-operator hbase-operator hdfs-operator hive-operator kafka-operator listener-operator nifi-operator opa-operator secret-operator spark-k8s-operator superset-operator trino-operator zookeeper-operator
release "airflow-operator" uninstalled
release "commons-operator" uninstalled
# ...

Afterwards you will need to upgrade the CustomResourceDefinitions (CRDs) installed by the Stackable Platform, because helm uninstalls the operators but leaves the CRDs in place. This can be done using kubectl replace:

kubectl replace -f https://raw.githubusercontent.com/stackabletech/airflow-operator/23.7.0/deploy/helm/airflow-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/23.7.0/deploy/helm/commons-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/druid-operator/23.7.0/deploy/helm/druid-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hbase-operator/23.7.0/deploy/helm/hbase-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hdfs-operator/23.7.0/deploy/helm/hdfs-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hive-operator/23.7.0/deploy/helm/hive-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/kafka-operator/23.7.0/deploy/helm/kafka-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/listener-operator/23.7.0/deploy/helm/listener-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/nifi-operator/23.7.0/deploy/helm/nifi-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/opa-operator/23.7.0/deploy/helm/opa-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/secret-operator/23.7.0/deploy/helm/secret-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/23.7.0/deploy/helm/spark-k8s-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/superset-operator/23.7.0/deploy/helm/superset-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/trino-operator/23.7.0/deploy/helm/trino-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/23.7.0/deploy/helm/zookeeper-operator/crds/crds.yaml

To install the 23.7 release run

helm repo add stackable-stable https://repo.stackable.tech/repository/helm-stable/
helm repo update stackable-stable
helm install --wait airflow-operator stackable-stable/airflow-operator --version 23.7.0
helm install --wait commons-operator stackable-stable/commons-operator --version 23.7.0
helm install --wait druid-operator stackable-stable/druid-operator --version 23.7.0
helm install --wait hbase-operator stackable-stable/hbase-operator --version 23.7.0
helm install --wait hdfs-operator stackable-stable/hdfs-operator --version 23.7.0
helm install --wait hive-operator stackable-stable/hive-operator --version 23.7.0
helm install --wait kafka-operator stackable-stable/kafka-operator --version 23.7.0
helm install --wait listener-operator stackable-stable/listener-operator --version 23.7.0
helm install --wait nifi-operator stackable-stable/nifi-operator --version 23.7.0
helm install --wait opa-operator stackable-stable/opa-operator --version 23.7.0
helm install --wait secret-operator stackable-stable/secret-operator --version 23.7.0
helm install --wait spark-k8s-operator stackable-stable/spark-k8s-operator --version 23.7.0
helm install --wait superset-operator stackable-stable/superset-operator --version 23.7.0
helm install --wait trino-operator stackable-stable/trino-operator --version 23.7.0
helm install --wait zookeeper-operator stackable-stable/zookeeper-operator --version 23.7.0

Known upgrade issues

In the case of the breaking changes detailed above it will be necessary to update the custom resources for Airflow, Superset and Trino clusters and re-apply them.

Additionally, please note the following:

All operators
  • If the default PVC size has been changed, then the StatefulSet must be deleted: Kubernetes does not allow the volume claims in an existing StatefulSet specification to be changed.

    • The error message is similar to: StatefulSet.apps "trino-worker-default" is invalid: spec: Forbidden: updates to StatefulSet spec for fields other than 'replicas', 'template', 'updateStrategy', […]

ZooKeeper operator
  • The ZooKeeper operator in this release expects a product image with the same version. An existing ZooKeeper cluster resource should be deleted and re-applied with the corresponding stackableVersion e.g.:

spec:
  image:
    productVersion: 3.8.0
    stackableVersion: "23.7"

Release 23.4

The focus in this platform release is on the support of default/custom affinities and the status field, as well as the rollout of log aggregation across the remaining operators. Additionally, all operators have been updated and tested for compatibility with OpenShift clusters (versions 4.10 and 4.11). Several operators from the 23.1 platform release were already certified against OpenShift.

Release 23.4.0

This was the first release in the 23.4 release line. It is recommended to install Release 23.4.1 instead, as it contains relevant bugfixes.

Release 23.4.1

This is a bugfix/patch-level release that fixes the following issues:

  • Fix missing custom resource defaults that are required for a release update. See here.

  • Specify the security context to run as a member of the root group. This has now been implemented for the Stackable operators where it was previously missing, i.e. Apache HBase, Apache HDFS, Apache ZooKeeper and Apache Spark on Kubernetes. OpenShift clusters require this so that the product can run with a random UID. At least Airflow depends on it, and it is also OpenShift policy, as described here and here.

  • Automatically migrate the name used for the bundle-builder container for OPA daemonsets. See here.

  • Automatically shorten the registration socket path used in listener-operator for MicroK8s compatibility, migrated during upgrade. See here.

New / extended platform features

The following new major platform features were added:

Cluster Operation

The first part of Cluster operations was rolled out in every applicable Stackable Operator. This supports pausing the cluster reconciliation and stopping the cluster completely. While reconciliation is paused, the operator does not apply any changes to the Kubernetes resources (e.g. when the custom resource is changed). Stopping the cluster sets the replicas of all StatefulSets, Deployments or DaemonSets to zero, thereby deleting all Pods belonging to that cluster (but not the PVCs).
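
A minimal sketch of how these two switches might appear in a cluster definition. The field names clusterOperation, reconciliationPaused and stopped are not spelled out in these notes and are assumed here; check them against the CRD of the operator in use.

```yaml
# Sketch only: the field names under clusterOperation are assumptions,
# verify them against the installed CRD before use.
spec:
  clusterOperation:
    # true: the operator stops applying changes to Kubernetes resources
    reconciliationPaused: false
    # true: replicas are scaled to zero, Pods are removed, PVCs are kept
    stopped: true
  # remainder of the cluster spec unchanged
```

Setting stopped back to false would be expected to scale the cluster up again with its existing PVCs, since only the Pods are removed.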

Status Field

Operators of the Stackable Data Platform create, manage and delete Kubernetes resources: in order to easily query the health state of the products - and react accordingly - Stackable Operators use several predefined condition types to capture different aspects of a product’s availability. See this ADR for more information.

Default / Custom Affinities

In Kubernetes there are different ways to influence how Pods are assigned to Nodes. In some cases it makes sense to co-locate certain services that communicate a lot with each other, such as HBase regionservers with HDFS datanodes. In other cases it makes sense to distribute the Pods among as many Nodes as possible. There may also be additional requirements, e.g. placing important services such as HDFS namenodes in different racks, datacenter rooms or even datacenters. This release implements default affinities that should suffice for many scenarios out of the box, while also allowing for custom affinity rules at a role and/or role-group level. See this ADR for more information.
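
As a sketch of a custom affinity rule at role-group level, the fragment below asks the scheduler to prefer spreading HDFS datanode Pods across Nodes. The placement under config.affinity, the role name, the label values and the weight are assumptions for illustration; the podAntiAffinity structure itself is the standard Kubernetes one.

```yaml
spec:
  dataNodes:                       # assumed role name
    roleGroups:
      default:
        config:
          affinity:                # assumed location of the affinity settings
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 70       # illustrative weight (valid range 1-100)
                  podAffinityTerm:
                    labelSelector:
                      matchLabels:
                        app.kubernetes.io/name: hdfs           # assumed label
                        app.kubernetes.io/component: datanode  # assumed label
                    # spread across individual Nodes
                    topologyKey: kubernetes.io/hostname
```

Using a preferred rule rather than a required one keeps Pods schedulable on small clusters where there are fewer Nodes than replicas.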

Log Aggregation

The logging framework (added to the platform in Release 23.1) offers a consistent custom resource configuration and a separate, persisted sink (defaulting to OpenSearch). This has now been rolled out across all products. See this ADR and this concepts page for more information.

Service Type

The Service type can now be specified in all products. This currently differentiates between the internal ClusterIP and the external NodePort and is forward compatible with the ListenerClass for the automatic exposure of Services via the Listener Operator. This change is not backwards compatible with older platform releases. For security reasons, the default is set to the cluster-internal (ClusterIP) ListenerClass. A cluster can be exposed outside of Kubernetes by setting clusterConfig.listenerClass to external-unstable (NodePort) or external-stable (LoadBalancer).

New Versions

No new product versions are supported in this platform release.

Deprecated Versions

No product versions have been deprecated in this platform release.

Product features

Additionally, there are some individual product features that are noteworthy:

stackablectl

The following have been added to stackablectl:

Trino-iceberg demo

This is a condensed form of the data-lakehouse-iceberg-trino-spark demo focusing on using the lakehouse to store and modify data. It demonstrates how to integrate Trino and Iceberg and should run on a local workstation.

Jupyterhub/Spark demo

This demo showcases the integration between Jupyter and Apache Hadoop deployed on a Stackable Data Platform (SDP) Kubernetes cluster. It can be installed on most cloud-managed Kubernetes clusters, on premises, or on a reasonably provisioned laptop.

The quickstart guide shows how to get started with stackablectl. This link lists the available demos.

Supported Kubernetes versions

This release supports the following Kubernetes versions:

  • 1.26

  • 1.25

  • 1.24

  • 1.23 (it is planned to discontinue support for this version in the next release)

Supported OpenShift versions

This release supports the following OpenShift versions:

  • 4.11

  • 4.10

Breaking changes

You will need to adapt your existing custom resources due to the breaking changes detailed below.

All Stackable Operators

As mentioned above, specifying the service type is a breaking change for all operators. The default value is set to the cluster-internal ListenerClass: if the cluster requires external access outside of Kubernetes then set clusterConfig.listenerClass to external-unstable or external-stable:

spec:
  image:
    productVersion: "396"
    stackableVersion: "23.4.1"
  clusterConfig:
    listenerClass: external-unstable

This is an example for Trino, but the pattern is the same across all operators.

Stackable Operator for Apache Airflow

Existing Airflow clusters need to be deleted and recreated. Airflow metadata held in the database and DAGs saved on disk are not affected.

This is required because the UID of the Airflow user has changed to be in line with the rest of the platform.

Stackable Operator for Apache HBase

Custom resources should be changed from e.g.

spec:
  ...
  hdfsConfigMapName: simple-hdfs
  zookeeperConfigMapName: simple-znode

to:

spec:
  ...
  clusterConfig:
    hdfsConfigMapName: simple-hdfs
    zookeeperConfigMapName: simple-znode

Stackable Operator for Apache Hadoop

Custom resources should be changed from e.g.

spec:
  ...
  zookeeperConfigMapName: simple-hdfs-znode
  dfsReplication: 3
  vectorAggregatorConfigMapName: vector-aggregator-discovery

to:

spec:
  ...
  clusterConfig:
    zookeeperConfigMapName: simple-hdfs-znode
    dfsReplication: 3
    vectorAggregatorConfigMapName: vector-aggregator-discovery

Stackable Operator for Apache NiFi

Custom resources should be changed from e.g.

spec:
  ...
  zookeeperConfigMapName: simple-nifi-znode

to:

spec:
  ...
  clusterConfig:
    zookeeperConfigMapName: simple-nifi-znode

Stackable Operator for Apache Spark-k8s

Custom resources should be changed from e.g.

spec:
  ...
  driver:
    nodeSelector:

to:

spec:
  ...
  driver:
    affinity:

Stackable Operator for Trino

Custom resources should be changed from e.g.

spec:
  ...
  opa:
    configMapName: simple-opa
    package: trino
  authentication:
    method:
      multiUser:
        userCredentialsSecret:
          name: simple-trino-users-secret
  catalogLabelSelector:
    matchLabels:
      trino: simple-trino
  vectorAggregatorConfigMapName: vector-aggregator-discovery

to:

spec:
  ...
  clusterConfig:
    authentication:
      method:
        multiUser:
          userCredentialsSecret:
            name: simple-trino-users-secret
    authorization:
      opa:
        configMapName: simple-opa
        package: trino
    catalogLabelSelector:
      matchLabels:
        trino: simple-trino
    vectorAggregatorConfigMapName: vector-aggregator-discovery

Upgrade from 23.1

Using stackablectl

You can list the available releases as follows:

$ stackablectl release list

RELEASE            RELEASE DATE   DESCRIPTION
23.4               2023-04-25     Fifth release focusing on affinities and product status
23.1               2023-01-27     Fourth release focusing on image selection and logging
22.11              2022-11-08     Third release focusing on resource management
22.09              2022-09-09     Second release focusing on security and OpenShift support
22.06              2022-06-30     First official release of the Stackable Data Platform

To uninstall the 23.1 release run

$ stackablectl release uninstall 23.1
[INFO ] Uninstalling release 23.1
[INFO ] Uninstalling airflow operator
[INFO ] Uninstalling commons operator
# ...

Afterwards you will need to upgrade the CustomResourceDefinitions (CRDs) installed by the Stackable Platform, because helm uninstalls the operators but leaves the CRDs in place. This can be done using kubectl replace:

kubectl replace -f https://raw.githubusercontent.com/stackabletech/airflow-operator/23.4.1/deploy/helm/airflow-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/23.4.1/deploy/helm/commons-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/druid-operator/23.4.1/deploy/helm/druid-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hbase-operator/23.4.1/deploy/helm/hbase-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hdfs-operator/23.4.1/deploy/helm/hdfs-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hive-operator/23.4.1/deploy/helm/hive-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/kafka-operator/23.4.1/deploy/helm/kafka-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/listener-operator/23.4.1/deploy/helm/listener-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/nifi-operator/23.4.1/deploy/helm/nifi-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/opa-operator/23.4.1/deploy/helm/opa-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/secret-operator/23.4.1/deploy/helm/secret-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/23.4.1/deploy/helm/spark-k8s-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/superset-operator/23.4.1/deploy/helm/superset-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/trino-operator/23.4.1/deploy/helm/trino-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/23.4.1/deploy/helm/zookeeper-operator/crds/crds.yaml
customresourcedefinition.apiextensions.k8s.io "airflowclusters.airflow.stackable.tech" replaced
customresourcedefinition.apiextensions.k8s.io "airflowdbs.airflow.stackable.tech" replaced
customresourcedefinition.apiextensions.k8s.io "authenticationclasses.authentication.stackable.tech" replaced
customresourcedefinition.apiextensions.k8s.io "s3connections.s3.stackable.tech" replaced
...

To install the 23.4 release run

$ stackablectl release install 23.4
[INFO ] Installing release 23.4
[INFO ] Installing airflow operator in version 23.4.1
[INFO ] Installing commons operator in version 23.4.1
[INFO ] Installing druid operator in version 23.4.1
[INFO ] Installing hbase operator in version 23.4.1
[INFO ] Installing hdfs operator in version 23.4.1
[INFO ] Installing hive operator in version 23.4.1
[INFO ] Installing kafka operator in version 23.4.1
[INFO ] Installing listener operator in version 23.4.1
[INFO ] Installing nifi operator in version 23.4.1
[INFO ] Installing opa operator in version 23.4.1
[INFO ] Installing secret operator in version 23.4.1
[INFO ] Installing spark-k8s operator in version 23.4.1
[INFO ] Installing superset operator in version 23.4.1
[INFO ] Installing trino operator in version 23.4.1
[INFO ] Installing zookeeper operator in version 23.4.1

Using helm

Use helm list to list the currently installed operators.

You can use the following command to uninstall all operators that are part of the 23.1 release:

$ helm uninstall airflow-operator commons-operator druid-operator hbase-operator hdfs-operator hive-operator kafka-operator listener-operator nifi-operator opa-operator secret-operator spark-k8s-operator superset-operator trino-operator zookeeper-operator
release "airflow-operator" uninstalled
release "commons-operator" uninstalled
# ...

Afterwards you will need to upgrade the CustomResourceDefinitions (CRDs) installed by the Stackable Platform, because helm uninstalls the operators but leaves the CRDs in place. This can be done using kubectl replace:

kubectl replace -f https://raw.githubusercontent.com/stackabletech/airflow-operator/23.4.1/deploy/helm/airflow-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/23.4.1/deploy/helm/commons-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/druid-operator/23.4.1/deploy/helm/druid-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hbase-operator/23.4.1/deploy/helm/hbase-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hdfs-operator/23.4.1/deploy/helm/hdfs-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hive-operator/23.4.1/deploy/helm/hive-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/kafka-operator/23.4.1/deploy/helm/kafka-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/listener-operator/23.4.1/deploy/helm/listener-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/nifi-operator/23.4.1/deploy/helm/nifi-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/opa-operator/23.4.1/deploy/helm/opa-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/secret-operator/23.4.1/deploy/helm/secret-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/23.4.1/deploy/helm/spark-k8s-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/superset-operator/23.4.1/deploy/helm/superset-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/trino-operator/23.4.1/deploy/helm/trino-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/23.4.1/deploy/helm/zookeeper-operator/crds/crds.yaml

To install the 23.4 release run

helm repo add stackable-stable https://repo.stackable.tech/repository/helm-stable/
helm repo update stackable-stable
helm install --wait airflow-operator stackable-stable/airflow-operator --version 23.4.1
helm install --wait commons-operator stackable-stable/commons-operator --version 23.4.1
helm install --wait druid-operator stackable-stable/druid-operator --version 23.4.1
helm install --wait hbase-operator stackable-stable/hbase-operator --version 23.4.1
helm install --wait hdfs-operator stackable-stable/hdfs-operator --version 23.4.1
helm install --wait hive-operator stackable-stable/hive-operator --version 23.4.1
helm install --wait kafka-operator stackable-stable/kafka-operator --version 23.4.1
helm install --wait listener-operator stackable-stable/listener-operator --version 23.4.1
helm install --wait nifi-operator stackable-stable/nifi-operator --version 23.4.1
helm install --wait opa-operator stackable-stable/opa-operator --version 23.4.1
helm install --wait secret-operator stackable-stable/secret-operator --version 23.4.1
helm install --wait spark-k8s-operator stackable-stable/spark-k8s-operator --version 23.4.1
helm install --wait superset-operator stackable-stable/superset-operator --version 23.4.1
helm install --wait trino-operator stackable-stable/trino-operator --version 23.4.1
helm install --wait zookeeper-operator stackable-stable/zookeeper-operator --version 23.4.1

Release 23.1

This release marks a major change in the way operator and product images are versioned. Up until now, operators were versioned independently of each other and a platform release was a loosely coupled set of operator versions. This had major disadvantages both technical and organisational.

On the technical side, a multi-dimensional matrix of versions had to be tested, documentation cross-references had to be maintained and coordinating a platform release was extremely difficult.

Organisationally, the biggest challenge was communication and coordination within the teams as well as with users. As a result, starting with this release, all operator and product images are versioned in lock-step. This platform release is marked 23.1 and all included components are tagged with 23.1.0. Any patch versions of the components that follow will be tagged with 23.1.1, 23.1.2 and so on.

The focus in this platform release is on the support of offline (or on-premise) product images and the partial rollout of logging support.

New platform features

The following new major platform features were added:

Product image selection

Product image selection has been expanded to cover different scenarios:

  • Stackable-provided product images, defined with the repository, the product version and the stackable tag

  • As above, but without the stackable tag (whereby the most recent tagged image will be taken)

  • The product version and a full repository path (this allows fully-customized product images)

These options are described in more detail in this ADR and on this concepts page.

N.B. this is a breaking change across all operators as spec.version has been replaced by spec.image.

Logging Aggregation

Component activity within the platform is logged in a way that makes it difficult to find, persist and consolidate this information. Log configuration is also a challenge. To address these two issues a logging framework has been added to the platform, offering a consistent custom resource configuration and a separate, persisted sink (the current implementation supports OpenSearch). This is discussed in more detail in this ADR and on this concepts page.

In this release this has been added to the following components:

Support for other products will be added in future releases.

New Versions

The following new product version is now supported:

Deprecated Versions

The following product version is no longer supported:

Product features

Additionally, there are some individual product features that are noteworthy

stackablectl

The following have been added to stackablectl:

Logging demo

This illustrates how to set up logging for Zookeeper and browse the results in an OpenSearch dashboard. Logging has been implemented for HBase, Hadoop and Zookeeper and will eventually be available for all Stackable operators.

LDAP stack and tutorial

LDAP support has now been added to multiple products. An explanation of the overall approach is given here but in order to make the configuration steps a little clearer a tutorial has been added that uses a dedicated Stackable stack for OpenLDAP and shows its usage.

The quickstart guide shows how to get started with stackablectl. This link lists the available demos.

Supported Kubernetes versions

This release supports the following Kubernetes versions:

  • 1.25

  • 1.24

  • 1.23

  • 1.22

Breaking changes

This release brings with it several breaking changes needed to future-proof the platform. You will need to adapt your existing CRDs due to the following breaking changes:

All Stackable Operators

As mentioned above, product image selection is a breaking change for all operators. Previously the product image was declared using spec.version:

spec:
  version: 396-stackable23.1

(example for Trino)

This must now be replaced with spec.image:

spec:
  image:
    productVersion: 396
    stackableVersion: 23.1

The same pattern applies across all operators, so for Hive the change looks like this. From:

spec:
  version: 3.1.3-stackable23.1

to

spec:
  image:
    productVersion: 3.1.3
    stackableVersion: 23.1
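
Because the rename follows a fixed pattern, it can be scripted. The following is a minimal sketch, assuming the old value always has the form <productVersion>-stackable<stackableVersion> as in the Trino and Hive examples above. The function name version_to_image_spec is made up for illustration, and the values are quoted on the assumption that both fields are strings.

```shell
#!/usr/bin/env bash
# Sketch: turn an old spec.version value into the new spec.image block.
# Assumes the format <productVersion>-stackable<stackableVersion>.
version_to_image_spec() {
  local old="$1"
  case "$old" in
    *-stackable*) ;;
    *) echo "unexpected version format: $old" >&2; return 1 ;;
  esac
  # Everything before "-stackable" is the product version,
  # everything after it is the Stackable version.
  local product="${old%%-stackable*}"
  local stackable="${old#*-stackable}"
  if [ -z "$product" ] || [ -z "$stackable" ]; then
    echo "unexpected version format: $old" >&2
    return 1
  fi
  # Quote the values so YAML reads them as strings, not numbers.
  printf 'spec:\n  image:\n    productVersion: "%s"\n    stackableVersion: "%s"\n' \
    "$product" "$stackable"
}

version_to_image_spec "3.1.3-stackable23.1"
```

Called with the Hive value from above, this prints a spec.image block with productVersion "3.1.3" and stackableVersion "23.1". The output is only a fragment and still has to be merged into the full resource definition by hand.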

Stackable Operator for Apache Druid

This means a stackable version >= 23.1 has to be used for the product image.

Deep storage, Ingestion spec, discovery config maps, authentication etc. are now subfields of spec.clusterConfig instead of being top level under spec. Change the resource from e.g.:

  zookeeperConfigMapName: simple-druid-znode
  metadataStorageDatabase:
    dbType: derby
    connString: jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
    host: localhost
    port: 1527
  deepStorage:
    hdfs:
      configMapName: simple-hdfs
      directory: /data

to

  clusterConfig:
    deepStorage:
      hdfs:
        configMapName: simple-hdfs
        directory: /data
    metadataStorageDatabase:
      dbType: derby
      connString: jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
      host: localhost
      port: 1527
    tls: null
    zookeeperConfigMapName: simple-druid-znode

Stackable Operator for Apache Hadoop

As part of the change mentioned above we also did some code cleanup that allowed us to remove arbitrary hard-coded values from the operator. This change affects the directory structure the operator creates inside of the PersistentVolumes used for permanent storage.

The old folder naming was:

  • DataNode → data

  • JournalNode → journal

  • NameNode → name

which has now been adapted to match the actual role name:

  • DataNode → datanode

  • JournalNode → journalnode

  • NameNode → namenode

Unfortunately, this means that for clusters that were initially rolled out with an older operator version, a one-time migration step is necessary to rename these directories.

You can either do this manually by attaching the PVs to a pod and performing the rename (cluster needs to be stopped for this) or use the script provided below.

Please be aware that if this script runs after the cluster has already been restarted with the newer operator version, it will delete any data that was written to the empty post-upgrade HDFS stood up by the new operator.

#!/usr/bin/env bash

if [ $# -ne 1 ] ; then
     echo "Usage: $0 CLUSTER_NAME"
     exit 1
else
    HDFS_CLUSTER_NAME=$1
fi

# List the PVCs of this cluster (selectors are comma-separated so that all of them apply)
kubectl get pvc -l app.kubernetes.io/name=hdfs,app.kubernetes.io/instance="$HDFS_CLUSTER_NAME"
for pvc in $(kubectl get pvc -l app.kubernetes.io/name=hdfs,app.kubernetes.io/instance="$HDFS_CLUSTER_NAME",app.kubernetes.io/component=journalnode -o name | sed -e 's#persistentvolumeclaim/##'); do
kubectl apply -f - << EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-journalnode-${pvc}
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: docker.stackable.tech/stackable/hadoop:3.3.4-stackable23.1.0
          command: ["bash", "-c", "ls -la /stackable/data && if [ -d /stackable/data/journal ]; then echo Removing might existing target dir && rm -rf /stackable/data/journalnode && echo Renaming folder && mv /stackable/data/journal /stackable/data/journalnode; else echo Nothing to do; fi"]
          volumeMounts:
            - name: data
              mountPath: /stackable/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: ${pvc}
      restartPolicy: Never
  backoffLimit: 1
EOF
done
for pvc in $(kubectl get pvc -l app.kubernetes.io/name=hdfs,app.kubernetes.io/instance="$HDFS_CLUSTER_NAME",app.kubernetes.io/component=namenode -o name | sed -e 's#persistentvolumeclaim/##'); do
kubectl apply -f - << EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-namenode-${pvc}
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: docker.stackable.tech/stackable/hadoop:3.3.4-stackable23.1.0
          command: ["bash", "-c", "ls -la /stackable/data && if [ -d /stackable/data/name ]; then echo Removing might existing target dir && rm -rf /stackable/data/namenode && echo Renaming folder && mv /stackable/data/name /stackable/data/namenode; else echo Nothing to do; fi"]
          volumeMounts:
            - name: data
              mountPath: /stackable/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: ${pvc}
      restartPolicy: Never
  backoffLimit: 1
EOF
done
for pvc in $(kubectl get pvc -l app.kubernetes.io/name=hdfs,app.kubernetes.io/instance="$HDFS_CLUSTER_NAME",app.kubernetes.io/component=datanode -o name | sed -e 's#persistentvolumeclaim/##'); do
kubectl apply -f - << EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-datanode-${pvc}
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: docker.stackable.tech/stackable/hadoop:3.3.4-stackable23.1.0
          command: ["bash", "-c", "ls -la /stackable/data/data && if [ -d /stackable/data/data/data ]; then echo Removing might existing target dir && rm -rf /stackable/data/data/datanode && echo Renaming folder && mv /stackable/data/data/data /stackable/data/data/datanode; else echo Nothing to do; fi"]
          volumeMounts:
            - name: data
              mountPath: /stackable/data/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: ${pvc}
      restartPolicy: Never
  backoffLimit: 1
EOF
done
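
Given the warning above the script, it can help to check the state of a volume before renaming anything. The following is a minimal sketch of such a check, not part of the provided script: the function name migration_state and its four result words are made up for illustration. It assumes the volume is mounted at a local path and only looks at which of the old and new directories exist.

```shell
#!/usr/bin/env bash
# Sketch: classify a mounted data directory before the rename.
# Usage: migration_state BASE OLD NEW
# Prints one of: migrate, done, conflict, empty
migration_state() {
  local base="$1" old="$2" new="$3"
  if [ -d "$base/$old" ] && [ -d "$base/$new" ]; then
    echo conflict   # both exist: the new directory would be removed by the rename
  elif [ -d "$base/$old" ]; then
    echo migrate    # only the old directory exists: the rename is needed
  elif [ -d "$base/$new" ]; then
    echo done       # only the new directory exists: nothing to do
  else
    echo empty      # neither directory exists
  fi
}

# Example with a throwaway directory standing in for a mounted volume
tmp="$(mktemp -d)"
mkdir "$tmp/journal"
migration_state "$tmp" journal journalnode
rm -rf "$tmp"
```

A result of conflict suggests that the cluster has already been started with the newer operator, which is the case in which the migration script would discard the newly written data.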

The migration process for this now becomes:

  • Stop HDFS cluster by either removing the HdfsCluster definition object or scaling all roles to 0 replicas

  • Uninstall Stackable Operator for Apache Hadoop

  • Run migration script

  • Install newer version of Stackable Operator for Apache Hadoop
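
The four steps above can be sketched as a dry run that only prints the commands. This is an illustration under assumptions: migrate-hdfs-dirs.sh is a hypothetical file name for the script above, simple-hdfs is a placeholder cluster name, the first step uses the variant that removes the HdfsCluster object, and the Helm release is assumed to be named hdfs-operator as in the upgrade commands further down.

```shell
#!/usr/bin/env bash
# Sketch: print the migration steps in order without executing them.
# Usage: print_hdfs_migration_plan CLUSTER_NAME [OPERATOR_VERSION]
print_hdfs_migration_plan() {
  local cluster="$1" version="${2:-23.1.0}"
  # 1. Stop the HDFS cluster (here: by removing the HdfsCluster object)
  echo "kubectl delete hdfscluster $cluster"
  # 2. Uninstall the old operator
  echo "helm uninstall hdfs-operator"
  # 3. Run the migration script (hypothetical file name)
  echo "./migrate-hdfs-dirs.sh $cluster"
  # 4. Install the newer operator version
  echo "helm install --wait hdfs-operator stackable-stable/hdfs-operator --version $version"
}

print_hdfs_migration_plan simple-hdfs
```

Printing the plan first makes it easier to review the order before touching the cluster. If the HdfsCluster object is deleted in the first step, its definition has to be applied again after the new operator is installed.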

Stackable Operator for Apache Hive

These two changes mean that resources previously defined like this:

  s3:
    reference: minio
  metastore:
    roleGroups:
      default:
        replicas: 1
        config:
          database:
            connString: jdbc:postgresql://hive-postgresql:5432/hive
            user: hive
            password: hive
            dbType: postgres

will now be defined like this:

  clusterConfig:
    database:
      connString: jdbc:postgresql://hive-postgresql:5432/hive
      user: hive
      password: hive
      dbType: postgres
    s3:
      reference: minio
  metastore:
    roleGroups:
      default:
        replicas: 1

Stackable Operator for Apache Kafka

This means a stackable version >= 23.1 has to be used for the product image.

spec:
  ...
  zookeeperConfigMapName: simple-kafka-znode
  config:
    authentication:
      - authenticationClass: kafka-client-auth-tls
    tls:
      secretClass: tls
    clientAuthentication:
      authenticationClass: kafka-client-auth-tls
    internalTls:
      secretClass: kafka-internal-tls

Changes to:

spec:
  ...
  clusterConfig:
    authentication:
      - authenticationClass: kafka-client-auth-tls
    tls:
      internalSecretClass: kafka-internal-tls
      serverSecretClass: tls
    zookeeperConfigMapName: simple-kafka-znode

Stackable Operator for Apache Nifi

This means a stackable version >= 23.1 has to be used for the product image.

Stackable Operator for Trino

This means a stackable version >= 23.1 has to be used for the product image.

This changes the secret definition from:

stringData:
  LDAP_USER: cn=admin,dc=example,dc=org
  LDAP_PASSWORD: admin

to:

stringData:
  user: cn=admin,dc=example,dc=org
  password: admin

Stackable Operator for Apache Zookeeper

Similar to the Kafka example above, the configuration settings are consolidated under .spec.clusterConfig, i.e. from:

  config:
    tls:
      secretClass: tls
    clientAuthentication:
      authenticationClass: zk-client-tls
    quorumTlsSecretClass: tls

to:

  clusterConfig:
    authentication:
      - authenticationClass: zk-client-tls
    tls:
      serverSecretClass: tls
      quorumSecretClass: tls

Upgrade from 22.11

Using stackablectl

You can list the available releases as follows

$ stackablectl release list
RELEASE            RELEASE DATE   DESCRIPTION
23.1               2023-01-27     Fourth release focusing on image selection and logging
22.11              2022-11-08     Third release focusing on resource management
22.09              2022-09-09     Second release focusing on security and OpenShift support
22.06              2022-06-30     First official release of the Stackable Data Platform

To uninstall the 22.11 release run

$ stackablectl release uninstall 22.11
[INFO ] Uninstalling release 22.11
[INFO ] Uninstalling airflow operator
[INFO ] Uninstalling commons operator
# ...

Afterwards you will need to update the CustomResourceDefinitions (CRDs) installed by the Stackable Platform. The reason for this is that helm will uninstall the operators but not the CRDs.

$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/airflow-operator/23.1.0/deploy/helm/airflow-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/commons-operator/23.1.0/deploy/helm/commons-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/druid-operator/23.1.0/deploy/helm/druid-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hbase-operator/23.1.0/deploy/helm/hbase-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hdfs-operator/23.1.0/deploy/helm/hdfs-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hive-operator/23.1.0/deploy/helm/hive-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/kafka-operator/23.1.0/deploy/helm/kafka-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/nifi-operator/23.1.0/deploy/helm/nifi-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/opa-operator/23.1.0/deploy/helm/opa-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/secret-operator/23.1.0/deploy/helm/secret-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/23.1.0/deploy/helm/spark-k8s-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/superset-operator/23.1.0/deploy/helm/superset-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/trino-operator/23.1.0/deploy/helm/trino-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/23.1.0/deploy/helm/zookeeper-operator/crds/crds.yaml
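
Since all operators now share one version, the commands above differ only in the operator name and can be generated in a loop. The following is a sketch that prints the commands instead of running them; the helper name crd_url and the OPERATORS variable are made up for illustration, and the list simply mirrors the operators named above.

```shell
#!/usr/bin/env bash
# Operators taken from the kubectl commands above; all use version 23.1.0.
OPERATORS="airflow commons druid hbase hdfs hive kafka nifi opa secret spark-k8s superset trino zookeeper"

# Build the URL of the CRD manifest for one operator and version.
crd_url() {
  local operator="$1" version="$2"
  echo "https://raw.githubusercontent.com/stackabletech/${operator}-operator/${version}/deploy/helm/${operator}-operator/crds/crds.yaml"
}

# Dry run: print one kubectl command per operator.
for op in $OPERATORS; do
  echo kubectl apply -f "$(crd_url "$op" 23.1.0)"
done
```

To apply the CRDs rather than print the commands, drop the echo in the loop.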

To install the 23.1 release run

$ stackablectl release install 23.1
[INFO ] Installing release 23.1
[INFO ] Installing airflow operator in version 23.1.0
[INFO ] Installing commons operator in version 23.1.0
[INFO ] Installing druid operator in version 23.1.0
[INFO ] Installing hbase operator in version 23.1.0
[INFO ] Installing hdfs operator in version 23.1.0
[INFO ] Installing hive operator in version 23.1.0
[INFO ] Installing kafka operator in version 23.1.0
[INFO ] Installing listener operator in version 23.1.0
[INFO ] Installing nifi operator in version 23.1.0
[INFO ] Installing opa operator in version 23.1.0
[INFO ] Installing secret operator in version 23.1.0
[INFO ] Installing spark-k8s operator in version 23.1.0
[INFO ] Installing superset operator in version 23.1.0
[INFO ] Installing trino operator in version 23.1.0
[INFO ] Installing zookeeper operator in version 23.1.0
# ...

Using helm

Use helm list to list the currently installed operators.

You can use the following command to uninstall all operators that are part of the release 22.11:

$ helm uninstall airflow-operator commons-operator druid-operator hbase-operator hdfs-operator hive-operator kafka-operator nifi-operator opa-operator secret-operator spark-k8s-operator superset-operator trino-operator zookeeper-operator
release "airflow-operator" uninstalled
release "commons-operator" uninstalled
# ...

Afterwards you will need to update the CustomResourceDefinitions (CRDs) installed by the Stackable Platform. This is because helm will uninstall the operators but not the CRDs.

$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/airflow-operator/23.1.0/deploy/helm/airflow-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/commons-operator/23.1.0/deploy/helm/commons-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/druid-operator/23.1.0/deploy/helm/druid-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hbase-operator/23.1.0/deploy/helm/hbase-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hdfs-operator/23.1.0/deploy/helm/hdfs-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hive-operator/23.1.0/deploy/helm/hive-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/kafka-operator/23.1.0/deploy/helm/kafka-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/nifi-operator/23.1.0/deploy/helm/nifi-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/opa-operator/23.1.0/deploy/helm/opa-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/secret-operator/23.1.0/deploy/helm/secret-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/23.1.0/deploy/helm/spark-k8s-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/superset-operator/23.1.0/deploy/helm/superset-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/trino-operator/23.1.0/deploy/helm/trino-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/23.1.0/deploy/helm/zookeeper-operator/crds/crds.yaml

To install the release 23.1 run

$ helm repo add stackable-stable https://repo.stackable.tech/repository/helm-stable/
$ helm repo update stackable-stable
$ helm install --wait airflow-operator stackable-stable/airflow-operator --version 23.1.0
$ helm install --wait commons-operator stackable-stable/commons-operator --version 23.1.0
$ helm install --wait druid-operator stackable-stable/druid-operator --version 23.1.0
$ helm install --wait hbase-operator stackable-stable/hbase-operator --version 23.1.0
$ helm install --wait hdfs-operator stackable-stable/hdfs-operator --version 23.1.0
$ helm install --wait hive-operator stackable-stable/hive-operator --version 23.1.0
$ helm install --wait kafka-operator stackable-stable/kafka-operator --version 23.1.0
$ helm install --wait listener-operator stackable-stable/listener-operator --version 23.1.0
$ helm install --wait nifi-operator stackable-stable/nifi-operator --version 23.1.0
$ helm install --wait opa-operator stackable-stable/opa-operator --version 23.1.0
$ helm install --wait secret-operator stackable-stable/secret-operator --version 23.1.0
$ helm install --wait spark-k8s-operator stackable-stable/spark-k8s-operator --version 23.1.0
$ helm install --wait superset-operator stackable-stable/superset-operator --version 23.1.0
$ helm install --wait trino-operator stackable-stable/trino-operator --version 23.1.0
$ helm install --wait zookeeper-operator stackable-stable/zookeeper-operator --version 23.1.0

Release 22.11

This is the third release of the Stackable Data Platform, which this time focuses on resource management.

New platform features

The following new major platform features were added:

CPU and memory limits configurable

The operators now request resources from Kubernetes for the products, and the required CPU and memory can now also be configured for all products. If your product instances are less performant after the update, the new defaults might be set too low, and we recommend setting custom requests for your cluster.

Orphaned Resources

The operators now properly clean up after scaling down products. This means for example deleting StatefulSets that were left over after scaling down.

New Versions

New product versions are supported.

Product features

Additionally there are some individual product features that are noteworthy

Supported Kubernetes versions

This release supports the following Kubernetes versions:

  • 1.25 (new)

  • 1.24

  • 1.23

  • 1.22

Upgrade from 22.09

Using stackablectl

You can list the available releases as follows

$ stackablectl release list
RELEASE            RELEASE DATE   DESCRIPTION
22.11              2022-11-08     Third release focusing on resource management
22.09              2022-09-09     Second release focusing on security and OpenShift support
22.06              2022-06-30     First official release of the Stackable Data Platform

To uninstall the 22.09 release run

$ stackablectl release uninstall 22.09
[INFO ] Uninstalling release 22.09
[INFO ] Uninstalling airflow operator
[INFO ] Uninstalling commons operator
# ...

Afterwards you will need to update the CustomResourceDefinitions (CRDs) installed by the Stackable Platform. The reason for this is that helm will uninstall the operators but not the CRDs.

$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/airflow-operator/0.6.0/deploy/helm/airflow-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/commons-operator/0.4.0/deploy/helm/commons-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/druid-operator/0.8.0/deploy/helm/druid-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hbase-operator/0.5.0/deploy/helm/hbase-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hdfs-operator/0.6.0/deploy/helm/hdfs-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hive-operator/0.8.0/deploy/helm/hive-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/kafka-operator/0.8.0/deploy/helm/kafka-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/nifi-operator/0.8.0/deploy/helm/nifi-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/opa-operator/0.11.0/deploy/helm/opa-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/secret-operator/0.6.0/deploy/helm/secret-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/0.6.0/deploy/helm/spark-k8s-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/superset-operator/0.7.0/deploy/helm/superset-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/trino-operator/0.8.0/deploy/helm/trino-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/0.12.0/deploy/helm/zookeeper-operator/crds/crds.yaml

To install the 22.11 release run

$ stackablectl release install 22.11
[INFO ] Installing release 22.11
[INFO ] Installing airflow operator in version 0.6.0
[INFO ] Installing commons operator in version 0.4.0
[INFO ] Installing druid operator in version 0.8.0
[INFO ] Installing hbase operator in version 0.5.0
[INFO ] Installing hdfs operator in version 0.6.0
[INFO ] Installing hive operator in version 0.8.0
[INFO ] Installing kafka operator in version 0.8.0
[INFO ] Installing nifi operator in version 0.8.0
[INFO ] Installing opa operator in version 0.11.0
[INFO ] Installing secret operator in version 0.6.0
[INFO ] Installing spark-k8s operator in version 0.6.0
[INFO ] Installing superset operator in version 0.7.0
[INFO ] Installing trino operator in version 0.7.0
[INFO ] Installing zookeeper operator in version 0.12.0
# ...

Using helm

Use helm list to list the currently installed operators.

You can use the following command to uninstall all of the operators that are part of the release 22.09:

$ helm uninstall airflow-operator commons-operator druid-operator hbase-operator hdfs-operator hive-operator kafka-operator nifi-operator opa-operator secret-operator spark-k8s-operator superset-operator trino-operator zookeeper-operator
release "airflow-operator" uninstalled
release "commons-operator" uninstalled
# ...

Afterwards you will need to update the CustomResourceDefinitions (CRDs) installed by the Stackable Platform. This is because helm will uninstall the operators but not the CRDs.

$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/airflow-operator/0.6.0/deploy/helm/airflow-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/commons-operator/0.4.0/deploy/helm/commons-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/druid-operator/0.8.0/deploy/helm/druid-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hbase-operator/0.5.0/deploy/helm/hbase-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hdfs-operator/0.6.0/deploy/helm/hdfs-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hive-operator/0.8.0/deploy/helm/hive-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/kafka-operator/0.8.0/deploy/helm/kafka-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/nifi-operator/0.8.0/deploy/helm/nifi-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/opa-operator/0.11.0/deploy/helm/opa-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/secret-operator/0.6.0/deploy/helm/secret-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/0.6.0/deploy/helm/spark-k8s-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/superset-operator/0.7.0/deploy/helm/superset-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/trino-operator/0.8.0/deploy/helm/trino-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/0.12.0/deploy/helm/zookeeper-operator/crds/crds.yaml

To install the release 22.11 run

$ helm repo add stackable-stable https://repo.stackable.tech/repository/helm-stable/
$ helm repo update stackable-stable
$ helm install --wait airflow-operator stackable-stable/airflow-operator --version 0.6.0
$ helm install --wait commons-operator stackable-stable/commons-operator --version 0.4.0
$ helm install --wait druid-operator stackable-stable/druid-operator --version 0.8.0
$ helm install --wait hbase-operator stackable-stable/hbase-operator --version 0.5.0
$ helm install --wait hdfs-operator stackable-stable/hdfs-operator --version 0.6.0
$ helm install --wait hive-operator stackable-stable/hive-operator --version 0.8.0
$ helm install --wait kafka-operator stackable-stable/kafka-operator --version 0.8.0
$ helm install --wait nifi-operator stackable-stable/nifi-operator --version 0.8.0
$ helm install --wait opa-operator stackable-stable/opa-operator --version 0.11.0
$ helm install --wait secret-operator stackable-stable/secret-operator --version 0.6.0
$ helm install --wait spark-k8s-operator stackable-stable/spark-k8s-operator --version 0.6.0
$ helm install --wait superset-operator stackable-stable/superset-operator --version 0.7.0
$ helm install --wait trino-operator stackable-stable/trino-operator --version 0.7.0
$ helm install --wait zookeeper-operator stackable-stable/zookeeper-operator --version 0.12.0
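
In this release each operator still has its own version, so a loop over the operators needs a version per name. The following is a sketch that prints the install commands from a list of name=version pairs copied from the commands above; the variable RELEASE_22_11 and the function helm_install_cmd are made up for illustration.

```shell
#!/usr/bin/env bash
# name=version pairs copied from the helm commands above (release 22.11).
RELEASE_22_11="airflow=0.6.0 commons=0.4.0 druid=0.8.0 hbase=0.5.0 hdfs=0.6.0 hive=0.8.0 kafka=0.8.0 nifi=0.8.0 opa=0.11.0 secret=0.6.0 spark-k8s=0.6.0 superset=0.7.0 trino=0.7.0 zookeeper=0.12.0"

# Split one name=version pair and print the matching helm command.
helm_install_cmd() {
  local pair="$1"
  local name="${pair%%=*}"
  local version="${pair#*=}"
  echo "helm install --wait ${name}-operator stackable-stable/${name}-operator --version ${version}"
}

# Dry run: print one command per operator.
for pair in $RELEASE_22_11; do
  helm_install_cmd "$pair"
done
```

Keeping the versions in one list makes it easier to compare them against the output of stackablectl release install before running anything.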

Breaking changes

You will need to adapt your existing CRDs due to the following breaking changes:

Stackable Operator for Apache Spark

The configuration of pod resource requests has been changed to be consistent with other operators that are part of the Stackable Data Platform (#174).

In the previous version, these were configured like this:

  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"

From now on, Pod resources can be configured in two different ways. The first and recommended way is to add a resources section for each role, as the following example shows:

  driver:
    resources:
      cpu:
        min: "1"
        max: "1500m"
      memory:
        limit: "1Gi"

The second method is to use the sparkConf section and set them individually as Spark properties:

  sparkConf:
    spark.kubernetes.submission.waitAppCompletion: "false"
    spark.kubernetes.driver.pod.name: "resources-sparkconf-driver"
    spark.kubernetes.executor.podNamePrefix: "resources-sparkconf"
    spark.kubernetes.driver.request.cores: "2"
    spark.kubernetes.driver.limit.cores: "3"

When both methods are used, the settings in the sparkConf section override the resources configuration.

Note that none of the settings above have any influence over the parallelism used by Spark itself. The only supported way to affect this is as follows:

  sparkConf:
    spark.driver.cores: "3"
    spark.executor.cores: "3"

Release 22.09

This is the second release of the Stackable Data Platform. It contains lots of new features and bugfixes. The main features focus on OpenShift support and security.

New platform features

The following new major platform features were added:

OpenShift compatibility

We have made continued progress towards OpenShift compatibility, and the following operators can now be previewed on OpenShift. Further improvements are expected in future releases, but no stability or compatibility guarantees are currently made for OpenShift clusters.

Support for internal and external TLS

The following operators support operating the products at a maximal level of transport security by using TLS certificates to secure internal and external communication:

LDAP authentication

Use a central LDAP server to manage all of your user identities in a single place. The following operators added support for LDAP authentication:

stackablectl

stackablectl now supports deploying ready-to-use demos, which give an end-to-end demonstration of the usage of the Stackable Data Platform. The quickstart guide shows how to get started with stackablectl. Here you can see the available demos.

Supported Kubernetes versions

This release supports the following Kubernetes versions:

  • 1.24

  • 1.23

  • 1.22

Support for 1.21 was dropped.

Upgrade from 22.06

Using stackablectl

You can list the available releases as follows

$ stackablectl release list
RELEASE            RELEASE DATE   DESCRIPTION
22.11              2022-11-08     Third release candidate of 22.11
22.09              2022-09-09     Second release focusing on security and OpenShift support
22.06              2022-06-30     First official release of the Stackable Data Platform

To uninstall the 22.06 release run

$ stackablectl release uninstall 22.06
[INFO ] Uninstalling release 22.06
[INFO ] Uninstalling airflow operator
[INFO ] Uninstalling commons operator
# ...

Afterwards you will need to update the CustomResourceDefinitions (CRDs) installed by the Stackable Platform. The reason is that helm will uninstall the operators but not the CRDs.

$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/airflow-operator/0.5.0/deploy/helm/airflow-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/commons-operator/0.3.0/deploy/helm/commons-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/druid-operator/0.7.0/deploy/helm/druid-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hbase-operator/0.4.0/deploy/helm/hbase-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hdfs-operator/0.5.0/deploy/helm/hdfs-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/hive-operator/0.7.0/deploy/helm/hive-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/kafka-operator/0.7.0/deploy/helm/kafka-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/nifi-operator/0.7.0/deploy/helm/nifi-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/opa-operator/0.10.0/deploy/helm/opa-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/secret-operator/0.5.0/deploy/helm/secret-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/0.5.0/deploy/helm/spark-k8s-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/superset-operator/0.6.0/deploy/helm/superset-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/trino-operator/0.6.0/deploy/helm/trino-operator/crds/crds.yaml
$ kubectl apply -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/0.11.0/deploy/helm/zookeeper-operator/crds/crds.yaml

To install the 22.09 release run

$ stackablectl release install 22.09
[INFO ] Installing release 22.09
[INFO ] Installing airflow operator in version 0.5.0
[INFO ] Installing commons operator in version 0.3.0
[INFO ] Installing druid operator in version 0.7.0
[INFO ] Installing hbase operator in version 0.4.0
[INFO ] Installing hdfs operator in version 0.5.0
[INFO ] Installing hive operator in version 0.7.0
[INFO ] Installing kafka operator in version 0.7.0
[INFO ] Installing nifi operator in version 0.7.0
[INFO ] Installing opa operator in version 0.10.0
[INFO ] Installing secret operator in version 0.5.0
[INFO ] Installing spark-k8s operator in version 0.5.0
[INFO ] Installing superset operator in version 0.6.0
[INFO ] Installing trino operator in version 0.6.0
[INFO ] Installing zookeeper operator in version 0.11.0
# ...

Using helm

Use helm list to list the currently installed operators.

You can use the following command to uninstall all of the operators that are part of the release 22.06:

$ helm uninstall airflow-operator commons-operator druid-operator hbase-operator hdfs-operator hive-operator kafka-operator nifi-operator opa-operator secret-operator spark-k8s-operator superset-operator trino-operator zookeeper-operator
release "airflow-operator" uninstalled
release "commons-operator" uninstalled
# ...

Afterwards you will need to update the CustomResourceDefinitions (CRDs) installed by the Stackable Platform, because helm uninstalls the operators but not the CRDs.

$ kubectl apply \
  -f https://raw.githubusercontent.com/stackabletech/airflow-operator/0.5.0/deploy/helm/airflow-operator/crds/crds.yaml \
  -f https://raw.githubusercontent.com/stackabletech/commons-operator/0.3.0/deploy/helm/commons-operator/crds/crds.yaml \
  -f https://raw.githubusercontent.com/stackabletech/druid-operator/0.7.0/deploy/helm/druid-operator/crds/crds.yaml \
  -f https://raw.githubusercontent.com/stackabletech/hbase-operator/0.4.0/deploy/helm/hbase-operator/crds/crds.yaml \
  -f https://raw.githubusercontent.com/stackabletech/hdfs-operator/0.5.0/deploy/helm/hdfs-operator/crds/crds.yaml \
  -f https://raw.githubusercontent.com/stackabletech/hive-operator/0.7.0/deploy/helm/hive-operator/crds/crds.yaml \
  -f https://raw.githubusercontent.com/stackabletech/kafka-operator/0.7.0/deploy/helm/kafka-operator/crds/crds.yaml \
  -f https://raw.githubusercontent.com/stackabletech/nifi-operator/0.7.0/deploy/helm/nifi-operator/crds/crds.yaml \
  -f https://raw.githubusercontent.com/stackabletech/opa-operator/0.10.0/deploy/helm/opa-operator/crds/crds.yaml \
  -f https://raw.githubusercontent.com/stackabletech/secret-operator/0.5.0/deploy/helm/secret-operator/crds/crds.yaml \
  -f https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/0.5.0/deploy/helm/spark-k8s-operator/crds/crds.yaml \
  -f https://raw.githubusercontent.com/stackabletech/superset-operator/0.6.0/deploy/helm/superset-operator/crds/crds.yaml \
  -f https://raw.githubusercontent.com/stackabletech/trino-operator/0.6.0/deploy/helm/trino-operator/crds/crds.yaml \
  -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/0.11.0/deploy/helm/zookeeper-operator/crds/crds.yaml
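All of the manifest URLs above follow one pattern, so they can also be generated from operator/version pairs. The following is a minimal sketch under that assumption; the helper names crd_url and crd_urls are made up for illustration and are not part of any Stackable tooling.

```shell
#!/bin/sh
# Hypothetical helper: build the CRD manifest URL for one operator at one
# version, following the URL pattern used in the kubectl command above.
crd_url() {
  operator="$1"
  version="$2"
  printf 'https://raw.githubusercontent.com/stackabletech/%s/%s/deploy/helm/%s/crds/crds.yaml\n' \
    "$operator" "$version" "$operator"
}

# Hypothetical helper: read "operator version" pairs from stdin (one pair per
# line) and print one manifest URL per pair. Blank lines are skipped.
crd_urls() {
  while read -r operator version; do
    [ -n "$operator" ] || continue
    crd_url "$operator" "$version"
  done
}

# Possible usage (only two of the operators shown; add the remaining pairs):
#   printf '%s\n' 'airflow-operator 0.5.0' 'commons-operator 0.3.0' \
#     | crd_urls | xargs -n 1 kubectl apply -f
```

Printing the URLs first and piping them to kubectl as a separate step makes it easy to review the list before anything is applied.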

To install release 22.09, run:

$ helm repo add stackable-stable https://repo.stackable.tech/repository/helm-stable/
$ helm repo update stackable-stable
$ helm install --wait airflow-operator stackable-stable/airflow-operator --version 0.5.0
$ helm install --wait commons-operator stackable-stable/commons-operator --version 0.3.0
$ helm install --wait druid-operator stackable-stable/druid-operator --version 0.7.0
$ helm install --wait hbase-operator stackable-stable/hbase-operator --version 0.4.0
$ helm install --wait hdfs-operator stackable-stable/hdfs-operator --version 0.5.0
$ helm install --wait hive-operator stackable-stable/hive-operator --version 0.7.0
$ helm install --wait kafka-operator stackable-stable/kafka-operator --version 0.7.0
$ helm install --wait nifi-operator stackable-stable/nifi-operator --version 0.7.0
$ helm install --wait opa-operator stackable-stable/opa-operator --version 0.10.0
$ helm install --wait secret-operator stackable-stable/secret-operator --version 0.5.0
$ helm install --wait spark-k8s-operator stackable-stable/spark-k8s-operator --version 0.5.0
$ helm install --wait superset-operator stackable-stable/superset-operator --version 0.6.0
$ helm install --wait trino-operator stackable-stable/trino-operator --version 0.6.0
$ helm install --wait zookeeper-operator stackable-stable/zookeeper-operator --version 0.11.0
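Since every install command above differs only in the operator name and version, the list can be driven from a small table. This is a sketch rather than an official procedure; the helper names helm_install_cmd and helm_install_cmds are hypothetical, and the commands are printed instead of executed so they can be checked first.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the helm command for one operator,
# using the same form as the commands listed above.
helm_install_cmd() {
  printf 'helm install --wait %s stackable-stable/%s --version %s\n' "$1" "$1" "$2"
}

# Hypothetical helper: read "operator version" pairs from stdin (one pair per
# line) and print one helm command per pair. Blank lines are skipped.
helm_install_cmds() {
  while read -r name version; do
    [ -n "$name" ] || continue
    helm_install_cmd "$name" "$version"
  done
}

# Possible usage, assuming the stackable-stable repository was added as shown
# above (only two operators listed here; add the remaining pairs):
#   helm_install_cmds <<'EOF' | sh -e
#   airflow-operator 0.5.0
#   commons-operator 0.3.0
#   EOF
```

Piping the output through sh -e stops at the first failed install, which avoids continuing with a partially installed release.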

Breaking changes

You will need to adapt your existing CRDs to the following breaking changes:

druid-operator

  1. HDFS deep storage is now configured via the HDFS discovery config map instead of a URL to an HDFS name node (#262). Instead of

  deepStorage:
    hdfs:
      storageDirectory: hdfs://druid-hdfs-namenode-default-0:8020/data

use

  deepStorage:
    hdfs:
      configMapName: druid-hdfs
      directory: /druid

kafka-operator

  1. Add TLS encryption and authentication support for internal and client communications. This is breaking for clients because the cluster is now secured by default, which results in a client port change (#442). If you don’t want to use TLS to secure your Kafka cluster, you can restore the old behavior by setting the tls attribute as follows:

apiVersion: kafka.stackable.tech/v1alpha1
kind: KafkaCluster
# ...
spec:
  config:
    tls: null
  # ...

trino-operator

  1. TrinoCatalogs now have their own CRD object and get referenced by the TrinoCluster (#263). Instead of

apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
# ...
spec:
  hiveConfigMapName: hive
  s3:
    inline:
      host: minio
      port: 9000
      accessStyle: Path
      credentials:
        secretClass: s3-credentials
  # ...

use

apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
# ...
spec:
  catalogLabelSelector:
    trino: trino
  # ...
---
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: hive
  labels:
    trino: trino
spec:
  connector:
    hive:
      metastore:
        configMap: hive
      s3:
        inline:
          host: minio
          port: 9000
          accessStyle: Path
          credentials:
            secretClass: s3-credentials

Release 22.06

This is our first release of the Stackable Data Platform, bringing Kubernetes operators for 12 products as well as stackablectl, the command-line tool to easily install data products in Kubernetes. Operators spin up production-ready product applications. There are also some common features across all operators, such as monitoring, service discovery and configuration overrides. Find the Platform features, stackablectl features and Operators below.

Please report any issues you find in the specific operator repositories or in our dedicated issues repository (github.com/stackabletech/issues/). You may also join us in our Slack community or contact us via our homepage.

While we are very proud of this release, it is our first one. We will keep adding new features and fixing bugs, and will have regular releases from now on.

Platform features

Easily install production-ready data applications

Using a familiar declarative approach, users can easily install data applications such as Apache Kafka or Trino across multiple cloud Kubernetes providers or on their own data centers. The installation process is fully automated while also providing the flexibility for the user to tune relevant aspects of each application.

Monitoring

All products have monitoring with Prometheus enabled. Learn more

Service discovery

Products on the Stackable platform use service discovery to easily interconnect with each other. Learn more

Configuration overrides

All operators support configuration overrides; these are documented in the specific operator documentation pages.

Common S3 configuration

Many products support connecting to S3 to load and/or store data. There is a common resource for S3 connections and buckets across all operators that can be reused. Learn more

Roles and role groups

To support hybrid hardware clusters, the Stackable platform uses the concept of role groups. Services and applications can be configured to maximize hardware efficiency.

Standardized

Learn once, reuse everywhere. We use the same conventions in all our operators. Configure your LDAP or S3 connections once and reuse them everywhere. All our operators reuse the same CRD structure as well.

stackablectl

stackablectl is used to install and interact with the operators, either individually or with multiple at once.

Operators

This is the list of all operators in this release, with their versions.

Products

Read up on the supported versions for each of these products.

Supporting operators

Supported Kubernetes versions

This release supports the following Kubernetes versions:

  • 1.23

  • 1.22

  • 1.21
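To check a cluster against this list before installing, something like the following could be used. It is a minimal sketch: the function name is_supported_k8s and the variable SUPPORTED_K8S are made up for illustration, and the usage line assumes that jq is installed.

```shell
#!/bin/sh
# Kubernetes minor versions listed above as supported by release 22.06.
SUPPORTED_K8S="1.21 1.22 1.23"

# Hypothetical helper: succeed if the given version string (for example
# "v1.22.9" or "1.22") matches one of the supported minor versions.
is_supported_k8s() {
  # Strip an optional leading "v" and keep only "major.minor".
  major_minor=$(printf '%s\n' "$1" | sed 's/^v\{0,1\}\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/')
  for v in $SUPPORTED_K8S; do
    [ "$v" = "$major_minor" ] && return 0
  done
  return 1
}

# Possible usage (assumes kubectl can reach the cluster and jq is available):
#   is_supported_k8s "$(kubectl version -o json | jq -r '.serverVersion.gitVersion')" \
#     || echo "This Kubernetes version is not listed for release 22.06"
```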