Cluster operations

Stackable operators offer different cluster operations to control the reconciliation process. This is useful when updating operators, debugging, or testing new settings:

  • reconciliationPaused - Stop the operator from reconciling the cluster spec. The status will still be updated.

  • stopped - Stop all running Pods but keep updating all deployed resources like ConfigMaps, Services and the cluster status.

If not specified, clusterOperation.reconciliationPaused and clusterOperation.stopped default to false.

When clusterOperation.reconciliationPaused is set to true, operators will ignore reconciliation events (creations, updates, deletions).

Furthermore, if you create a Stacklet where clusterOperation.reconciliationPaused is set to true, no resources will be created.

When clusterOperation.reconciliationPaused and clusterOperation.stopped are both set to true at the same time, clusterOperation.reconciliationPaused takes precedence: the operator stops reconciling immediately and the stopped field is ignored. To avoid this, first stop the cluster and then pause reconciliation.
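For example, to stop a cluster and then pause its reconciliation in that order, you could run two consecutive patches. A sketch, assuming a ZookeeperCluster named simple-zk (the same illustrative resource used in the steps further below):

$ kubectl patch zookeepercluster/simple-zk --patch '{"spec": {"clusterOperation": {"stopped": true}}}' --type=merge
$ # Wait until all Pods have been stopped, then pause reconciliation:
$ kubectl patch zookeepercluster/simple-zk --patch '{"spec": {"clusterOperation": {"reconciliationPaused": true}}}' --type=merge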

Example

---
apiVersion: mycluster.stackable.tech/v1alpha1
kind: MyCluster
metadata:
  name: my-cluster
spec:
  clusterOperation:
    reconciliationPaused: false (1)
    stopped: false (2)
1 Setting the clusterOperation.reconciliationPaused flag to true stops the operator from reconciling any changes to the cluster spec. The cluster status is still updated.
2 Setting the clusterOperation.stopped flag to true stops all Pods in the cluster. This is done by setting all deployed StatefulSet replicas to 0.
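The flags can be toggled on a running Stacklet by patching the resource. A sketch, assuming the placeholder MyCluster resource above and the app.kubernetes.io/instance label described in the Service restarts section below:

$ kubectl patch mycluster/my-cluster --patch '{"spec": {"clusterOperation": {"stopped": true}}}' --type=merge
$ # The operator then scales all deployed StatefulSets down to 0 replicas:
$ kubectl get statefulset -l app.kubernetes.io/instance=my-cluster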

Example usage (updating operator without downtime)

One example use of the reconciliationPaused feature is updating your operator without all deployed Stacklets restarting simultaneously due to the changes the new operator version applies.

  1. Disable reconciliation, e.g. for a ZookeeperCluster

    Execute the following command for every Stacklet that should not be restarted by the operator update:

    $ kubectl patch zookeepercluster/simple-zk --patch '{"spec": {"clusterOperation": {"reconciliationPaused": true}}}' --type=merge
  2. Update operator

    $ stackablectl operator uninstall zookeeper
    $ # Replace CRD with new version, e.g. kubectl replace -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/24.7.0/deploy/helm/zookeeper-operator/crds/crds.yaml
    $ stackablectl operator install zookeeper=24.7.0 # choose your version
  3. No ZooKeeper Pods have been restarted; they are still using the old image (a quick way to check this is sketched after this list).

  4. Enable reconciliation again

    You can do this one Stacklet at a time, so that they do not all restart simultaneously:

    $ kubectl patch zookeepercluster/simple-zk --patch '{"spec": {"clusterOperation": {"reconciliationPaused": false}}}' --type=merge
  5. The ZooKeeper Pods restart and pull in the new image.
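
To verify which image the ZooKeeper Pods are running (steps 3 and 5), you can list the Pod images. A sketch, assuming the Stacklet is named simple-zk and carries the usual app.kubernetes.io/instance label:

$ kubectl get pods -l app.kubernetes.io/instance=simple-zk -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image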

Service restarts

Manual restarts

Sometimes it is necessary to restart services deployed in Kubernetes. A service restart should induce as little disruption as possible, ideally none.

Most operators create StatefulSet objects for the products they manage and Kubernetes offers a rollout mechanism to restart them. You can use kubectl rollout restart statefulset to restart a StatefulSet previously created by an operator.

To illustrate how to use the command line to restart one or more Pods, we will assume you used the Stackable HDFS Operator to deploy an HDFS Stacklet called dumbo.

This Stacklet consists, among other things, of three StatefulSets, one for each HDFS role: namenode, datanode and journalnode. Let’s list them:

$ kubectl get statefulset -l app.kubernetes.io/instance=dumbo
NAME                        READY   AGE
dumbo-datanode-default      2/2     4m41s
dumbo-journalnode-default   1/1     4m41s
dumbo-namenode-default      2/2     4m41s

To restart the HDFS DataNode Pods, run:

$ kubectl rollout restart statefulset dumbo-datanode-default
statefulset.apps/dumbo-datanode-default restarted

Sometimes you want to restart all Pods of a Stacklet, not just those of individual roles. This can be achieved in a similar manner by using labels instead of StatefulSet names. Continuing with the example above, to restart all HDFS Pods run:

$ kubectl rollout restart statefulset --selector app.kubernetes.io/instance=dumbo

To wait for all Pods to be running again:

$ kubectl rollout status statefulset --selector app.kubernetes.io/instance=dumbo

Here we used the label app.kubernetes.io/instance=dumbo to select all Pods that belong to a specific HDFS Stacklet. This label is created by the operator, and dumbo is the name of the HDFS Stacklet as specified in the custom resource. You can add more labels for finer-grained restarts.
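
The same mechanism can narrow a restart down to a single role. A sketch, assuming the operator also stores the role name in the app.kubernetes.io/component label:

$ kubectl rollout restart statefulset --selector app.kubernetes.io/instance=dumbo,app.kubernetes.io/component=namenode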

Automatic restarts

The Commons Operator of the Stackable Platform may restart Pods automatically, for purposes such as ensuring that TLS certificates are up to date. For details, see Temporary credentials lifetime as well as the Commons Operator documentation.