Cluster operations
Stackable operators offer different cluster operations to control the reconciliation process. This is useful when updating operators, debugging or testing of new settings:
- reconciliationPaused: Stop the operator from reconciling the cluster spec. The status will still be updated.
- stopped: Stop all running Pods but keep updating all deployed resources like ConfigMaps, Services and the cluster status.
If not specified, clusterOperation.reconciliationPaused and clusterOperation.stopped default to false.
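When clusterOperation.reconciliationPaused is set to true, the operator no longer acts on changes to the stacklet. Furthermore, if you create a stacklet where clusterOperation.reconciliationPaused is already set to true, the operator does not create any resources for it, because the initial reconciliation is skipped as well.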
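When setting clusterOperation.reconciliationPaused and clusterOperation.stopped to true at the same time, the Pods are not stopped. This means the cluster will stop reconciling immediately and the stopped flag is ignored. To avoid this, the cluster should first be stopped and then paused.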
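The stop-then-pause order described above can be sketched as a small shell function. This is a minimal sketch, not an official procedure: the function name `stop_then_pause` and the `KUBECTL` override variable are hypothetical helpers introduced here, and the wait step assumes the Pods carry the `app.kubernetes.io/instance` label with the stacklet name.

```shell
# KUBECTL is a hypothetical override so the function can be dry-run
# with KUBECTL=echo; by default the real kubectl is used.
KUBECTL="${KUBECTL:-kubectl}"

# stop_then_pause KIND NAME
# Stops a stacklet first and pauses reconciliation only afterwards,
# so that the operator still gets the chance to scale the Pods down.
stop_then_pause() {
  # 1. Stop the stacklet while reconciliation is still active.
  $KUBECTL patch "$1/$2" --type=merge \
    --patch '{"spec": {"clusterOperation": {"stopped": true}}}' || return 1

  # 2. Wait until the Pods are gone (assumes the instance label is set on Pods).
  #    kubectl may report an error here if no Pods match the selector.
  $KUBECTL wait --for=delete pod \
    --selector "app.kubernetes.io/instance=$2" --timeout=300s || return 1

  # 3. Only now pause reconciliation.
  $KUBECTL patch "$1/$2" --type=merge \
    --patch '{"spec": {"clusterOperation": {"reconciliationPaused": true}}}'
}
```

For example, `stop_then_pause zookeepercluster simple-zk` would stop and then pause the stacklet used in the examples on this page. Running it with `KUBECTL=echo` prints the commands instead of executing them.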
Example
---
apiVersion: mycluster.stackable.tech/v1alpha1
kind: MyCluster
metadata:
  name: my-cluster
spec:
  clusterOperation:
    reconciliationPaused: false # (1)
    stopped: false # (2)
(1) The clusterOperation.reconciliationPaused flag set to true stops the operator from reconciling any changes to the cluster spec. The cluster status is still updated.
(2) The clusterOperation.stopped flag set to true stops all Pods in the cluster. This is done by setting all deployed StatefulSet replicas to 0.
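Instead of editing the manifest, either flag can also be changed on a running stacklet with `kubectl patch`. The following is a minimal sketch under stated assumptions: `set_cluster_operation` and the `KUBECTL` override variable are hypothetical names introduced for illustration, and the resource `zookeepercluster/simple-zk` is just the example stacklet used elsewhere on this page.

```shell
# KUBECTL is a hypothetical override so the function can be dry-run
# with KUBECTL=echo; by default the real kubectl is used.
KUBECTL="${KUBECTL:-kubectl}"

# set_cluster_operation RESOURCE FLAG VALUE
# Sets one of the two clusterOperation flags on a stacklet via a merge patch.
set_cluster_operation() {
  # Only the two flags described above are accepted.
  case "$2" in
    reconciliationPaused|stopped) ;;
    *) echo "unknown flag: $2" >&2; return 1 ;;
  esac
  # Both flags are booleans.
  case "$3" in
    true|false) ;;
    *) echo "value must be true or false: $3" >&2; return 1 ;;
  esac
  $KUBECTL patch "$1" --type=merge \
    --patch "{\"spec\": {\"clusterOperation\": {\"$2\": $3}}}"
}
```

After `set_cluster_operation zookeepercluster/simple-zk stopped true`, listing the StatefulSets of the stacklet (for example with `kubectl get statefulset -l app.kubernetes.io/instance=simple-zk`) should show them scaled to 0 replicas once the operator has reconciled the change.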
Example usage (updating operator without downtime)
One example usage of the reconciliationPaused feature is to update your operator without all deployed stacklets restarting simultaneously due to the changes the new operator version will apply.
- Disable reconciliation for e.g. ZookeeperCluster
  Execute the following command for every stacklet that should not be restarted by the operator update:
  $ kubectl patch zookeepercluster/simple-zk --patch '{"spec": {"clusterOperation": {"reconciliationPaused": true}}}' --type=merge
- Update operator
  $ stackablectl operator uninstall zookeeper
  $ # Replace CRD with new version, e.g.
  $ kubectl replace -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/24.7.0/deploy/helm/zookeeper-operator/crds/crds.yaml
  $ stackablectl operator install zookeeper=24.7.0 # choose your version
- No ZooKeeper Pods have been restarted; they are still using the old image.
- Enable reconciliation again
  You can do this step by step for every stacklet you have, so that they do not restart simultaneously:
  $ kubectl patch zookeepercluster/simple-zk --patch '{"spec": {"clusterOperation": {"reconciliationPaused": false}}}' --type=merge
- ZooKeeper Pods will restart and pull in the new image.
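With many stacklets, the pause and resume steps above can be scripted. The sketch below is one possible way to do it, not part of the official procedure: `pause_all`, `resume_one_by_one` and the `KUBECTL` override variable are hypothetical names, the loop only covers ZookeeperCluster objects in the current namespace, and the wait step assumes the StatefulSets carry the `app.kubernetes.io/instance` label with the stacklet name.

```shell
# KUBECTL is a hypothetical override (e.g. for dry runs or tests);
# by default the real kubectl is used.
KUBECTL="${KUBECTL:-kubectl}"

# Pause reconciliation for every ZookeeperCluster in the current namespace.
# Run this before updating the operator.
pause_all() {
  for zk in $($KUBECTL get zookeepercluster -o name); do
    $KUBECTL patch "$zk" --type=merge \
      --patch '{"spec": {"clusterOperation": {"reconciliationPaused": true}}}' || return 1
  done
}

# Re-enable reconciliation one stacklet at a time and wait for its
# rollout before moving on, so the stacklets do not restart simultaneously.
# Run this after the new operator version is installed.
resume_one_by_one() {
  for zk in $($KUBECTL get zookeepercluster -o name); do
    $KUBECTL patch "$zk" --type=merge \
      --patch '{"spec": {"clusterOperation": {"reconciliationPaused": false}}}' || return 1
    # "${zk##*/}" strips the resource type prefix and keeps the stacklet name.
    # Note: this may return early if the operator has not yet updated the StatefulSets.
    $KUBECTL rollout status statefulset \
      --selector "app.kubernetes.io/instance=${zk##*/}" || return 1
  done
}
```

The operator update itself (uninstall, CRD replacement, install) stays a manual step between the two function calls.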
Service restarts
Manual restarts
Sometimes it is necessary to restart services deployed in Kubernetes. A service restart should induce as little disruption as possible, ideally none.
Most operators create StatefulSet objects for the products they manage and Kubernetes offers a rollout mechanism to restart them.
You can use kubectl rollout restart statefulset to restart a StatefulSet previously created by an operator.
To illustrate how to use the command line to restart one or more Pods, we will assume you used the Stackable HDFS Operator to deploy an HDFS Stacklet called dumbo.
This Stacklet consists, among other things, of three StatefulSets, one for each HDFS role: namenode, datanode and journalnode.
Let’s list them:
$ kubectl get statefulset -l app.kubernetes.io/instance=dumbo
NAME                        READY   AGE
dumbo-datanode-default      2/2     4m41s
dumbo-journalnode-default   1/1     4m41s
dumbo-namenode-default      2/2     4m41s
To restart the HDFS DataNode Pods, run:
$ kubectl rollout restart statefulset dumbo-datanode-default
statefulset.apps/dumbo-datanode-default restarted
Sometimes you want to restart all Pods of a stacklet and not just individual roles. This can be achieved in a similar manner by using labels instead of StatefulSet names. Continuing with the example above, to restart all HDFS Pods you would have to run:
$ kubectl rollout restart statefulset --selector app.kubernetes.io/instance=dumbo
To wait for all Pods to be running again:
$ kubectl rollout status statefulset --selector app.kubernetes.io/instance=dumbo
Here we used the label app.kubernetes.io/instance=dumbo to select all StatefulSets, and with them all Pods, that belong to a specific HDFS Stacklet.
This label is created by the operator and dumbo is the name of the HDFS Stacklet as specified in the custom resource.
You can add more labels to restrict a restart to a smaller set of Pods.
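As an illustration of combining labels, the sketch below restarts only the StatefulSets of a single role and then waits for the rollout. It rests on an assumption that is not stated on this page: that the operator also sets an `app.kubernetes.io/component` label carrying the role name. Check the labels actually present with `kubectl get statefulset --show-labels` before relying on it. The function name `restart_role` and the `KUBECTL` override variable are hypothetical.

```shell
# KUBECTL is a hypothetical override so the function can be dry-run
# with KUBECTL=echo; by default the real kubectl is used.
KUBECTL="${KUBECTL:-kubectl}"

# restart_role INSTANCE ROLE
# Restarts all StatefulSets of one role of a stacklet and waits for the rollout.
restart_role() {
  # Comma-separated label requirements are combined with a logical AND.
  # The component label is an assumption; verify it with --show-labels.
  selector="app.kubernetes.io/instance=$1,app.kubernetes.io/component=$2"
  $KUBECTL rollout restart statefulset --selector "$selector" || return 1
  $KUBECTL rollout status statefulset --selector "$selector"
}
```

Under that assumption, `restart_role dumbo datanode` would restart the DataNode StatefulSets of the dumbo Stacklet and leave the NameNodes and JournalNodes untouched.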
Automatic restarts
The Commons Operator of the Stackable Platform may restart Pods automatically, for purposes such as ensuring that TLS certificates are up-to-date. For details, see Temporary credentials lifetime as well as the Commons Operator documentation.