PersistentVolumeClaim usage

Several of the tools on the Stackable platform can use external resources that the cluster administrator makes available via a PersistentVolume. Airflow users can access DAG jobs this way, and Spark users can do the same for data or other job dependencies, to name just two examples.

A PersistentVolume is usually provisioned through a Container Storage Interface (CSI) driver on behalf of the cluster administrator, who takes into account the type of storage required. This includes, for example, appropriate sizing and the relevant access modes (which in turn depend on the StorageClass chosen to back the PersistentVolume).

The relationship between a PersistentVolume and a PersistentVolumeClaim can be illustrated by these two examples:

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual # (1)
  capacity:
    storage: 10Gi # (2)
  accessModes:
    - ReadWriteOnce # (3)
  hostPath:
    path: "/mnt/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual # (4)
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi # (5)
(1) The name of the storage class, which the PersistentVolumeClaim will reference
(2) The capacity of the PersistentVolume
(3) The list of access modes that the PersistentVolume supports
(4) The storageClassName, which is used to match a PersistentVolume to the claim
(5) The quantity of storage being claimed, which must not exceed the capacity of the PersistentVolume
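
To show how a workload then consumes the claim, the following is a minimal sketch of a Pod that mounts task-pv-claim. The Pod name, volume name, image and mount path are assumptions made for illustration and are not taken from any Stackable operator.

```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod # hypothetical name
spec:
  volumes:
    - name: task-pv-storage # hypothetical volume name, referenced by the mount below
      persistentVolumeClaim:
        claimName: task-pv-claim # the claim from the example above
  containers:
    - name: app
      image: busybox:1.36 # placeholder image (assumption)
      command: ["sh", "-c", "ls /mnt/resources && sleep 3600"]
      volumeMounts:
        - name: task-pv-storage
          mountPath: /mnt/resources # hypothetical mount path
```

The Pod refers only to the claim, never to the PersistentVolume directly, so the administrator can change the backing storage without touching the workload definition.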

Access modes and the StorageClass

Not all storage classes support all access modes. The supported access modes also depend on the Kubernetes implementation: see, for example, the compatibility table "Supported access modes for PVs" in the OpenShift documentation. Other managed Kubernetes implementations will be similar, albeit with different default storage class names. The important point is that the default StorageClass typically supports only ReadWriteOnce, which limits access to the PersistentVolumeClaim to a single node. A strategy governing PersistentVolumeClaim resources will thus address the following:

  • What access mode is appropriate? Will the resources be accessed by multiple pods and/or nodes?

  • Does the Kubernetes cluster have a default implementation for these access modes?

  • If access modes are restricted (e.g. ReadWriteOnce), does the cluster prioritise available resources over implicit application dependencies (in other words, is the PersistentVolumeClaim treated as a soft or a hard dependency)?
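
As an illustration of the first two questions, a claim intended for use from several nodes could be sketched as follows. The claim name and the storage class shared-rwx are hypothetical: the cluster administrator would need to supply a class backed by storage that actually supports ReadWriteMany.

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-resources-claim # hypothetical name
spec:
  storageClassName: shared-rwx # hypothetical class; must support ReadWriteMany
  accessModes:
    - ReadWriteMany # allows read-write mounts from multiple nodes
  resources:
    requests:
      storage: 3Gi
```

If no available PersistentVolume or storage class can satisfy the requested access mode, the claim will remain unbound, and pods that reference it will not start.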

If a PersistentVolumeClaim should be mounted on a single node for the application and its components that use it, this can be specified explicitly (see the next section).

Node selection

The Kubernetes documentation states the following with regard to assigning pods to specific nodes:

the scheduler will automatically do a reasonable placement (for example, spreading your Pods across nodes so as not place Pods on a node with insufficient free resources).

This suggests that resources are automatically considered when pods are assigned to nodes, but it is not clear if the same is true for implicit dependencies, such as PersistentVolumeClaim usage by multiple pods. The scheduler will take various factors into account, such as

… individual and collective resource requirements, hardware / software / policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference …

but implementations may vary in the way soft dependencies (e.g. optimal resource usage) and hard dependencies (e.g. access modes that may prevent the job from running) are handled and prioritised.
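
One way to make the placement explicit rather than relying on the scheduler is a node selector. The sketch below pins a Pod that uses task-pv-claim to a single node via the well-known kubernetes.io/hostname label. The node name worker-1, the Pod name, the image and the mount path are assumptions for illustration.

```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer # hypothetical name
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-1 # hypothetical node name
  volumes:
    - name: resources
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: app
      image: busybox:1.36 # placeholder image (assumption)
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: resources
          mountPath: /mnt/resources # hypothetical mount path
```

A node selector is a hard constraint: if the named node is unavailable, the Pod stays pending rather than being scheduled elsewhere.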

Test considerations

For PersistentVolumeClaim-relevant tests in the Stackable operator repositories, the backing PersistentVolume is omitted: it is an implementation decision to be made by the cluster administrator, and mocking, for example, an NFS volume for tests is non-trivial.

If the only viable access mode is ReadWriteOnce (see above), all test steps that depend on a PersistentVolumeClaim should run on the same node. This assignment should be made explicitly by declaring either a node selector or pod affinity.
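
Where the node name is not known in advance, pod affinity can express the same requirement. The following sketch assumes two test steps that share task-pv-claim: the first Pod carries a label, and the second declares a required affinity to whichever node hosts a Pod with that label. The Pod names, the label pvc-user, the image and the mount path are hypothetical.

```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: test-step-1 # hypothetical name
  labels:
    pvc-user: task-pv-claim # hypothetical label used for co-location
spec:
  volumes:
    - name: resources
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: step
      image: busybox:1.36 # placeholder image (assumption)
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: resources
          mountPath: /mnt/resources # hypothetical mount path
---
apiVersion: v1
kind: Pod
metadata:
  name: test-step-2 # hypothetical name
spec:
  affinity:
    podAffinity:
      # Hard requirement: schedule onto a node already running a labelled Pod
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              pvc-user: task-pv-claim
          topologyKey: kubernetes.io/hostname # co-locate at node level
  volumes:
    - name: resources
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: step
      image: busybox:1.36 # placeholder image (assumption)
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: resources
          mountPath: /mnt/resources # hypothetical mount path
```

Because the affinity is declared as required, the second Pod stays pending until a Pod carrying the label has been scheduled, so the order of the test steps matters.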

Managed Kubernetes clusters will normally have a default storage implementation for access modes other than ReadWriteOnce, so ReadWriteMany, for example, can be declared for tests running against such clusters in the knowledge that the appropriate storage will be used.

Operator usage

Spark-k8s

Users of the Spark-k8s operator have a variety of ways to manage SparkApplication dependencies, one of which is to mount resources on a PersistentVolumeClaim. An example is shown here.