Storage for data volumes
You can mount volumes where data is stored by specifying PersistentVolumeClaims for each individual role group:
dataNodes:
  roleGroups:
    default:
      config:
        resources:
          storage:
            data:
              capacity: 128Gi
In the above example, all DataNodes in the default group will store data (the location of dfs.datanode.data.dir) on a 128Gi volume.
By default, if nothing is configured in the custom resource for a role group, each Pod will have a 5Gi volume mount for the data location.
Multiple storage volumes
DataNodes can have multiple disks attached to increase both storage capacity and throughput. The disks can be of different types, e.g. HDDs or SSDs.
You can configure multiple PersistentVolumeClaims (PVCs) for the DataNodes as follows:
dataNodes:
  roleGroups:
    default:
      config:
        resources:
          storage:
            data: # We need to overwrite the data pvcs coming from the default value
              count: 0
            my-disks:
              count: 3
              capacity: 12Ti
              hdfsStorageType: Disk
            my-ssds:
              count: 2
              capacity: 5Ti
              storageClass: premium-ssd
              hdfsStorageType: SSD
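This will create the following PVCs for each DataNode Pod: three 12Ti PVCs for my-disks and two 5Ti PVCs for my-ssds (using the premium-ssd StorageClass).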
By configuring and using a dedicated StorageClass you can configure your HDFS to use local disks attached to Kubernetes nodes.
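As a minimal sketch of such a setup, the following assumes statically provisioned local volumes. The StorageClass name local-disks and the PVC group name local are hypothetical, and the count and capacity values are placeholders. The PersistentVolumes backing the class would have to be created separately, since this provisioner does not create them dynamically.

```yaml
---
# Hypothetical StorageClass for local disks attached to Kubernetes nodes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-disks                        # hypothetical name
provisioner: kubernetes.io/no-provisioner  # no dynamic provisioning
volumeBindingMode: WaitForFirstConsumer    # bind only once a Pod is scheduled
---
# Fragment of the HDFS cluster spec referencing the StorageClass above.
dataNodes:
  roleGroups:
    default:
      config:
        resources:
          storage:
            data:
              count: 0                     # disable the default data PVC
            local:                         # hypothetical PVC group name
              count: 2                     # placeholder value
              capacity: 1Ti                # placeholder value
              storageClass: local-disks
              hdfsStorageType: Disk
```

Delaying volume binding until a Pod is scheduled lets the scheduler pick a node that actually has a matching local volume.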
You might need to re-create the StatefulSet to apply the new PVC configuration because of this Kubernetes issue.
You can delete the StatefulSet using
Stackable operators handle resource requests in a slightly different manner than Kubernetes. Resource requests are defined on the role or role group level. See Roles and role groups for details on these concepts. On the role level this means that, for example, all workers will use the same resource requests and limits. This can be further specified on the role group level (which takes priority over the role level) to apply different resources.
This is an example of how to specify CPU and memory resources using the Stackable Custom Resources:
---
apiVersion: example.stackable.tech/v1alpha1
kind: ExampleCluster
metadata:
  name: example
spec:
  workers: # role-level
    config:
      resources:
        cpu:
          min: 300m
          max: 600m
        memory:
          limit: 3Gi
    roleGroups: # role-group-level
      resources-from-role: # role-group 1
        replicas: 1
      resources-from-role-group: # role-group 2
        replicas: 1
        config:
          resources:
            cpu:
              min: 400m
              max: 800m
            memory:
              limit: 4Gi
In this case, the role group resources-from-role inherits the resources specified on the role level, resulting in a maximum of 3Gi memory and 600m CPU resources.
The role group resources-from-role-group has a maximum of 4Gi memory and 800m CPU resources, overriding the role-level values.
For Java products, the heap memory actually available is lower than the specified memory limit, because other processes in the Container require memory to run as well. Currently, 80% of the specified memory limit is passed to the JVM. For example, with a 3Gi limit, roughly 2.4Gi is passed to the JVM.
For memory, only a limit can be specified, which is set as both the memory request and the memory limit in the Container. This guarantees the Container the full amount of memory during Kubernetes scheduling.
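To illustrate, here is a sketch of the Container resources that could be expected for the role group resources-from-role-group from the example above. It assumes that cpu.min is translated into the CPU request and cpu.max into the CPU limit. The exact Pod specification generated by the operator may differ.

```yaml
# Sketch of the resulting Container resources for "resources-from-role-group".
# Assumption: cpu.min -> requests.cpu, cpu.max -> limits.cpu.
resources:
  requests:
    cpu: 400m
    memory: 4Gi  # memory.limit is used as the request ...
  limits:
    cpu: 800m
    memory: 4Gi  # ... and as the limit
```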
If no resource requests are configured explicitly, the HDFS operator uses the following defaults:
dataNodes:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            max: '4'
            min: '100m'
          storage:
            data:
              capacity: 2Gi