Stacklet

A Stacklet is a deployed product that is managed by a Stackable operator. The running instance is made up of multiple components called roles, which can be further subdivided into role groups.

A Stacklet is defined by a Kubernetes resource that is managed by an operator, for example a DruidCluster, TrinoCluster or SparkApplication. The associated operator then creates a number of Kubernetes resources that are needed to run, configure and expose the product. For every operator (for example Stackable Operator for Apache ZooKeeper) the documentation describes the Stacklet it manages and the resources that it creates. The collection of all the resources that make up the functioning product is the Stacklet.
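
To get a feel for the shape of such a resource, here is a minimal sketch of a ZookeeperCluster Stacklet with a single server role (exact field names and required settings differ between operators and releases, so treat this as illustrative):

apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  image:
    productVersion: 3.8.0
  servers:
    roleGroups:
      default:
        replicas: 3

From this single resource, the ZooKeeper operator creates all the Kubernetes resources needed to run the ensemble.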

The products are usually deployed with StatefulSets from which Pods are created. Configuration is done with ConfigMaps, and the software is exposed using Services. To allow for easier connectivity between Stacklets, some operators also deploy a discovery ConfigMap for the Stacklets they manage.
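
For example, if a product needs ZooKeeper, its Stacklet resource can reference the discovery ConfigMap of a ZookeeperCluster by name. The fragment below sketches this for an HdfsCluster; the exact field and its placement vary between operators and versions:

apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs
spec:
  clusterConfig:
    zookeeperConfigMapName: simple-zk  # discovery ConfigMap published by the Stacklet named simple-zk
  # roles omitted for brevity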

Watch out for name collisions when deploying multiple Stacklets in the same namespace! Even though the resource kinds might differ (TrinoCluster, HbaseCluster), there is a name conflict for the discovery ConfigMap if two Stacklets in the same Kubernetes namespace share a name. It is best to always use unique names for Stacklets.

Have a look at the example at the end of the page for a YAML example of a HdfsCluster Stacklet.

Roles

Roles are the core of a Stacklet, and every Stacklet has at least one role defined. The roles are the various components that make up the running product. They are called roles because, under the hood, all of them use the same container image but run with different parameters: same image, different role. Some products, like Apache Kafka, Apache NiFi, Apache ZooKeeper or OPA, have only one running component (and therefore one role), while others, like Apache Airflow, Trino or Apache Druid, need multiple different roles, often a combination of worker and coordinator processes, sometimes with an additional gateway or router process.

The different roles have different configuration and scheduling requirements; for example some roles are resource intensive, some require multiple replicas for consistency/availability/partition-tolerance, and for others, only a single replica is enough. Some configuration options only make sense when applied to a particular role.

For example, a single coordinator process with little compute and memory is often enough, while many worker processes are needed, each with more compute and memory.
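
As a rough sketch of how this could look, the fragment below gives the coordinator role of a TrinoCluster smaller resource limits and fewer replicas than the worker role (values are illustrative and other required settings are omitted):

apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino
spec:
  coordinators:
    config:
      resources:
        cpu:
          max: "1"
        memory:
          limit: 2Gi
    roleGroups:
      default:
        replicas: 1
  workers:
    config:
      resources:
        cpu:
          max: "4"
        memory:
          limit: 16Gi
    roleGroups:
      default:
        replicas: 10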

Role groups

Role groups are a further subdivision of roles that allows more fine-grained control over subsets of replicas that all belong to the same role. Again, this is useful for scheduling: for example, two role groups can run in different regions, or on different classes of nodes (e.g. faster CPUs, faster disk access, GPUs). Role groups are a flexible mechanism that you can adapt to whatever is useful for you, but you can also use just a single role group per role; by convention this group is often called default.
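
As a sketch of how this can be used, the fragment below splits the dataNodes role of an HdfsCluster into two role groups and pins one of them to a different class of nodes via podOverrides; the node label disktype: nvme is purely illustrative, and podOverrides is only one way to influence scheduling:

  dataNodes:
    roleGroups:
      default:
        replicas: 5
      fastDisks:
        replicas: 3
        podOverrides:  # applied only to the Pods of this role group
          spec:
            nodeSelector:
              disktype: nvme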

Configuration settings can be made at role and role group level. Any configuration made at role group level overrides role level settings, as it is more specific.

Example

HDFS uses three distinct components that work together to fulfill its duty: NameNodes, DataNodes and JournalNodes. With roles you can specify different configuration settings for each of these.

apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs
spec:
  journalNodes:
    roleGroups:
      default:
        replicas: 3  (1)
  nameNodes:
    roleGroups:
      default:
        replicas: 3
  dataNodes:
    config:
      resources:
        storage:
          data:
            capacity: 1Gi  (2)
    roleGroups:
      default:
        replicas: 2
      highCapacity:  (3)
        config:
          resources:
            storage:
              data:
                capacity: 2Gi
        replicas: 1
  ...
1 The JournalNodes role with only a single default role group, for which 3 replicas are specified. Specifying a replica count is optional; the default is 1.
2 A config setting at role level for the DataNodes role. It applies to all Pods of this role unless it is overridden at role group level.
3 The DataNodes role has two role groups: the default group and the highCapacity group. The highCapacity group overrides the role defaults with a higher storage capacity of 2Gi and runs only one replica.