ADR009: Assigning Services to Nodes

  • Status: accepted

  • Deciders:

    • Lars Francke

    • Sönke Liebau

    • Malte Sander

  • Date: 2021-03-01

Context and Problem Statement

We need to decide on a syntax and feature set for assigning services to nodes. For every process a user wants to run in the cluster, we need to find a node to run it on. Users usually have an opinion about which nodes make good candidates, so we need to support a syntax that is flexible enough to let them select the nodes they want to target.

Our competition currently requires users to manually select every machine that software should be deployed to. Our goal is to be at least as good, if not better.

Problems
  • We want to support multiple instances of a service (or role) on a single node.

  • We have to be able to target single instances with one-shot commands (e.g. restarting a single process).

  • We need to be able to place instances in a very flexible way but also in a very strict way.

    • Sometimes there is an agreed-upon fixed placement of services to nodes that must be adhered to.

Considered Options

Decision

We decided to go with the Kubernetes way of using label selectors.

Example:
  leader:
    selectors:
      default:
        selector:
          matchLabels:
            component: spark
          matchExpressions:
            - { key: tier, operator: In, values: [ cache ] }
            - { key: environment, operator: NotIn, values: [ dev ] }
        config:
          cores: 1
          memory: "1g"
        instances: 3
        instancesPerNode: 1
      20core:
        selector:
          matchLabels:
            component: spark
            cores: "20"
          matchExpressions:
            - { key: tier, operator: In, values: [ cache ] }
            - { key: environment, operator: NotIn, values: [ dev ] }
        config:
          cores: 10
          memory: "1g"
        instances: 3
        instancesPerNode: 2
    config:
  • Here we see that for a role called leader we have an object with two top-level properties, one of them being selectors (the other being config). selectors is in turn a map of names to objects; an example node matching the default selector is shown after this list.

  • The names are chosen by the user and should describe the role group. Often this will be just default or something similar, but this structure allows overriding configuration per role group while inheriting parent-level config.
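
For illustration, a node that the default role group above would match might carry labels like the following (the node name and the concrete label values are hypothetical):

  apiVersion: v1
  kind: Node
  metadata:
    name: worker-1
    labels:
      component: spark     # matched by matchLabels
      tier: cache          # satisfies the In expression on tier
      environment: prod    # satisfies the NotIn expression (anything but dev)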

In summary, we decided that all our CRDs should follow a consistent pattern (see the sketch after this list):

  • At the top level there is one property per role.

  • Under each role there can be one or more role groups (in a selectors property).

  • Each role group has a single selector, which is a Kubernetes LabelSelector.

  • Each role group can have optional instances and instancesPerNode fields.
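
A minimal sketch of this pattern for a hypothetical CRD (the apiVersion, kind and role names below are illustrative, not taken from a real operator):

  apiVersion: example.com/v1
  kind: SparkCluster
  metadata:
    name: my-cluster
  spec:
    leader:                    # one top-level property per role
      selectors:
        default:               # role group name chosen by the user
          selector:
            matchLabels:
              component: spark
          instances: 1
          instancesPerNode: 1
      config:                  # role-level config, inherited by all role groups
    worker:                    # a second role following the same pattern
      selectors:
        default:
          selector:
            matchLabels:
              component: spark
          instances: 3
          instancesPerNode: 1
      config: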

This method also allows for (optional) auto-scaling: should new nodes be added that match a selector, additional instances could be placed on them. The details of how this can be implemented are not in scope for this ADR.

Positive Consequences

  • We have consistent behavior across all Operators.

  • It is better than what customers have today.

  • People already familiar with Kubernetes should feel right at home.

Negative Consequences

  • It is more effort to implement than the alternatives.

  • It is harder to use for everyone who is not used to the Kubernetes style of selecting nodes.

    • The old way of listing hosts can be relatively easily emulated (see the sketch below), but this should be explained in the docs.
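
As a sketch of that emulation, assuming nodes carry the standard kubernetes.io/hostname label (the role, role group and host names below are hypothetical), an explicit host list becomes a selector:

  workers:
    selectors:
      pinned:
        selector:
          matchExpressions:
            # Matches exactly the listed machines, emulating a manual host list.
            - { key: kubernetes.io/hostname, operator: In, values: [ node-1.example.com, node-2.example.com ] }
        instances: 2
        instancesPerNode: 1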