ADR033: Foundation for admission or conversion webhooks - CA bundle injection

Status: accepted
Deciders:
- Andrew Kenworthy
- Malte Sander
- Sascha Lautenschlaeger
Date: 2024-01-09

Technical Story: https://github.com/stackabletech/issues/issues/361

Context

There are many use cases for the future development of the SDP that involve validating, mutating and conversion webhooks. Tasks that must be tackled in the near future include:

The proper versioning of our CRDs and the possibility of smoothly up- and downgrading between versions including breaking changes (see https://github.com/stackabletech/documentation/issues/273)
Fixing the commons operator initial restarting problem (see https://github.com/stackabletech/commons-operator/issues/111 and spike https://github.com/stackabletech/commons-operator/tree/spike/sts-restarter-webhook)
Inject logging / vector sidecar containers (mutating)

When the Kubernetes API server receives an incoming request, for example when a custom resource should be created, webhooks can intercept that request after authentication and authorization. The following steps are performed before the server finally persists the object into etcd.

Authentication / Authorization
Mutating admission
Schema validation
Validation admission
Conversion to storage version

The steps 2 to 4 may run in a loop if multiple mutating admissions are performed and can have side effects.

In order for the Kubernetes API server to contact webhook controllers, the webhook endpoints have to present a certificate trusted by the Kubernetes API server. This can be configured similarly for conversion and validating / mutating webhooks as follows.

Conversion webhooks CA configuration

The following snippet shows a CRD containing multiple versions (v1beta1, v1) with different schemas. The v1beta1 is still the storage version. Due to schema changes, any custom resource deployed as version v1 requires a conversion webhook.

The conversion.webhook.clientConfig.service points to an endpoint where the actual conversion from an applied v1 to the stored v1beta1 takes place. The Kubernetes API server must trust the endpoint for which a certificate is provided / injected in conversion.webhook.clientConfig.caBundle. The CA bundle in this case is injected via a Cert-Manager annotation that is discussed later in this document.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # name must match the spec fields below, and be in the form: <plural>.<group>
  name: crontabs.example.com
  annotations:
    cert-manager.io/inject-ca-from-secret: default/example-conversion-webhook-ca
spec:
  # group name to use for REST API: /apis/<group>/<version>
  group: example.com
  # list of versions supported by this CustomResourceDefinition
  versions:
  - name: v1beta1
    # Each version can be enabled/disabled by Served flag.
    served: true
    # One and only one version must be marked as the storage version.
    storage: true
    # Each version can define its own schema when no top-level
    # schema is defined.
    schema:
      openAPIV3Schema:
        type: object
        properties:
          hostPort:
            type: string
  - name: v1
    served: true
    storage: false
    schema:
      openAPIV3Schema:
        type: object
        properties:
          host:
            type: string
          port:
            type: string
  conversion:
    # a Webhook strategy instruct API server to call an external webhook for any conversion between custom resources.
    strategy: Webhook
    # webhook is required when strategy is `Webhook` and it configures the webhook endpoint to be called by API server.
    webhook:
      # conversionReviewVersions indicates what ConversionReview versions are understood/preferred by the webhook.
      # The first version in the list understood by the API server is sent to the webhook.
      # The webhook must respond with a ConversionReview object in the same version it received.
      conversionReviewVersions: ["v1","v1beta1"]
      clientConfig:
        service:
          namespace: default
          name: example-conversion-webhook-server
          path: /crdconvert
        #caBundle: will be injected from a Secret 'default/example-conversion-webhook-ca' with a 'ca.crt' data key
  # either Namespaced or Cluster
  scope: Namespaced
  names:
    # plural name to be used in the URL: /apis/<group>/<version>/<plural>
    plural: crontabs
    # singular name to be used as an alias on the CLI and for display
    singular: crontab
    # kind is normally the CamelCased singular type. Your resource manifests use this.
    kind: CronTab
    # shortNames allow shorter string to match your resource on the CLI
    shortNames:
    - ct

Validating / Mutating webhooks CA configuration

The following snippet shows a ValidatingWebhookConfiguration which is similar to a MutatingWebhookConfiguration. The webhooks.clientConfig is configured as for the conversion webhook above. The webhooks.clientConfig.service points to the endpoint where the webhook is served an must be trusted by the Kubernetes API server via webhooks.clientConfig.caBundle. The CA bundle in this case is injected via a Cert-Manager annotation that is discussed later in this document.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration # or MutatingWebhookConfiguration
metadata:
  name: "example-validating-webhook-server"
  annotations:
    cert-manager.io/inject-ca-from-secret: default/example-conversion-webhook-ca
webhooks:
  - name: my-webhook.example.com
    matchPolicy: Equivalent
    rules:
      - operations: ['CREATE','UPDATE']
        apiGroups: ['*']
        apiVersions: ['*']
        resources: ['*']
    failurePolicy: "Ignore" # Fail-open (optional)
    sideEffects: None
    clientConfig:
      service:
        namespace: default
        name: example-validation-webhook-server # or example-mutating-webhook-server
        path: /validate # or /mutate
      #caBundle: will be injected from a Secret 'default/example-conversion-webhook-ca' with a 'ca.crt' data key

The clientConfig.caBundle cannot be shipped by us as it will differ from cluster to cluster and must be injected at runtime.

This ADR is about how to achieve the CA bundle injection using external tools or a self-made solution via the secret-operator.

Problem Statement

The required CA bundles for the webhooks endpoints must be injected at runtime. There exist tools like Cert Manager that do exactly this, adding one of these annotations to an injectable source:

cert-manager.io/inject-ca-from
cert-manager.io/inject-ca-from-secret
cert-manager.io/inject-apiserver-ca

For example injecting a CA from a secret using Cert-Manager:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: my-webhook.example.com
  annotations:
    cert-manager.io/inject-ca-from-secret: default/my-webhook-example-com-ca

This can be used for CRDs and conversion webhooks as well as shown in Conversion webhooks CA configuration.

The SDP should focus on one solution for CA injection, internally or externally, but strive for compatibility with as many others as possible. The normal way of configuration seems to be adding an annotation to the objects containing the CA, which can be supported in our Helm chart or solved via documentation.

For clusters without any existing manager we should provide our own, lightweight caBundle injector that can work in tandem with the secret operator.

Notes on OLM

The Operator Lifecycle Manager (OLM) is a core component of OpenShift. It is used to install, manage, and upgrade the lifecycle of all Operators and their associated services running across a cluster. It is the recommended way to install and manage operators on OpenShift. OLM is also used to install the Stackable Operator Framework.

Operators and webhooks managed by OLM are automatically injected with the CA bundle from the cluster. This is done by the OLM itself and does not require any additional configuration.

OLM patches the CustomResourceDefinition (CRD) marked as owned with the CA bundle from the cluster and mounts certificates and keys in the webhook Pods. The keys are in the EC format (as of version 4.14 of OpenShift).

OLM mounts the TLS key and certificate for the webhook at the following locations:

The TLS certificate file is mounted to the deployment at /apiserver.local.config/certificates/apiserver.crt.
The TLS key file is mounted to the deployment at /apiserver.local.config/certificates/apiserver.key.

For more details regarding OLM constraints for webhooks, see the OpenShift Container Platform documentation.

Decision Drivers

Generic solution to be compatible with as many external cert providers as possible to avoid vendor lock-in. This means a possible abstraction to support switching out cert providers / "backends".
Openshift compatibility.
How to activate / deactivate if e.g. no conversion webhooks should be applied? This is about how we e.g. set inject annotations via templating / Helm.

Considered Options

Cert-Manager

The cert-manager is a powerful and extensible X.509 certificate controller for Kubernetes and OpenShift workloads. It supports certificates from a variety of popular private and public Issuers (HashiCorp, Lets encrypt and many more). The cert-manager ensures that the certificates are valid and up-to-date, and will attempt to renew certificates at a configured time before expiry.

OpenShift Service CA operator

The OpenShift Service CA operator is an OpenShift ClusterOperator and contains several controllers:

Serving cert signer: Issues a signed serving certificate/key pair to services annotated with service.beta.openshift.io/serving-cert-secret-name via a secret
ConfigMap CA bundle injector: Watches for configmaps annotated with service.beta.openshift.io/inject-cabundle=true and adds or updates a data item (key service-ca.crt) containing the PEM-encoded CA signing bundle. Consumers of the configmap can then trust service-ca.crt in their TLS client configuration, allowing connections to services that utilize service-serving certificates. Pods referencing the service-ca.crt in a VolumeMount will not start before the CA bundle was injected.

Lightweight self-made solution via secret-operator

This would be the Stackable internal solution to avoid any external party tools. It would work in a similar way to the OpenShift Service CA operator but would rather inject the bundles via the CSI instead of ConfigMap mounts.

Common library for cert management / injection in operator-rs

Put the cert management / injection stuff into a library in operator-rs (similar to the code in secret-operator) that we run in-process for each operator that has a webhook. The operator / webhook code will use a common TLS utility crate which handles creation of CAs and other certificates on-the-fly. These certificates will be rotated automatically when they expire. The webhook server will pick up the renewed certificate without the need for a restart. Webhooks created by the common stackable-webhook crate can also use a certificate which is not autogenerated by us and provided via a Kubernetes Secret from external sources. For references, see:

Pros and Cons of the Options

Cert-Manager

Good, because covers both Kubernetes and Openshift
Good, because cert injection works via annotations (compatibility) for various Kubernetes Resources (CRDs, Validating/Mutating webhooks)
Good, because it is already popular and familiar to users
Good, because it frees maintenance and development resources for us
Bad, because another tool we have to be experienced with, check for updates and breaking features etc.

OpenShift Service CA operator

Good, because cert injection works via annotations (compatibility)
Bad, because specific to OpenShift
Bad, because only injecting to ConfigMaps
Bad, because another tool we have to be experienced with, check for updates and breaking features etc.

Lightweight self-made solution via secret-operator

Good, because no external tools are required
Good, because reusing secret-operator and cert management should be an internal part of the SDP
Bad, because requires more time and coding
Bad, because secret-operator must version its own CRDs as well which could be a predicament

Common library for cert management / injection in operator-rs

Good, because no external tools are required
Good, because external tools can be used if required
Bad, because requires more time and coding

Decision Outcome

Chosen option Common library for cert management / injection in operator-rs, because the CA injection will be handled as part of the SDP and no external dependencies are required. The common library will reside in the operator-rs and be used in every operator. OpenShift should not pose a problem since no extra components are necessary. The required infrastructure (e.g. operator-templating, ca injection / generation) can be developed in parallel to the operators e.g. version conversion logic.

Positive Consequences

No external dependencies
CA bundle injection as basic part of the SDP
Opt-out (e.g. removing the conversion webhook from CRD) possible