ADR033: Foundation for admission or conversion webhooks - CA bundle injection
-
Status: accepted
-
Deciders:
-
Andrew Kenworthy
-
Malte Sander
-
Sascha Lautenschlaeger
-
-
Date: 2024-01-09
Technical Story: https://github.com/stackabletech/issues/issues/361
Context
There are many use cases for the future development of the SDP that involve validating, mutating and conversion webhooks. Tasks that must be tackled in the near future include:
-
The proper versioning of our CRDs and the possibility of smoothly up- and downgrading between versions including breaking changes (see https://github.com/stackabletech/documentation/issues/273)
-
Fixing the commons operator initial restarting problem (see https://github.com/stackabletech/commons-operator/issues/111 and spike https://github.com/stackabletech/commons-operator/tree/spike/sts-restarter-webhook)
-
Inject logging / vector sidecar containers (mutating)
When the Kubernetes API server receives an incoming request, for example when a custom resource should be created, webhooks can intercept that request after authentication and authorization. The following steps are performed before the server finally persists the object into etcd.
-
Authentication / Authorization
-
Mutating admission
-
Schema validation
-
Validation admission
-
Conversion to storage version
The steps 2 to 4 may run in a loop if multiple mutating admissions are performed and can have side effects.
In order for the Kubernetes API server to contact webhook controllers, the webhook endpoints have to present a certificate trusted by the Kubernetes API server. This can be configured similarly for conversion and validating / mutating webhooks as follows.
Conversion webhooks CA configuration
The following snippet shows a CRD containing multiple versions (v1beta1, v1) with different schemas.
The v1beta1 is still the storage version.
Due to schema changes, any custom resource deployed as version v1 requires a conversion webhook.
The conversion.webhook.clientConfig.service points to an endpoint where the actual conversion from an applied v1 to the stored v1beta1 takes place.
The Kubernetes API server must trust the endpoint for which a certificate is provided / injected in conversion.webhook.clientConfig.caBundle.
The CA bundle in this case is injected via a Cert-Manager annotation that is discussed later in this document.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
# name must match the spec fields below, and be in the form: <plural>.<group>
name: crontabs.example.com
annotations:
cert-manager.io/inject-ca-from-secret: default/example-conversion-webhook-ca
spec:
# group name to use for REST API: /apis/<group>/<version>
group: example.com
# list of versions supported by this CustomResourceDefinition
versions:
- name: v1beta1
# Each version can be enabled/disabled by Served flag.
served: true
# One and only one version must be marked as the storage version.
storage: true
# Each version can define its own schema when no top-level
# schema is defined.
schema:
openAPIV3Schema:
type: object
properties:
hostPort:
type: string
- name: v1
served: true
storage: false
schema:
openAPIV3Schema:
type: object
properties:
host:
type: string
port:
type: string
conversion:
# a Webhook strategy instruct API server to call an external webhook for any conversion between custom resources.
strategy: Webhook
# webhook is required when strategy is `Webhook` and it configures the webhook endpoint to be called by API server.
webhook:
# conversionReviewVersions indicates what ConversionReview versions are understood/preferred by the webhook.
# The first version in the list understood by the API server is sent to the webhook.
# The webhook must respond with a ConversionReview object in the same version it received.
conversionReviewVersions: ["v1","v1beta1"]
clientConfig:
service:
namespace: default
name: example-conversion-webhook-server
path: /crdconvert
#caBundle: will be injected from a Secret 'default/example-conversion-webhook-ca' with a 'ca.crt' data key
# either Namespaced or Cluster
scope: Namespaced
names:
# plural name to be used in the URL: /apis/<group>/<version>/<plural>
plural: crontabs
# singular name to be used as an alias on the CLI and for display
singular: crontab
# kind is normally the CamelCased singular type. Your resource manifests use this.
kind: CronTab
# shortNames allow shorter string to match your resource on the CLI
shortNames:
- ct
Validating / Mutating webhooks CA configuration
The following snippet shows a ValidatingWebhookConfiguration which is similar to a MutatingWebhookConfiguration.
The webhooks.clientConfig is configured as for the conversion webhook above.
The webhooks.clientConfig.service points to the endpoint where the webhook is served an must be trusted by the Kubernetes API server via webhooks.clientConfig.caBundle.
The CA bundle in this case is injected via a Cert-Manager annotation that is discussed later in this document.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration # or MutatingWebhookConfiguration
metadata:
name: "example-validating-webhook-server"
annotations:
cert-manager.io/inject-ca-from-secret: default/example-conversion-webhook-ca
webhooks:
- name: my-webhook.example.com
matchPolicy: Equivalent
rules:
- operations: ['CREATE','UPDATE']
apiGroups: ['*']
apiVersions: ['*']
resources: ['*']
failurePolicy: "Ignore" # Fail-open (optional)
sideEffects: None
clientConfig:
service:
namespace: default
name: example-validation-webhook-server # or example-mutating-webhook-server
path: /validate # or /mutate
#caBundle: will be injected from a Secret 'default/example-conversion-webhook-ca' with a 'ca.crt' data key
The clientConfig.caBundle cannot be shipped by us as it will differ from cluster to cluster and must be injected at runtime.
This ADR is about how to achieve the CA bundle injection using external tools or a self-made solution via the secret-operator.
Problem Statement
The required CA bundles for the webhooks endpoints must be injected at runtime. There exist tools like Cert Manager that do exactly this, adding one of these annotations to an injectable source:
-
cert-manager.io/inject-ca-from -
cert-manager.io/inject-ca-from-secret -
cert-manager.io/inject-apiserver-ca
For example injecting a CA from a secret using Cert-Manager:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
name: my-webhook.example.com
annotations:
cert-manager.io/inject-ca-from-secret: default/my-webhook-example-com-ca
This can be used for CRDs and conversion webhooks as well as shown in Conversion webhooks CA configuration.
The SDP should focus on one solution for CA injection, internally or externally, but strive for compatibility with as many others as possible. The normal way of configuration seems to be adding an annotation to the objects containing the CA, which can be supported in our Helm chart or solved via documentation.
For clusters without any existing manager we should provide our own, lightweight caBundle injector that can work in tandem with the secret operator.
Notes on OLM
The Operator Lifecycle Manager (OLM) is a core component of OpenShift. It is used to install, manage, and upgrade the lifecycle of all Operators and their associated services running across a cluster. It is the recommended way to install and manage operators on OpenShift. OLM is also used to install the Stackable Operator Framework.
Operators and webhooks managed by OLM are automatically injected with the CA bundle from the cluster. This is done by the OLM itself and does not require any additional configuration.
OLM patches the CustomResourceDefinition (CRD) marked as owned with the CA bundle from the cluster and mounts certificates and keys in the webhook Pods.
The keys are in the EC format (as of version 4.14 of OpenShift).
OLM mounts the TLS key and certificate for the webhook at the following locations:
-
The TLS certificate file is mounted to the deployment at
/apiserver.local.config/certificates/apiserver.crt. -
The TLS key file is mounted to the deployment at
/apiserver.local.config/certificates/apiserver.key.
For more details regarding OLM constraints for webhooks, see the OpenShift Container Platform documentation.
Decision Drivers
-
Generic solution to be compatible with as many external cert providers as possible to avoid vendor lock-in. This means a possible abstraction to support switching out cert providers / "backends".
-
Openshift compatibility.
-
How to activate / deactivate if e.g. no conversion webhooks should be applied? This is about how we e.g. set inject annotations via templating / Helm.
Considered Options
Cert-Manager
The cert-manager is a powerful and extensible X.509 certificate controller for Kubernetes and OpenShift workloads. It supports certificates from a variety of popular private and public Issuers (HashiCorp, Lets encrypt and many more). The cert-manager ensures that the certificates are valid and up-to-date, and will attempt to renew certificates at a configured time before expiry.
OpenShift Service CA operator
The OpenShift Service CA operator is an OpenShift ClusterOperator and contains several controllers:
-
Serving cert signer: Issues a signed serving certificate/key pair to services annotated with
service.beta.openshift.io/serving-cert-secret-namevia a secret -
ConfigMap CA bundle injector: Watches for configmaps annotated with
service.beta.openshift.io/inject-cabundle=trueand adds or updates a data item (keyservice-ca.crt) containing the PEM-encoded CA signing bundle. Consumers of the configmap can then trustservice-ca.crtin their TLS client configuration, allowing connections to services that utilize service-serving certificates. Pods referencing theservice-ca.crtin a VolumeMount will not start before the CA bundle was injected.
Lightweight self-made solution via secret-operator
This would be the Stackable internal solution to avoid any external party tools. It would work in a similar way to the OpenShift Service CA operator but would rather inject the bundles via the CSI instead of ConfigMap mounts.
Common library for cert management / injection in operator-rs
Put the cert management / injection stuff into a library in operator-rs (similar to the code in secret-operator) that we run in-process for each operator that has a webhook.
The operator / webhook code will use a common TLS utility crate which handles creation of CAs and other certificates on-the-fly.
These certificates will be rotated automatically when they expire.
The webhook server will pick up the renewed certificate without the need for a restart.
Webhooks created by the common stackable-webhook crate can also use a certificate which is not autogenerated by us and provided via a Kubernetes Secret from external sources.
For references, see:
-
the TLS utility crate PR, and
-
the webhook server PR.
Pros and Cons of the Options
Cert-Manager
-
Good, because covers both Kubernetes and Openshift
-
Good, because cert injection works via annotations (compatibility) for various Kubernetes Resources (CRDs, Validating/Mutating webhooks)
-
Good, because it is already popular and familiar to users
-
Good, because it frees maintenance and development resources for us
-
Bad, because another tool we have to be experienced with, check for updates and breaking features etc.
OpenShift Service CA operator
-
Good, because cert injection works via annotations (compatibility)
-
Bad, because specific to OpenShift
-
Bad, because only injecting to ConfigMaps
-
Bad, because another tool we have to be experienced with, check for updates and breaking features etc.
Lightweight self-made solution via secret-operator
-
Good, because no external tools are required
-
Good, because reusing secret-operator and cert management should be an internal part of the SDP
-
Bad, because requires more time and coding
-
Bad, because secret-operator must version its own CRDs as well which could be a predicament
Common library for cert management / injection in operator-rs
-
Good, because no external tools are required
-
Good, because external tools can be used if required
-
Bad, because requires more time and coding
Decision Outcome
Chosen option Common library for cert management / injection in operator-rs, because the CA injection will be handled as part of the SDP and no external dependencies are required. The common library will reside in the operator-rs and be used in every operator. OpenShift should not pose a problem since no extra components are necessary. The required infrastructure (e.g. operator-templating, ca injection / generation) can be developed in parallel to the operators e.g. version conversion logic.