Stackable Operator for Apache Kafka

The Stackable operator for Apache Kafka is an operator that can deploy and manage Apache Kafka clusters on Kubernetes. Apache Kafka is a distributed streaming platform designed to handle large volumes of data in real time. It is commonly used for real-time data processing, data ingestion, event streaming, and messaging between applications.

Getting started

Follow the Getting started guide, which walks you through installing the Stackable Kafka and ZooKeeper operators, setting up ZooKeeper and Kafka, and testing your Kafka installation.

Resources

The KafkaCluster custom resource contains your Kafka cluster configuration. It defines a single broker role.

For every role group in the broker role, the operator creates a StatefulSet. It also creates multiple Services: one at role level, one per role group, and one for every individual Pod. These Services allow access to the entire Kafka cluster, to parts of it, or to individual brokers.

For every StatefulSet, the operator deploys a ConfigMap containing the logging properties and a Kafka configuration file derived from the KafkaCluster resource.
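To make this resource layout concrete, the following minimal Python sketch models a KafkaCluster as a plain dict and lists the objects that the text above says the operator creates. The manifest fields (`spec.image.productVersion`, `spec.clusterConfig.zookeeperConfigMapName`, `spec.brokers.roleGroups`) are modeled on the Getting started guide as we understand it. The `<cluster>-<role>-<rolegroup>` naming scheme and the default of one replica are assumptions made for illustration only. Check the KafkaCluster CRD documentation and a running cluster for the exact names.

```python
from typing import Any, Dict, List

# Hypothetical KafkaCluster manifest, modeled on the Getting started guide.
# Field names are assumptions; consult the KafkaCluster CRD documentation.
KAFKA_CLUSTER: Dict[str, Any] = {
    "apiVersion": "kafka.stackable.tech/v1alpha1",
    "kind": "KafkaCluster",
    "metadata": {"name": "simple-kafka"},
    "spec": {
        "image": {"productVersion": "3.9.2"},
        "clusterConfig": {"zookeeperConfigMapName": "simple-kafka-znode"},
        "brokers": {"roleGroups": {"default": {"replicas": 3}}},
    },
}


def expected_resources(cluster: Dict[str, Any]) -> Dict[str, List[str]]:
    """List the objects the operator is described as creating for the broker role.

    The "<cluster>-<role>-<rolegroup>[-<ordinal>]" naming scheme is an
    assumption made for illustration only.
    """
    role_level = f"{cluster['metadata']['name']}-broker"
    resources: Dict[str, List[str]] = {
        "StatefulSet": [],
        "ConfigMap": [],
        "Service": [role_level],  # one Service for the whole broker role
    }
    for group, group_spec in cluster["spec"]["brokers"]["roleGroups"].items():
        group_name = f"{role_level}-{group}"
        resources["StatefulSet"].append(group_name)  # one StatefulSet per role group
        resources["ConfigMap"].append(group_name)    # Kafka config + logging per StatefulSet
        resources["Service"].append(group_name)      # one Service per role group
        # Assumed default of 1 replica when none is given.
        for ordinal in range(group_spec.get("replicas", 1)):
            # StatefulSet Pods are named <statefulset>-<ordinal>;
            # one Service per individual Pod (broker).
            resources["Service"].append(f"{group_name}-{ordinal}")
    return resources


if __name__ == "__main__":
    for kind, names in expected_resources(KAFKA_CLUSTER).items():
        print(kind, names)
```

With three replicas in a single role group, this yields one StatefulSet, one ConfigMap, and five Services (role, role group, and three per-Pod Services).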

The operator creates a Service discovery ConfigMap for the whole KafkaCluster, which references the Service for the whole cluster. Other operators use this ConfigMap to connect to a Kafka cluster simply by name, and custom third-party applications can also use it to find the connection endpoint.
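A third-party application could read its bootstrap servers from the discovery ConfigMap. The Python sketch below assumes that the ConfigMap stores a comma-separated `host:port` list under a `KAFKA` data key. Verify that key against a real discovery ConfigMap in your cluster. The example keeps fetching the ConfigMap (for example with `kubectl get configmap <name> -o json` or a Kubernetes client library) separate from parsing it, so the parsing logic stays self-contained. The host names in the example are illustrative only.

```python
import json
from typing import Any, Dict, List

# Assumed data key of the discovery ConfigMap; verify in your cluster.
DISCOVERY_KEY = "KAFKA"


def bootstrap_servers(configmap: Dict[str, Any]) -> List[str]:
    """Return the bootstrap servers listed in a discovery ConfigMap object.

    `configmap` is the ConfigMap as returned by the Kubernetes API,
    e.g. parsed from `kubectl get configmap <name> -o json`.
    """
    data = configmap.get("data") or {}
    if DISCOVERY_KEY not in data:
        raise KeyError(f"discovery ConfigMap has no {DISCOVERY_KEY!r} entry")
    # Split the comma-separated list and drop empty entries and whitespace.
    return [s.strip() for s in data[DISCOVERY_KEY].split(",") if s.strip()]


if __name__ == "__main__":
    # Hypothetical ConfigMap JSON; host names and port are illustrative only.
    example = json.loads(
        '{"kind": "ConfigMap", "metadata": {"name": "simple-kafka"},'
        ' "data": {"KAFKA": "simple-kafka-broker-default-0.example:9092,'
        'simple-kafka-broker-default-1.example:9092"}}'
    )
    print(bootstrap_servers(example))
```

The resulting list can be joined with commas and passed to any Kafka client as its bootstrap servers setting.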

Dependencies

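Kafka 3.x clusters managed by this operator require Apache ZooKeeper for coordination. Apache Kafka 4.x removes ZooKeeper entirely and relies on the built-in KRaft mode instead, which this operator currently supports only experimentally (see Supported versions).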

Connections to other products

Since Kafka often takes on a bridging role, many other products connect to it. The demos below contain example data pipelines in which Apache NiFi, deployed with its Stackable operator, writes to Kafka and Apache Druid, deployed with its Stackable operator, reads from Kafka. You can also connect using Apache Spark or with a custom Job written in any language that has a Kafka client.

Demos

stackablectl supports installing demos with a single command. The demos are complete data pipelines that showcase multiple components of the Stackable platform working together, and you can try them out interactively. Both demos below inject data into Kafka using NiFi and read from the Kafka topics using Druid.

Waterlevel demo

The nifi-kafka-druid-water-level-data demo uses data from PEGELONLINE to visualize water levels in rivers and coastal regions of Germany from historical and real-time data.

Earthquake demo

The nifi-kafka-druid-earthquake-data demo ingests earthquake data into a pipeline similar to the one used in the waterlevel demo.

Supported versions

The Stackable operator for Apache Kafka currently supports the Kafka versions listed below. To use a specific Kafka version in your KafkaCluster, you have to specify an image, as explained in the Product image selection documentation. The operator also supports running images from a custom registry or running entirely customized images; both cases are explained under Product image selection as well.

4.2.1 (experimental, deprecated) - Requires KRaft; see the KRaft migration guide.
4.1.1 (experimental, deprecated) - Requires KRaft; see the KRaft migration guide.
3.9.2 (LTS)
3.9.1 (deprecated)
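The version list above could be encoded as a small lookup. For example, a CI pipeline might use it to check `spec.image.productVersion` before applying a KafkaCluster manifest. This is a hypothetical helper and not part of the operator. The `SUPPORTED` table simply mirrors the list above and would have to be kept in sync with it by hand.

```python
from typing import Dict, List, Tuple

# Mirrors the supported-versions list above (hypothetical helper, not part
# of the operator). Keep in sync with the documentation.
SUPPORTED: Dict[str, Tuple[str, ...]] = {
    "4.2.1": ("experimental", "deprecated", "requires-kraft"),
    "4.1.1": ("experimental", "deprecated", "requires-kraft"),
    "3.9.2": ("lts",),
    "3.9.1": ("deprecated",),
}


def check_version(version: str) -> List[str]:
    """Return warnings for `version`; raise ValueError if it is not listed."""
    if version not in SUPPORTED:
        raise ValueError(f"Kafka {version} is not in the supported-versions list")
    tags = SUPPORTED[version]
    warnings: List[str] = []
    if "deprecated" in tags:
        warnings.append(f"Kafka {version} is deprecated")
    if "experimental" in tags:
        warnings.append(f"Kafka {version} support is experimental")
    if "requires-kraft" in tags:
        warnings.append(f"Kafka {version} requires KRaft; see the KRaft migration guide")
    return warnings


if __name__ == "__main__":
    for v in ("3.9.2", "4.1.1"):
        print(v, check_version(v))
```

An empty result means the version is listed without caveats, as is the case for the LTS release.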

Support for clusters running in KRaft mode (which includes Apache Kafka 4.x.x) is experimental because it has not yet been thoroughly tested in production environments.

There are also some known issues, such as:

Controller scaling is not reliable.
Kerberos authentication has not been tested yet.
The way Services are exposed is not final yet.

Useful links

The kafka-operator GitHub repository
The operator feature overview in the feature tracker
The KafkaCluster CRD documentation