Stackable Operator for Apache Kafka

The Stackable Operator for Apache Kafka deploys and manages Apache Kafka clusters on Kubernetes. Apache Kafka is a distributed streaming platform designed to handle large volumes of data in real time. It is commonly used for real-time data processing, data ingestion, event streaming, and messaging between applications.

Getting started

Follow the Getting started guide, which walks you through installing the Stackable Kafka and ZooKeeper operators, setting up ZooKeeper and Kafka, and testing your Kafka cluster using kcat.

Resources

The KafkaCluster custom resource contains your Kafka cluster configuration. It defines a single broker role.

A diagram depicting the Kubernetes resources created by the operator.

For every role group in the broker role, the operator creates a StatefulSet. It also creates multiple Services: one at role level, one per role group, and one for every individual Pod. These allow access to the entire Kafka cluster, to parts of it, or to individual brokers.

For every StatefulSet (role group), the operator deploys a ConfigMap containing two files: log4j.properties for the logging configuration, and server.properties for the complete Kafka configuration, which is derived from the KafkaCluster resource.
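The resource layout described above can be sketched in a few lines of Python. This is only an illustration of the counting logic, not the operator's implementation: the cluster name, the role group names, and in particular the naming scheme used for the resources are hypothetical, and the names the operator actually generates may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ExpectedResources:
    """Resources expected for one KafkaCluster (names are hypothetical)."""
    stateful_sets: list[str] = field(default_factory=list)
    services: list[str] = field(default_factory=list)
    # ConfigMap name -> files it contains
    config_maps: dict[str, list[str]] = field(default_factory=dict)


def expected_resources(cluster: str, role_groups: dict[str, int]) -> ExpectedResources:
    """Enumerate the resources for the single broker role.

    `role_groups` maps a role group name to its replica count.
    The "<cluster>-broker-<rolegroup>" naming is an assumption made
    for this sketch only.
    """
    role = "broker"
    out = ExpectedResources()
    # One Service at role level, covering the whole cluster.
    out.services.append(f"{cluster}-{role}")
    for group, replicas in role_groups.items():
        base = f"{cluster}-{role}-{group}"
        # One StatefulSet, one Service and one ConfigMap per role group.
        out.stateful_sets.append(base)
        out.services.append(base)
        out.config_maps[base] = ["log4j.properties", "server.properties"]
        # One Service per individual Pod (broker).
        for index in range(replicas):
            out.services.append(f"{base}-{index}")
    return out


res = expected_resources("simple-kafka", {"default": 3, "highmem": 2})
print(res.stateful_sets)
print(len(res.services))
```

With two role groups of three and two replicas, the sketch yields two StatefulSets, two ConfigMaps, and eight Services (one for the role, two for the role groups, five for the Pods).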

The operator creates a service discovery ConfigMap for the whole KafkaCluster, which references the Service for the whole cluster. Other operators use this ConfigMap to connect to a Kafka cluster simply by name, and custom third-party applications can also use it to find the connection endpoint.
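As a rough sketch of how a custom application might consume the discovery ConfigMap, the following Python function turns its data into a list of host and port pairs. It makes two assumptions that are not stated on this page and should be checked against the operator's discovery documentation: that the endpoint is stored under a key named KAFKA, and that the value is a comma-separated list of host:port entries. The sample data is made up for illustration.

```python
def bootstrap_servers(
    config_map_data: dict[str, str], key: str = "KAFKA"
) -> list[tuple[str, int]]:
    """Parse bootstrap servers from discovery ConfigMap data.

    Assumes (hypothetically) that `key` holds a comma-separated
    list of "host:port" entries.
    """
    servers = []
    for entry in config_map_data[key].split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas
        # Split on the last colon so the host part stays intact.
        host, _, port = entry.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"not a host:port entry: {entry!r}")
        servers.append((host, int(port)))
    return servers


# Made-up example of what the ConfigMap's data section might contain.
sample = {"KAFKA": "simple-kafka.default.svc.cluster.local:9092"}
print(bootstrap_servers(sample))
```

In a real application the data would come from the ConfigMap itself, for example mounted as a volume or injected as an environment variable, rather than from a literal dictionary.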

Dependencies

Kafka requires Apache ZooKeeper for coordination. This dependency is expected to go away in the future, when ZooKeeper is replaced by a solution built into Kafka.

Connections to other products

Since Kafka often takes on a bridging role, many other products connect to it. In the demos below, you will find example data pipelines that use Apache NiFi with the Stackable Operator to write to Kafka and Apache Druid with the Stackable Operator to read from Kafka. You can also connect using Apache Spark or a custom Job written in one of various languages.

Demos

stackablectl supports installing demos with a single command. The demos are complete data pipelines that showcase multiple components of the Stackable platform working together and that you can try out interactively. Both demos below inject data into Kafka using NiFi and read from the Kafka topics using Druid.

Waterlevel Demo

The nifi-kafka-druid-water-level-data demo uses historic and real-time data from PEGELONLINE to visualize water levels in rivers and coastal regions of Germany.

Earthquake Demo

The nifi-kafka-druid-earthquake-data demo ingests earthquake data into a pipeline similar to the one used in the water level demo.

Supported Versions

The Stackable Operator for Apache Kafka currently supports the following versions of Kafka:

  • 3.5.1

  • 3.4.1

  • 3.4.0 (deprecated)

  • 2.8.2

  • 2.8.1 (deprecated)