KRaft mode (experimental)

The Kafka KRaft mode is currently experimental and subject to change.

Apache Kafka’s KRaft mode replaces Apache ZooKeeper with Kafka’s own built-in consensus mechanism based on the Raft protocol. This simplifies Kafka’s architecture, reducing operational complexity by consolidating cluster metadata management into Kafka itself.

The Stackable Operator for Apache Kafka currently does not support automatic cluster upgrades from Apache ZooKeeper to KRaft.

Overview

  • Introduced: Kafka 2.8.0 (early preview, not production-ready).

  • Matured: Kafka 3.3.x (production-ready, though ZooKeeper is still supported).

  • Default & Recommended: Kafka 3.5+ strongly recommends KRaft for new clusters.

  • Full Replacement: Kafka 4.0.0 (2025) removes ZooKeeper completely.

  • Migration: Tools exist to migrate from ZooKeeper to KRaft, but new deployments should start with KRaft.

Configuration

The Stackable Kafka operator introduces a new role in the KafkaCluster CRD called KRaft Controller. Configuring this role puts Kafka into KRaft mode; Apache ZooKeeper is then no longer required.

apiVersion: kafka.stackable.tech/v1alpha1
kind: KafkaCluster
metadata:
  name: kafka
spec:
  image:
    productVersion: "3.9.1"
  brokers:
    roleGroups:
      default:
        replicas: 1
  controllers:
    roleGroups:
      default:
        replicas: 3
Using spec.controllers is mutually exclusive with spec.clusterConfig.zookeeperConfigMapName.

Recommendations

A minimal KRaft setup consisting of at least 3 Controllers has the following resource requirements:

  • 600m CPU request

  • 3000m CPU limit

  • 3000Mi memory request and limit

  • 6Gi persistent storage

The Controller replicas across all role groups should sum up to an odd number so that the Raft consensus can form a majority.

Resources

Corresponding to the values above, the operator uses the following resource defaults:

controllers:
  config:
    resources:
      memory:
        limit: 1Gi
      cpu:
        min: 250m
        max: 1000m
      storage:
        logDirs:
          capacity: 2Gi

Overrides

The configuration of overrides, JVM arguments etc. is similar to the Broker and documented on the concepts page.
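For illustration, a minimal sketch of overrides on the Controller role might look like this (the environment variable and the Kafka property shown are placeholders for your own values, not operator defaults):

controllers:
  envOverrides:
    MY_ENV_VAR: "example-value" # placeholder environment variable
  configOverrides:
    controller.properties:
      metadata.log.segment.bytes: "1073741824" # example Kafka property override
  roleGroups:
    default:
      replicas: 3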

Internal operator details

KRaft mode requires major configuration changes compared to ZooKeeper:

  • cluster-id: This is set to the metadata.name of the KafkaCluster resource during initial formatting.

  • node.id: A calculated integer, derived from a hash of the role and role group plus the replica id.

  • process.roles: Always set to either broker or controller. Combined broker,controller servers are not supported.

  • The operator configures a static voter list containing the controller pods; controllers are not dynamically managed. See the sketch below for one way to inspect the rendered controller settings.
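
For illustration, one way to inspect the settings the operator renders for a controller is to grep them from a running pod. This is only a sketch: the pod name assumes the example cluster named kafka from the configuration above, and the configuration path follows the /stackable/config directory used elsewhere on this page.

$ kubectl exec -c kafka kafka-controller-default-0 -- \
sh -c "grep -hE 'node.id|process.roles|controller.quorum' /stackable/config/*.properties"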

Known Issues

  • Automatic migration from Apache ZooKeeper to KRaft is not supported.

  • Scaling controller replicas might lead to unstable clusters.

  • Kerberos is currently not supported for KRaft in all versions.

Troubleshooting

Cluster does not start

Check that at least a quorum (majority) of controllers are reachable.
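
For example, assuming the example cluster named kafka from the configuration above, a quick check of the controller pods looks like this:

$ kubectl get pods | grep kafka-controller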

Frequent leader elections

Likely caused by controller resource starvation or unstable Kubernetes scheduling.
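
Kafka ships the kafka-metadata-quorum.sh tool, which can be run from a broker pod to inspect the quorum state (current leader, epoch and follower lag). The client config and bootstrap address below follow the migration example further down this page; substitute the values of your own cluster:

$ /stackable/kafka/bin/kafka-metadata-quorum.sh \
--command-config /stackable/config/client.properties \
--bootstrap-server simple-kafka-broker-default-0-listener-broker.kraft-migration.svc.cluster.local:9093 \
describe --status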

Migration issues (ZooKeeper to KRaft)

Ensure Kafka version 3.9.x or higher and follow the official migration documentation. The Stackable Kafka operator does not migrate clusters automatically; see the KRaft migration guide below for the manual procedure.

Scaling issues

Dynamic scaling is only supported from Kafka version 3.9.0 onwards. If you are using an older version, scaling (e.g. adding or removing controller replicas) may not work properly.

KRaft migration guide

Operator version 26.3.0 adds support for migrating Kafka clusters from ZooKeeper to KRaft mode.

This guide describes the steps required to migrate an existing Kafka cluster managed by the Stackable Kafka operator from ZooKeeper to KRaft mode.

Before starting the migration, we recommend reducing producer/consumer operations to a minimum, or pausing them completely if possible, to reduce the risk of data loss during the migration.

To make the migration steps as clear as possible, we’ll use a complete working example throughout this guide. The example cluster is kept minimal, without any additional configuration.

We’ll use Kafka version 3.9.1 because it is the last version of the 3.x series that runs in ZooKeeper mode and is supported by the SDP.

We’ll also assign broker ids manually from the beginning to simplify this guide. In a real-world scenario you do not have this option, because your cluster is already running; instead, you will have to collect the existing ids and configure the manual assignment in the second step of the migration.

We start by creating a dedicated namespace to work in and deploying the Kafka cluster, including ZooKeeper and credentials.

---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    stackable.tech/vendor: Stackable
  name: kraft-migration
---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
  namespace: kraft-migration
spec:
  image:
    productVersion: 3.9.4
    pullPolicy: IfNotPresent
  servers:
    roleGroups:
      default:
        replicas: 1
---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-kafka-znode
  namespace: kraft-migration
spec:
  clusterRef:
    name: simple-zk
---
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: kafka-internal-tls
spec:
  backend:
    autoTls:
      ca:
        secret:
          name: secret-provisioner-kafka-internal-tls-ca
          namespace: kraft-migration
        autoGenerate: true
---
apiVersion: authentication.stackable.tech/v1alpha1
kind: AuthenticationClass
metadata:
  name: kafka-client-auth-tls
spec:
  provider:
    tls:
      clientCertSecretClass: kafka-client-auth-secret
---
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: kafka-client-auth-secret
spec:
  backend:
    autoTls:
      ca:
        secret:
          name: secret-provisioner-tls-kafka-client-ca
          namespace: kraft-migration
        autoGenerate: true
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: broker-ids
  namespace: kraft-migration
data:
  simple-kafka-broker-default-0: "2000"
  simple-kafka-broker-default-1: "2001"
  simple-kafka-broker-default-2: "2002"
---
apiVersion: kafka.stackable.tech/v1alpha1
kind: KafkaCluster
metadata:
  name: simple-kafka
  namespace: kraft-migration
spec:
  image:
    productVersion: 3.9.1
    pullPolicy: IfNotPresent
  clusterConfig:
    metadataManager: zookeeper
    brokerIdPodConfigMapName: broker-ids
    authentication:
      - authenticationClass: kafka-client-auth-tls
    tls:
      internalSecretClass: kafka-internal-tls
      serverSecretClass: tls
    zookeeperConfigMapName: simple-kafka-znode
  brokers:
    roleGroups:
      default:
        replicas: 3

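Assuming the manifests above are saved in a single file (the file name below is only an example), we apply them and wait for all pods to become ready:

$ kubectl apply -f kraft-migration-stack.yaml
$ kubectl -n kraft-migration get pods -w
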
Next, from one of the broker pods, we will create a topic called kraft-migration-topic with 3 partitions to verify the migration later.

$ /stackable/kafka/bin/kafka-topics.sh \
--create \
--command-config /stackable/config/client.properties \
--bootstrap-server simple-kafka-broker-default-0-listener-broker.kraft-migration.svc.cluster.local:9093 \
--partitions 3 \
--topic kraft-migration-topic

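Optionally, we can confirm that the topic exists and has 3 partitions:

$ /stackable/kafka/bin/kafka-topics.sh \
--describe \
--command-config /stackable/config/client.properties \
--bootstrap-server simple-kafka-broker-default-0-listener-broker.kraft-migration.svc.cluster.local:9093 \
--topic kraft-migration-topic
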
And - also from one of the broker pods - publish some test messages to it:

$ /stackable/kafka/bin/kafka-producer-perf-test.sh \
--producer.config /stackable/config/client.properties \
--producer-props bootstrap.servers=simple-kafka-broker-default-0-listener-broker.kraft-migration.svc.cluster.local:9093 \
--topic kraft-migration-topic \
--payload-monotonic \
--throughput 1 \
--num-records 100

We now have a working Kafka cluster with ZooKeeper and some test data.

1. Start KRaft controllers

In this step we will perform the following actions:

  1. Retrieve the current cluster.id as generated by Kafka.

  2. Retrieve and store the current broker ids.

  3. Update the KafkaCluster resource to add spec.controllers property.

  4. Configure the controllers to run in migration mode.

  5. Apply the changes and wait for all cluster pods to become ready.

We can obtain the current cluster.id either by inspecting the ZooKeeper data or from the meta.properties file on one of the brokers.

$ kubectl -n kraft-migration exec -c kafka simple-kafka-broker-default-0 -- cat /stackable/data/topicdata/meta.properties | grep cluster.id
cluster.id=MyCya7hbTD-Hay8PgCsCYA

We will set this value as the KAFKA_CLUSTER_ID environment variable for both brokers and controllers.

To be able to migrate the existing brokers, we need to preserve their broker ids. As with the cluster id, we can obtain the broker ids from the meta.properties file on each broker pod.

$ kubectl -n kraft-migration exec -c kafka simple-kafka-broker-default-0 -- cat /stackable/data/topicdata/meta.properties | grep broker.id
broker.id=2000

We then need to tell the operator to use these ids instead of generating new ones. This is done by creating a ConfigMap containing the id mapping and pointing the spec.clusterConfig.brokerIdPodConfigMapName property of the KafkaCluster resource to it.

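A small sketch for collecting the ids from all broker pods in our example (pod and file names follow the commands above); its output can be pasted directly into the data section of the broker-ids ConfigMap shown earlier:

for i in 0 1 2; do
  pod="simple-kafka-broker-default-$i"
  id=$(kubectl -n kraft-migration exec -c kafka "$pod" -- \
    cat /stackable/data/topicdata/meta.properties | grep '^broker.id=' | cut -d= -f2)
  echo "  $pod: \"$id\""
done
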
These two settings (the KAFKA_CLUSTER_ID and the broker id mapping) must be preserved for the rest of the migration process and for the lifetime of the cluster.

The complete example KafkaCluster resource after applying the required changes looks as follows:

---
apiVersion: kafka.stackable.tech/v1alpha1
kind: KafkaCluster
metadata:
  name: simple-kafka
  namespace: kraft-migration
spec:
  image:
    productVersion: 3.9.1
    pullPolicy: IfNotPresent
  clusterConfig:
    metadataManager: zookeeper
    authentication:
      - authenticationClass: kafka-client-auth-tls
    tls:
      internalSecretClass: kafka-internal-tls
      serverSecretClass: tls
    zookeeperConfigMapName: simple-kafka-znode
    brokerIdPodConfigMapName: broker-ids
  brokers:
    envOverrides:
      KAFKA_CLUSTER_ID: "MyCya7hbTD-Hay8PgCsCYA"
    roleGroups:
      default:
        replicas: 3
  controllers:
    roleGroups:
      default:
        replicas: 3
    envOverrides:
      KAFKA_CLUSTER_ID: "MyCya7hbTD-Hay8PgCsCYA"
    configOverrides:
      controller.properties:
        zookeeper.metadata.migration.enable: "true" # Enable migration mode so the controller can read metadata from ZooKeeper.

We kubectl apply the updated resource and wait for brokers and controllers to become ready.
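
For example (the file name, and the StatefulSet names derived from the pod names, are assumptions for this sketch):

$ kubectl apply -f simple-kafka.yaml
$ kubectl -n kraft-migration rollout status statefulset/simple-kafka-controller-default
$ kubectl -n kraft-migration rollout status statefulset/simple-kafka-broker-default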

2. Migrate metadata

In this step we will perform the following actions:

  1. Configure the controller quorum on the brokers.

  2. Enable metadata migration mode on the brokers.

  3. Apply the changes and restart the broker pods.

To start the metadata migration, we need to add zookeeper.metadata.migration.enable: "true" and the controller quorum settings to the broker configuration.

For this step, the complete example KafkaCluster resource looks as follows:

---
apiVersion: kafka.stackable.tech/v1alpha1
kind: KafkaCluster
metadata:
  name: simple-kafka
  namespace: kraft-migration
spec:
  image:
    productVersion: 3.9.1
    pullPolicy: IfNotPresent
  clusterConfig:
    metadataManager: zookeeper
    authentication:
      - authenticationClass: kafka-client-auth-tls
    tls:
      internalSecretClass: kafka-internal-tls
      serverSecretClass: tls
    zookeeperConfigMapName: simple-kafka-znode
    brokerIdPodConfigMapName: broker-ids
  brokers:
    envOverrides:
      KAFKA_CLUSTER_ID: "MyCya7hbTD-Hay8PgCsCYA"
    roleGroups:
      default:
        replicas: 3
    configOverrides:
      broker.properties:
        inter.broker.protocol.version: "3.9" # - Latest value known to Kafka 3.9.1
        zookeeper.metadata.migration.enable: "true" # - Enable migration mode so the broker can participate in metadata migration.
        controller.listener.names: "CONTROLLER"
        controller.quorum.bootstrap.servers: "simple-kafka-controller-default-0.simple-kafka-controller-default-headless.kraft-migration.svc.cluster.local:9093,simple-kafka-controller-default-1.simple-kafka-controller-default-headless.kraft-migration.svc.cluster.local:9093,simple-kafka-controller-default-2.simple-kafka-controller-default-headless.kraft-migration.svc.cluster.local:9093"
  controllers:
    roleGroups:
      default:
        replicas: 3
    envOverrides:
      KAFKA_CLUSTER_ID: "MyCya7hbTD-Hay8PgCsCYA"
    configOverrides:
      controller.properties:
        zookeeper.metadata.migration.enable: "true" # Enable migration mode so the controller can read metadata from ZooKeeper.

Once the updated resource is applied, the brokers are restarted automatically due to the configuration changes.

Finally, we check that the metadata migration was successful:

kubectl logs -n kraft-migration simple-kafka-controller-default-0 | grep -i 'completed migration' \
|| kubectl logs -n kraft-migration simple-kafka-controller-default-1 | grep -i 'completed migration' \
|| kubectl logs -n kraft-migration simple-kafka-controller-default-2 | grep -i 'completed migration'

...
[2025-12-22 09:23:53,372] INFO [KRaftMigrationDriver id=2110489705] Completed migration of metadata from ZooKeeper to KRaft. 0 records were generated in 102 ms across 0 batches. The average time spent waiting on a batch was -1.00 ms. The record types were {}. The current metadata offset is now 280 with an epoch of 3. Saw 0 brokers in the migrated metadata []. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)

3. Migrate brokers

This is the last step before fully switching to KRaft mode, and the last point at which we can still roll back to ZooKeeper mode in case of unforeseen issues.

In this step we will perform the following actions:

  1. Remove the migration properties from the previous step on the brokers.

  2. Assign KRaft role properties to brokers.

  3. Apply the changes and restart the broker pods.

We need to preserve the quorum configuration added in the previous step.

For this step, the complete example KafkaCluster resource looks as follows:

---
apiVersion: kafka.stackable.tech/v1alpha1
kind: KafkaCluster
metadata:
  name: simple-kafka
  namespace: kraft-migration
spec:
  image:
    productVersion: 3.9.1
    pullPolicy: IfNotPresent
  clusterConfig:
    metadataManager: zookeeper
    authentication:
      - authenticationClass: kafka-client-auth-tls
    tls:
      internalSecretClass: kafka-internal-tls
      serverSecretClass: tls
    zookeeperConfigMapName: simple-kafka-znode
    brokerIdPodConfigMapName: broker-ids
  brokers:
    envOverrides:
      KAFKA_CLUSTER_ID: "MyCya7hbTD-Hay8PgCsCYA"
    roleGroups:
      default:
        replicas: 3
    configOverrides:
      broker.properties:
        controller.listener.names: "CONTROLLER"
        controller.quorum.bootstrap.servers: "simple-kafka-controller-default-0.simple-kafka-controller-default-headless.kraft-migration.svc.cluster.local:9093,simple-kafka-controller-default-1.simple-kafka-controller-default-headless.kraft-migration.svc.cluster.local:9093,simple-kafka-controller-default-2.simple-kafka-controller-default-headless.kraft-migration.svc.cluster.local:9093"
        process.roles: "broker"
        node.id: "${env:REPLICA_ID}"
  controllers:
    roleGroups:
      default:
        replicas: 3
    envOverrides:
      KAFKA_CLUSTER_ID: "MyCya7hbTD-Hay8PgCsCYA"
    configOverrides:
      controller.properties:
        zookeeper.metadata.migration.enable: "true" # Enable migration mode so the controller can read metadata from ZooKeeper.

4. Enable KRaft mode

After this step, the cluster will be fully running in KRaft mode and it cannot be rolled back to ZooKeeper mode anymore.

In this step we will perform the following actions:

  1. Put the cluster into KRaft mode by updating the spec.clusterConfig.metadataManager property.

  2. Remove the KRaft quorum configuration from the broker pods.

  3. Remove the ZooKeeper migration flag from the controllers.

  4. Apply the changes and restart all pods.

We need to preserve the KAFKA_CLUSTER_ID environment variable for the rest of the lifetime of this cluster.

The complete example KafkaCluster resource after applying the required changes looks as follows:

---
apiVersion: kafka.stackable.tech/v1alpha1
kind: KafkaCluster
metadata:
  name: simple-kafka
  namespace: kraft-migration
spec:
  image:
    productVersion: 3.9.1
    pullPolicy: IfNotPresent
  clusterConfig:
    metadataManager: kraft
    authentication:
      - authenticationClass: kafka-client-auth-tls
    tls:
      internalSecretClass: kafka-internal-tls
      serverSecretClass: tls
    brokerIdPodConfigMapName: broker-ids
  brokers:
    envOverrides:
      KAFKA_CLUSTER_ID: "MyCya7hbTD-Hay8PgCsCYA"
    roleGroups:
      default:
        replicas: 3
    configOverrides:
      broker.properties:
        controller.listener.names: "CONTROLLER"
  controllers:
    roleGroups:
      default:
        replicas: 3
    envOverrides:
      KAFKA_CLUSTER_ID: "MyCya7hbTD-Hay8PgCsCYA"

Verify that the cluster is healthy and consumer/producer operations work as expected.

We can consume the previously produced messages by running the command below on one of the broker pods:

/stackable/kafka/bin/kafka-console-consumer.sh \
--consumer.config /stackable/config/client.properties \
--bootstrap-server simple-kafka-broker-default-0-listener-broker.kraft-migration.svc.cluster.local:9093 \
--topic kraft-migration-topic \
--offset earliest \
--partition 0 \
--timeout-ms 10000

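To confirm that the cluster is now served purely by KRaft, we can additionally inspect the quorum from one of the broker pods; the controllers should form the voter set (leader and followers) and the brokers should appear as observers:

$ /stackable/kafka/bin/kafka-metadata-quorum.sh \
--command-config /stackable/config/client.properties \
--bootstrap-server simple-kafka-broker-default-0-listener-broker.kraft-migration.svc.cluster.local:9093 \
describe --replication
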
5. Cleanup

Before proceeding with this step, please ensure that the Kafka cluster is fully operational in KRaft mode.

In this step we remove the now unused ZooKeeper cluster and related resources.

If the ZooKeeper cluster also serves use cases other than Kafka, you can skip this step.

In our example we can remove the ZooKeeper cluster and the Znode resource as follows:

kubectl delete -n kraft-migration zookeeperznodes simple-kafka-znode
kubectl delete -n kraft-migration zookeeperclusters simple-zk

6. Next steps

After successfully migrating to KRaft mode, consider upgrading to Kafka 4.x to benefit from the latest features and improvements in KRaft mode.