Stackable Operator for Apache HDFS
The Stackable Operator for Apache HDFS is used to set up HDFS in high-availability (HA) mode. It depends on the Stackable Operator for Apache ZooKeeper, which operates the ZooKeeper cluster that coordinates the active and standby NameNodes.
NOTE: This operator only works with images from the Stackable repository.
Three roles of the HDFS cluster are implemented:
DataNode - responsible for storing the actual data.
JournalNode - maintains a shared log of filesystem edits so the standby NameNode can stay in sync with the active one and take over in case the active NameNode fails. For details see: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
NameNode - responsible for keeping track of HDFS blocks and providing access to the data.
The operator creates the following Kubernetes objects per role group defined in the custom resource:
Service - ClusterIP used for intra-cluster communication.
ConfigMap - HDFS configuration files like log4j.properties are defined here and mounted in the pods.
StatefulSet - defines the replica count, volume mounts and more for each role group.
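As an illustration, the generated ConfigMap could look roughly like the sketch below. The object name, the nameservice and the property values are assumptions for a hypothetical cluster called simple-hdfs, not output copied from the operator; the HA properties themselves are standard Apache HDFS settings from the QJM documentation linked above.

```yaml
# Hypothetical sketch of a ConfigMap the operator might generate for a role group.
# Naming scheme <cluster>-<role>-<rolegroup> is an assumption for illustration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: simple-hdfs-namenode-default
data:
  hdfs-site.xml: |
    <configuration>
      <!-- Standard HDFS HA properties; values are illustrative -->
      <property>
        <name>dfs.nameservices</name>
        <value>simple-hdfs</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.simple-hdfs</name>
        <value>namenode-0,namenode-1</value>
      </property>
    </configuration>
  log4j.properties: |
    log4j.rootLogger=INFO, CONSOLE
```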
In addition, a NodePort service is created for each pod labeled with hdfs.stackable.tech/pod-service=true that exposes all container ports to the outside world (from the perspective of K8S).
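The shape of such a per-pod service could look roughly as follows. This is a sketch only: the service name, the pod-name selector label and the exposed ports (9870 and 8020 are the Apache HDFS defaults for the NameNode web UI and RPC) are assumptions, not the operator's actual output.

```yaml
# Hypothetical sketch of the NodePort service created for one labeled pod.
apiVersion: v1
kind: Service
metadata:
  name: simple-hdfs-namenode-default-0   # assumed: one service per labeled pod
spec:
  type: NodePort
  selector:
    statefulset.kubernetes.io/pod-name: simple-hdfs-namenode-default-0
  ports:
    - name: http
      port: 9870   # default NameNode web UI port
    - name: rpc
      port: 8020   # default NameNode RPC port
```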
In the custom resource you can specify the number of replicas per role group (NameNode, DataNode or JournalNode). A minimal working configuration requires:
2 NameNodes (HA)
1 JournalNode
1 DataNode (should match at least the configured replication factor, dfs.replication)
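A minimal custom resource along these lines might look as sketched below. The apiVersion, field names and the ZooKeeper discovery reference follow common Stackable conventions but are assumptions here; check the CRD shipped with your operator version for the authoritative schema.

```yaml
apiVersion: hdfs.stackable.tech/v1alpha1   # assumed CRD group/version
kind: HdfsCluster
metadata:
  name: simple-hdfs
spec:
  zookeeperConfigMapName: simple-zk   # assumed reference to the ZooKeeper discovery ConfigMap
  nameNodes:
    roleGroups:
      default:
        replicas: 2   # HA pair of NameNodes
  journalNodes:
    roleGroups:
      default:
        replicas: 1
  dataNodes:
    roleGroups:
      default:
        replicas: 1   # at least the replication factor
```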
The Stackable Operator for Apache HDFS currently supports the following versions of HDFS: