Our Apache Hadoop images contain the necessary binaries and libraries to use the HDFS FUSE driver.
FUSE is short for Filesystem in Userspace and allows a userspace program to provide a filesystem to the Linux kernel, where it can then be mounted like any other filesystem. HDFS ships with a native FUSE driver/application, which means that an existing HDFS filesystem can be mounted into a Linux environment.
To use the FUSE driver you can either copy the required files out of the image and run the driver on a host outside of Kubernetes (see the sketch below), or you can run it in a Pod. This Pod, however, will need some extra capabilities.
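For the first option, the files can be copied out with a container runtime. A minimal sketch using Docker; the /stackable path is an assumption about the image layout:

# Create a stopped container from the image, copy the files out, then clean up.
# /stackable is an assumed location for the Hadoop installation in the image.
id=$(docker create docker.stackable.tech/stackable/hadoop:<version>)
docker cp "$id":/stackable ./stackable
docker rm "$id"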
This is an example Pod that will work as long as the host system running the kubelet supports FUSE:
apiVersion: v1
kind: Pod
metadata:
  name: hdfs-fuse
spec:
  containers:
    - name: hdfs-fuse
      env:
        - name: HADOOP_CONF_DIR
          value: /stackable/conf/hdfs
      image: docker.stackable.tech/stackable/hadoop:<version> (1)
      imagePullPolicy: Always
      securityContext:
        privileged: true
      command:
        - tail
        - -f
        - /dev/null
      volumeMounts:
        - mountPath: /stackable/conf/hdfs
          name: hdfs-config
  volumes:
    - name: hdfs-config
      configMap:
        name: <your hdfs here> (2)
(1) Ideally use the same version your HDFS is using. FUSE is baked into our images as of SDP 23.11.
(2) This needs to be a reference to the discovery ConfigMap written by our HDFS operator.
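Assuming the manifest above is saved to a file (the name hdfs-fuse-pod.yaml is just an example), the Pod can then be created with kubectl:

kubectl apply -f hdfs-fuse-pod.yaml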
Unfortunately, there is no way around granting some extra privileges: in Kubernetes, Pods usually share the kernel with the host running the kubelet, which means a Pod wanting to use FUSE needs access to the underlying kernel modules and the host's /dev/fuse device.
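Depending on your container runtime and cluster policies, a narrower security context than full privileged mode may be enough. This is only a sketch under those assumptions; the container still needs read/write access to the /dev/fuse device node, which some runtimes only grant to privileged containers:

# Sketch only: a possible replacement for `privileged: true` in the container above.
securityContext:
  capabilities:
    add:
      - SYS_ADMIN      # mounting a filesystem requires this capability
# In addition, the host's /dev/fuse device has to be made available,
# e.g. via a hostPath volume (added under the container's volumeMounts
# and the Pod's volumes respectively):
#   volumeMounts:
#     - mountPath: /dev/fuse
#       name: fuse
#   volumes:
#     - name: fuse
#       hostPath:
#         path: /dev/fuse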
Inside this Pod you can get a shell (e.g. using kubectl exec --stdin --tty hdfs-fuse -- /bin/bash) to get access to a script called fuse_dfs_wrapper (it is in the PATH of our Hadoop images).
To see the available options, call the script without any parameters.
To mount HDFS, call the script like this:
fuse_dfs_wrapper dfs://<your hdfs> <target> (1) (2)

# This will run in debug mode and stay in the foreground
fuse_dfs_wrapper -odebug dfs://<your hdfs> <target>

# Example:
mkdir simple-hdfs
fuse_dfs_wrapper dfs://simple-hdfs simple-hdfs
cd simple-hdfs
# Any operations in this directory will now happen in HDFS
(1) Again, use the name of the HDFS service as above.
(2) The local directory to mount HDFS into; in the example this is the simple-hdfs directory created with mkdir.
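Once mounted, ordinary file operations are forwarded to HDFS. A short check-and-cleanup example, assuming the hdfs CLI is also on the PATH of the image and that HADOOP_CONF_DIR is set as in the Pod spec above:

echo hello > testfile.txt      # written through the mount, ends up in HDFS
hdfs dfs -cat /testfile.txt    # read the same file back via the HDFS CLI
cd ..                          # leave the mount point before unmounting
fusermount -u simple-hdfs      # standard FUSE unmount; umount also works as root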