Currently the only supported authentication mechanism is Kerberos, which is disabled by default. For Kerberos to work, a Kerberos KDC is needed, which the user needs to provide. The secret-operator documentation states which kinds of Kerberos servers are supported and how they can be configured.

Kerberos is supported starting with HDFS version 3.3.x.

1. Prepare Kerberos server

To configure HDFS to use Kerberos, you first need to collect information about your Kerberos server, e.g. hostname and port. Additionally, you need a service user, which the secret-operator uses to create principals for the HDFS services.

2. Create Kerberos SecretClass

Afterwards you need to enter all the needed information into a SecretClass, as described in secret-operator documentation. The following guide assumes you have named your SecretClass kerberos-hdfs.
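As a rough sketch, a SecretClass using the secret-operator's kerberosKeytab backend might look like the following. The realm, hostnames, admin principal and keytab Secret name are placeholders you need to adapt to your environment; see the secret-operator documentation for the authoritative field list.

```yaml
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: kerberos-hdfs
spec:
  backend:
    kerberosKeytab:
      # Placeholder values - adapt realm and KDC to your environment
      realmName: CLUSTER.LOCAL
      kdc: krb5-kdc.default.svc.cluster.local
      admin:
        # Example for an MIT Kerberos KDC; other supported server types
        # are described in the secret-operator documentation
        mit:
          kadminServer: krb5-kdc.default.svc.cluster.local
      # Service user the secret-operator uses to create principals,
      # together with a Secret containing its keytab
      adminPrincipal: stackable-secret-operator
      adminKeytabSecret:
        namespace: default
        name: secret-operator-keytab
```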

3. Configure HDFS to use SecretClass

The last step is to configure your HdfsCluster to use the newly created SecretClass.

spec:
  clusterConfig:
    authentication:
      tlsSecretClass: tls # Optional, defaults to "tls"
      kerberos:
        secretClass: kerberos-hdfs # Put your SecretClass name in here

The kerberos.secretClass enables HDFS to request keytabs from the secret-operator.

The tlsSecretClass is needed to request TLS certificates, which are used e.g. for the Web UIs.

4. Verify that Kerberos authentication is required

Use stackablectl stacklet list to get the endpoints where the HDFS namenodes are reachable. Open the link in your browser (note that the namenode now uses https). You should see a Web UI similar to the following:

[Image: HDFS namenode Web UI with Kerberos enabled]

The important part is

Security is on.

You can also shell into the namenode and try to access the file system: kubectl exec -it hdfs-namenode-default-0 -c namenode -- bash -c 'kdestroy && bin/hdfs dfs -ls /'

You should get the error message Client cannot authenticate via:[TOKEN, KERBEROS].

5. Access HDFS

If you want to access your HDFS, it is recommended to start a client Pod that connects to HDFS rather than shelling into the namenode. We have an integration test for this exact purpose, where you can see how to connect and get a valid keytab.
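As a rough sketch (not taken verbatim from the integration test; the image tag, user name and paths are placeholders), such a client Pod can request its own keytab and krb5.conf from the secret-operator via an ephemeral volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hdfs-client # hypothetical name
spec:
  containers:
    - name: client
      image: docker.stackable.tech/stackable/hadoop:3.3.6-stackable0.0.0-dev # placeholder tag
      command: ["sleep", "infinity"]
      env:
        - name: KRB5_CONFIG
          value: /stackable/kerberos/krb5.conf # provided by the secret-operator
      volumeMounts:
        - name: kerberos
          mountPath: /stackable/kerberos
  volumes:
    - name: kerberos
      ephemeral:
        volumeClaimTemplate:
          metadata:
            annotations:
              secrets.stackable.tech/class: kerberos-hdfs # your SecretClass
              secrets.stackable.tech/kerberos.service.names: testuser # principal to create
          spec:
            storageClassName: secrets.stackable.tech
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: "1"
```

Inside the Pod you can then kinit with the provisioned keytab and use bin/hdfs dfs as usual.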


Authorization

For authorization we developed hdfs-utils, which contains an OPA authorizer and group mapper. This matches our general OPA authorization mechanisms.

It is recommended to enable Kerberos when using authorization, as otherwise you have no security measures at all. There might still be cases where you want authorization on top of a cluster without authentication, e.g. when you use different users for different use cases and want to avoid accidentally dropping files.

In order to use the authorizer you need a ConfigMap containing a rego rule and a reference to it in your HDFS cluster. In addition, you need an OpaCluster that serves the rego rules; this guide assumes it is called opa.
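A minimal OpaCluster might look like this sketch (the product version is a placeholder; check the OPA operator documentation for versions supported by your operator release):

```yaml
apiVersion: opa.stackable.tech/v1alpha1
kind: OpaCluster
metadata:
  name: opa
spec:
  image:
    productVersion: "0.61.0" # placeholder version
  servers:
    roleGroups:
      default: {}
```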

apiVersion: v1
kind: ConfigMap
metadata:
  name: hdfs-regorules
  labels:
    opa.stackable.tech/bundle: "true"
data:
  hdfs.rego: |
    package hdfs

    import rego.v1

    default allow = true

This rego rule is intended for demonstration purposes and allows every operation. For a production setup you probably want to take a look at our integration tests for a more secure set of rego rules. Reference the rego rule as follows in your HdfsCluster:

spec:
  clusterConfig:
    authorization:
      opa:
        configMapName: opa
        package: hdfs
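To illustrate what a less permissive policy could look like, here is a hypothetical sketch of the ConfigMap's hdfs.rego entry. The exact shape of the input document (field names such as callerUgi.shortUserName, operationName and path) is an assumption based on the hdfs-utils authorizer; verify it against the integration tests before relying on it.

```yaml
  hdfs.rego: |
    package hdfs

    import rego.v1

    # Deny by default instead of allowing everything
    default allow = false

    # Hypothetical: give a dedicated admin user full access
    allow if {
      input.callerUgi.shortUserName == "admin"
    }

    # Hypothetical: allow read access below /public for everyone
    allow if {
      input.operationName == "open"
      startswith(input.path, "/public/")
    }
```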

How it works

Take all your knowledge about HDFS authorization and throw it in the bin. The approach we are taking for our authorization paradigm departs significantly from traditional Hadoop patterns and POSIX-style permissions.

In short, the current rego rules ignore file ownership, permissions, ACLs and all other attributes files can have. All of this is state in HDFS and clashes with the infrastructure-as-code (IaC) approach.

Instead, HDFS will send a request detailing who (e.g. alice/test-hdfs-permissions.default.svc.cluster.local@CLUSTER.LOCAL) is trying to do what (e.g. open, create, delete or append) on what file (e.g. /foo/bar). OPA then makes a decision if this action is allowed or not.

Instead of chown-ing a directory to a different user to assign write permissions, you should go to your IaC Git repository and add a rego rule entry specifying that the user is allowed to read and write to that directory.

Group memberships

We encountered several challenges while implementing the group mapper, the most serious of which is that the GroupMappingServiceProvider interface only passes the shortUsername when asking for group memberships. This does not allow us to differentiate between e.g. hbase/ and hbase/, as the GroupMapper will be asked for hbase group memberships in both cases.

Users could work around this by assigning unique shortNames, but this is a potentially complex and error-prone process. We also tried mapping the principals from Kerberos 1:1 to the HDFS shortUserName, so the GroupMapper has access to the full userName. However, this did not work, as HDFS only allows "simple usernames", which may not contain a / or @.

Because of these issues we do not use a custom GroupMapper and only rely on the authorizer, which in turn receives a complete UserGroupInformation object, including the shortUserName and the precious full userName. This has the downside that the group memberships used in OPA for authorization are not known to HDFS. The implication is that you cannot add users to the superuser group, which is needed for certain administrative actions in HDFS. We have decided that this is an acceptable trade-off, as normal operations are not affected. In case you really need users to be part of the superusers group, you can use a configOverride for that.

Wire encryption

If Kerberos is enabled, Privacy mode is used for best security. Wire encryption without Kerberos, as well as other wire encryption modes, is not supported.