Security
Authentication
Currently, the only supported authentication mechanism is Kerberos, which is disabled by default. Kerberos requires a Kerberos Key Distribution Center (KDC), which you need to provide yourself. The secret-operator documentation lists the supported Kerberos servers and describes how to configure them.
1. Prepare Kerberos server
To configure HDFS to use Kerberos you first need to collect information about your Kerberos server, e.g. the hostname and port. Additionally, you need a service user, which the secret-operator uses to create principals for the HDFS services.
2. Create Kerberos SecretClass
The next step is to enter all the necessary information into a SecretClass, as described in the secret-operator documentation. The following guide assumes you have named your SecretClass kerberos.
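For reference, a minimal sketch of such a SecretClass using the secret-operator's kerberosKeytab backend, assuming an MIT KDC; the realm, hostnames and the admin keytab Secret are placeholders for your environment:
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: kerberos
spec:
  backend:
    kerberosKeytab:
      realmName: CLUSTER.LOCAL  # placeholder realm
      kdc: krb5-kdc.default.svc.cluster.local  # placeholder KDC hostname
      admin:
        mit:
          kadminServer: krb5-kdc.default.svc.cluster.local  # placeholder kadmin hostname
      adminKeytabSecret:
        namespace: default
        name: secret-operator-keytab  # Secret containing the admin keytab
      adminPrincipal: stackable-secret-operator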
3. Configure HDFS to use SecretClass
The next step is to configure your HdfsCluster to use the newly created SecretClass.
Follow the HDFS security guide to set up and test this.
Make sure to use the SecretClass named kerberos.
It is also necessary to configure two additional things in HDFS:
- Define group mappings for users with hadoop.user.group.static.mapping.overrides.
- Tell HDFS that Hive is allowed to impersonate other users, i.e. Hive does not need any direct access permissions for itself, but should be able to impersonate Hive users when accessing HDFS. This can be done by e.g. setting hadoop.proxyuser.hive.users=* and hadoop.proxyuser.hive.hosts=* to allow the user hive to impersonate all other users.
An example of the above can be found in this integration test.
| This is only relevant if HDFS is used with the Hive metastore (many installations use the metastore with an S3 backend instead of HDFS). |
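A hedged sketch of how this could look on the HdfsCluster, assuming the settings are applied via configOverrides on the nameNodes role (the group mapping value is purely illustrative, and which roles need the overrides depends on your setup):
spec:
  clusterConfig:
    authentication:
      kerberos:
        secretClass: kerberos  # the SecretClass created above
  nameNodes:
    configOverrides:
      core-site.xml:
        hadoop.user.group.static.mapping.overrides: "hive=hadoop;"  # illustrative mapping
        hadoop.proxyuser.hive.users: "*"  # allow hive to impersonate all users
        hadoop.proxyuser.hive.hosts: "*"  # from all hosts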
4. Configure Hive to use SecretClass
The last step is to configure the same SecretClass for Hive, which is done similarly to HDFS.
| HDFS and Hive need to use the same SecretClass (or at least use the same underlying Kerberos server). |
spec:
clusterConfig:
authentication:
kerberos:
secretClass: kerberos # Put your SecretClass name in here
The kerberos.secretClass enables Hive to request keytabs from the secret-operator.
5. Access Hive
In case you want to access Hive it is recommended to start up a client Pod that connects to Hive, rather than shelling into the master. We have an integration test for this exact purpose, where you can see how to connect and get a valid keytab.
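A minimal sketch of such a client Pod, assuming the secret-operator's volume mechanism for keytab provisioning; the image, service name and mount path are assumptions, not fixed values:
apiVersion: v1
kind: Pod
metadata:
  name: hive-client
spec:
  containers:
    - name: client
      image: hive-client-image:latest  # placeholder image with Kerberos tooling
      command: ["sleep", "infinity"]
      env:
        - name: KRB5_CONFIG
          value: /stackable/kerberos/krb5.conf  # krb5.conf provisioned by the secret-operator
      volumeMounts:
        - name: kerberos
          mountPath: /stackable/kerberos
  volumes:
    - name: kerberos
      ephemeral:
        volumeClaimTemplate:
          metadata:
            annotations:
              secrets.stackable.tech/class: kerberos  # your SecretClass
              secrets.stackable.tech/kerberos.service.names: hive  # service name for the principal
          spec:
            storageClassName: secrets.stackable.tech
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: "1"
Inside the Pod you can then kinit with the provisioned keytab before connecting to the metastore.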
Authorization
The Stackable Operator for Apache Hive supports the following authorization methods.
Open Policy Agent (OPA)
The Apache Hive metastore can be configured to delegate authorization decisions to an Open Policy Agent (OPA) instance. More information on the setup and configuration of OPA can be found in the OPA Operator documentation. A Hive cluster can be configured to use OPA authorization by adding this section to its configuration:
spec:
clusterConfig:
authorization:
opa:
configMapName: opa (1)
package: hms (2)
| 1 | The name of your OPA Stacklet (opa in this case) |
| 2 | The rego rule package to use for policy decisions. This is optional and defaults to the name of the Hive Stacklet. |
Defining rego rules
For a general explanation of how rules are written, please refer to the OPA documentation. Authorization with OPA is done using the hive-metastore-opa-authorizer plugin.
OPA Inputs
The payload that Hive sends to OPA with each request, and which is accessible within the rego rules, has the following structure:
{
"identity": {
"username": "<user>",
"groups": ["<group1>", "<group2>"]
},
"resources": {
"database": null,
"table": null,
"partition": null,
"columns": ["col1", "col2"]
},
"privileges": {
"readRequiredPriv": [],
"writeRequiredPriv": [],
"inputs": null,
"outputs": null
}
}
- identity: Contains user information.
  - username: The name of the user.
  - groups: A list of groups the user belongs to.
- resources: Specifies the resources involved in the request.
  - database: The database object.
  - table: The table object.
  - partition: The partition object.
  - columns: A list of column names involved in the request.
- privileges: Details the privileges required for the request.
  - readRequiredPriv: A list of required read privileges.
  - writeRequiredPriv: A list of required write privileges.
  - inputs: Input tables for the request.
  - outputs: Output tables for the request.
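For illustration, a SELECT on test_table in test_db could produce an input document along these lines (a sketch; the exact shape of the resource and privilege objects depends on the plugin version):
{
  "identity": {
    "username": "stackable",
    "groups": ["developers"]
  },
  "resources": {
    "database": null,
    "table": {
      "dbName": "test_db",
      "tableName": "test_table"
    },
    "partition": null,
    "columns": ["col1", "col2"]
  },
  "privileges": {
    "readRequiredPriv": [{"priv": "SELECT"}],
    "writeRequiredPriv": [],
    "inputs": null,
    "outputs": null
  }
}
With the example rule below, this request would be allowed by table_allow.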
Example OPA Rego Rule
Below is a basic rego rule that demonstrates how to handle the input document sent from the Hive authorizer to OPA:
package hms

import rego.v1  # enables the "if" keyword on OPA versions prior to 1.0

default database_allow := false
default table_allow := false
default column_allow := false
default partition_allow := false
default user_allow := false
database_allow if {
input.identity.username == "stackable"
input.resources.database.name == "test_db"
}
table_allow if {
input.identity.username == "stackable"
input.resources.table.dbName == "test_db"
input.resources.table.tableName == "test_table"
input.privileges.readRequiredPriv[0].priv == "SELECT"
}
table_allow if {
input.identity.username == "stackable"
input.resources.table.dbName == "test_db"
input.privileges.writeRequiredPriv[0].priv == "CREATE"
}
- database_allow grants access if the user is stackable and the database is test_db.
- table_allow grants access if the user is stackable, the database is test_db, and either:
  - the table is test_table and the required read privilege is SELECT, or
  - the required write privilege is CREATE, without any table restriction.
Configuring policy URLs
The database_allow, table_allow, column_allow, partition_allow, and user_allow policy URLs can be overridden using config overrides for the following properties in hive-site.xml:
- com.bosch.bdps.opa.authorization.policy.url.database
- com.bosch.bdps.opa.authorization.policy.url.table
- com.bosch.bdps.opa.authorization.policy.url.column
- com.bosch.bdps.opa.authorization.policy.url.partition
- com.bosch.bdps.opa.authorization.policy.url.user
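A hedged sketch of how such an override could look on the HiveCluster; the OPA service URL and the data path are placeholders for your setup:
spec:
  metastore:
    configOverrides:
      hive-site.xml:
        com.bosch.bdps.opa.authorization.policy.url.table: http://opa.default.svc.cluster.local:8081/v1/data/hms/table_allow  # placeholder URL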