Security
Authentication
Currently, the only supported authentication mechanism is Kerberos, which is disabled by default. Kerberos requires a Kerberos KDC, which you must provide yourself. The secret-operator documentation lists which kinds of Kerberos servers are supported and how to configure them.
1. Prepare Kerberos server
To configure HDFS to use Kerberos you first need to collect information about your Kerberos server, e.g. hostname and port. Additionally, you need a service-user which the secret-operator uses to create principals for the HDFS services.
2. Create Kerberos SecretClass
The next step is to enter all the necessary information into a SecretClass, as described in the secret-operator documentation. The following guide assumes you have named your SecretClass kerberos.
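As an illustration, a SecretClass using the secret-operator's Kerberos keytab backend might look roughly like the sketch below. This is not authoritative: the field layout follows recent secret-operator releases and may differ in your version, and the realm, KDC hostname, admin keytab Secret and admin principal are all hypothetical placeholders for the values you collected in step 1.

```yaml
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: kerberos  # the name this guide assumes
spec:
  backend:
    kerberosKeytab:
      # Hypothetical realm and KDC; replace with your own Kerberos server details
      realmName: CLUSTER.LOCAL
      kdc: krb5-kdc.default.svc.cluster.local
      admin:
        mit:
          kadminServer: krb5-kdc.default.svc.cluster.local
      # Hypothetical Secret holding the keytab of the service user from step 1
      adminKeytabSecret:
        namespace: default
        name: secret-operator-keytab
      adminPrincipal: stackable-secret-operator
```

Check the secret-operator documentation for the exact fields your KDC type (e.g. MIT or Active Directory) requires.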
3. Configure HDFS to use SecretClass
The next step is to configure your HdfsCluster to use the newly created SecretClass.
Follow the HDFS security guide to set up and test this.
Make sure to use the SecretClass named kerberos.
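A minimal sketch of the relevant HdfsCluster fragment, assuming your HDFS operator version exposes Kerberos under clusterConfig.authentication in the same way the Hive snippet later in this guide does; all other HdfsCluster fields are omitted:

```yaml
spec:
  clusterConfig:
    authentication:
      kerberos:
        secretClass: kerberos  # the SecretClass created in step 2
```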
It is also necessary to configure two additional settings in HDFS:
- Define group mappings for users with hadoop.user.group.static.mapping.overrides.
- Allow Hive to impersonate other users: Hive does not need any direct access permissions of its own, but it must be able to impersonate Hive users when accessing HDFS. For example, setting hadoop.proxyuser.hive.users=* and hadoop.proxyuser.hive.hosts=* allows the user hive to impersonate all other users.
An example of the above can be found in this integration test.
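One way to express both settings is through configOverrides on core-site.xml in the HdfsCluster, sketched below. This is an assumption-laden illustration rather than the integration test itself: roleGroups are omitted, the YAML anchor simply reuses the same overrides for every role, and the user alice and the group supergroup in the static mapping are hypothetical.

```yaml
spec:
  nameNodes:
    configOverrides: &configOverrides
      core-site.xml:
        # Static user-to-group mapping; "alice=supergroup" is a hypothetical entry.
        # "dr.who=;" keeps Hadoop's default entry, which an override would otherwise drop.
        hadoop.user.group.static.mapping.overrides: "dr.who=;alice=supergroup;"
        # Allow the "hive" user to impersonate all users, from any host
        hadoop.proxyuser.hive.users: "*"
        hadoop.proxyuser.hive.hosts: "*"
  dataNodes:
    configOverrides: *configOverrides
  journalNodes:
    configOverrides: *configOverrides
```

Using "*" for both properties is the most permissive choice; you may prefer to restrict them to specific users and hosts.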
This is only relevant if HDFS is used with the Hive metastore (many installations use the metastore with an S3 backend instead of HDFS).
4. Configure Hive to use SecretClass
The last step is to configure the same SecretClass for Hive, which is done similarly to HDFS.
HDFS and Hive need to use the same SecretClass (or at least the same underlying Kerberos server).
spec:
  clusterConfig:
    authentication:
      kerberos:
        secretClass: kerberos # Put your SecretClass name in here
Hive uses kerberos.secretClass to request keytabs from the secret-operator.
5. Access Hive
To access Hive, we recommend starting a client Pod that connects to Hive rather than shelling into the master. We have an integration test for this exact purpose, which shows how to connect and obtain a valid keytab.
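A rough sketch of such a client Pod, assuming it receives its keytab through a secret-operator ephemeral volume: the image name is hypothetical (any image with kinit and a Hive client would do), and the scope value and the service name developer are illustrative assumptions, not values taken from the integration test.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hive-client
spec:
  containers:
    - name: client
      # Hypothetical image containing a Kerberos client (kinit) and a Hive client
      image: example.com/hive-client:latest
      command: ["sleep", "infinity"]
      env:
        # Point Kerberos tools at the krb5.conf provisioned next to the keytab
        - name: KRB5_CONFIG
          value: /stackable/kerberos/krb5.conf
      volumeMounts:
        - name: kerberos
          mountPath: /stackable/kerberos
  volumes:
    - name: kerberos
      ephemeral:
        volumeClaimTemplate:
          metadata:
            annotations:
              # Request a keytab from the same SecretClass that HDFS and Hive use
              secrets.stackable.tech/class: kerberos
              secrets.stackable.tech/scope: service=hive-client
              secrets.stackable.tech/kerberos.service.names: developer
          spec:
            storageClassName: secrets.stackable.tech
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: "1"
```

Inside the Pod you would then run kinit with the keytab mounted under /stackable/kerberos before connecting to Hive; the exact principal name depends on the service name and scope, so check the secret-operator documentation for how it is derived.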