Security
Authentication
Currently, the only supported authentication mechanism is Kerberos, which is disabled by default. Kerberos requires a Kerberos Key Distribution Center (KDC), which you need to provide yourself. The secret-operator documentation lists the supported Kerberos servers and describes how to configure them.
1. Prepare Kerberos server
To configure HDFS to use Kerberos you first need to collect information about your Kerberos server, e.g. the hostname and port. Additionally, you need a service user, which the secret-operator uses to create principals for the HDFS services.
2. Create Kerberos SecretClass
The next step is to enter all the necessary information into a SecretClass, as described in the secret-operator documentation. The following guide assumes you have named your SecretClass kerberos.
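For reference, a minimal sketch of such a SecretClass using the secret-operator's kerberosKeytab backend, assuming an MIT KDC; the realm, hostnames and the admin keytab Secret are placeholders for your environment:
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: kerberos
spec:
  backend:
    kerberosKeytab:
      realmName: CLUSTER.LOCAL  # placeholder realm
      kdc: krb5-kdc.default.svc.cluster.local  # placeholder KDC hostname
      admin:
        mit:
          kadminServer: krb5-kdc.default.svc.cluster.local  # placeholder kadmin hostname
      adminKeytabSecret:
        namespace: default
        name: secret-operator-keytab  # Secret containing the admin keytab
      adminPrincipal: stackable-secret-operator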
3. Configure HDFS to use SecretClass
The next step is to configure your HdfsCluster to use the newly created SecretClass.
Follow the HDFS security guide to set up and test this.
Make sure to use the SecretClass named kerberos.
It is also necessary to configure two additional things in HDFS:
- Define group mappings for users with hadoop.user.group.static.mapping.overrides.
- Tell HDFS that Hive is allowed to impersonate other users, i.e. Hive does not need any direct access permissions for itself, but should be able to impersonate Hive users when accessing HDFS. This can be done by e.g. setting hadoop.proxyuser.hive.users=* and hadoop.proxyuser.hive.hosts=* to allow the user hive to impersonate all other users.
An example of the above can be found in this integration test.
| This is only relevant if HDFS is used with the Hive metastore (many installations use the metastore with an S3 backend instead of HDFS). |
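A hedged sketch of how this could look on the HdfsCluster, assuming the settings are applied via configOverrides on the nameNodes role (the group mapping value is purely illustrative, and which roles need the overrides depends on your setup):
spec:
  clusterConfig:
    authentication:
      kerberos:
        secretClass: kerberos  # the SecretClass created above
  nameNodes:
    configOverrides:
      core-site.xml:
        hadoop.user.group.static.mapping.overrides: "hive=hadoop;"  # illustrative mapping
        hadoop.proxyuser.hive.users: "*"  # allow hive to impersonate all users
        hadoop.proxyuser.hive.hosts: "*"  # from all hosts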
4. Configure Hive to use SecretClass
The last step is to configure the same SecretClass for Hive, which is done similarly to HDFS.
| HDFS and Hive need to use the same SecretClass (or at least use the same underlying Kerberos server). |
spec:
clusterConfig:
authentication:
kerberos:
secretClass: kerberos # Put your SecretClass name in here
The kerberos.secretClass enables Hive to request keytabs from the secret-operator.
5. Access Hive
In case you want to access Hive it is recommended to start up a client Pod that connects to Hive, rather than shelling into the master. We have an integration test for this exact purpose, where you can see how to connect and get a valid keytab.
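A minimal sketch of such a client Pod, assuming the secret-operator's volume mechanism for keytab provisioning; the image, service name and mount path are assumptions, not fixed values:
apiVersion: v1
kind: Pod
metadata:
  name: hive-client
spec:
  containers:
    - name: client
      image: hive-client-image:latest  # placeholder image with Kerberos tooling
      command: ["sleep", "infinity"]
      env:
        - name: KRB5_CONFIG
          value: /stackable/kerberos/krb5.conf  # krb5.conf provisioned by the secret-operator
      volumeMounts:
        - name: kerberos
          mountPath: /stackable/kerberos
  volumes:
    - name: kerberos
      ephemeral:
        volumeClaimTemplate:
          metadata:
            annotations:
              secrets.stackable.tech/class: kerberos  # your SecretClass
              secrets.stackable.tech/kerberos.service.names: hive  # service name for the principal
          spec:
            storageClassName: secrets.stackable.tech
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: "1"
Inside the Pod you can then kinit with the provisioned keytab before connecting to the metastore.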
Authorization
The Stackable Operator for Apache Hive supports the following authorization methods.
Open Policy Agent (OPA)
The Apache Hive metastore can be configured to delegate authorization decisions to an Open Policy Agent (OPA) instance. More information on the setup and configuration of OPA can be found in the OPA Operator documentation. A Hive cluster can be configured to use OPA authorization by adding this section to its configuration:
spec:
clusterConfig:
authorization:
opa:
configMapName: opa (1)
package: hms (2)
| 1 | The name of your OPA Stacklet (opa in this case) |
| 2 | The rego rule package to use for policy decisions. This is optional and defaults to the name of the Hive Stacklet. |
Defining rego rules
For a general explanation of how rules are written, please refer to the OPA documentation. Authorization with OPA is done using the hive-metastore-opa-authorizer plugin.
OPA Inputs
The payload that Hive sends to OPA with each request, and which is accessible within the rego rules, has the following structure:
{
"identity": {
"username": "<user>",
"groups": ["<group1>", "<group2>"]
},
"resources": {
"database": null,
"table": null,
"partition": null,
"columns": ["col1", "col2"]
},
"privileges": {
"readRequiredPriv": [],
"writeRequiredPriv": [],
"inputs": null,
"outputs": null
}
}
- identity: Contains user information.
  - username: The name of the user.
  - groups: A list of groups the user belongs to.
- resources: Specifies the resources involved in the request.
  - database: The database object.
  - table: The table object.
  - partition: The partition object.
  - columns: A list of column names involved in the request.
- privileges: Details the privileges required for the request.
  - readRequiredPriv: A list of required read privileges.
  - writeRequiredPriv: A list of required write privileges.
  - inputs: Input tables for the request.
  - outputs: Output tables for the request.
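For illustration, a SELECT on test_table in test_db could produce an input document along these lines (a sketch; the exact shape of the resource and privilege objects depends on the plugin version):
{
  "identity": {
    "username": "stackable",
    "groups": ["developers"]
  },
  "resources": {
    "database": null,
    "table": {
      "dbName": "test_db",
      "tableName": "test_table"
    },
    "partition": null,
    "columns": ["col1", "col2"]
  },
  "privileges": {
    "readRequiredPriv": [{"priv": "SELECT"}],
    "writeRequiredPriv": [],
    "inputs": null,
    "outputs": null
  }
}
With the example rule below, this request would be allowed by table_allow.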
Example OPA Rego Rule
Below is a basic rego rule that demonstrates how to handle the input document sent from the Hive authorizer to OPA:
package hms

import rego.v1  # enables the "if" keyword on OPA versions prior to 1.0

default database_allow := false
default table_allow := false
default column_allow := false
default partition_allow := false
default user_allow := false
database_allow if {
input.identity.username == "stackable"
input.resources.database.name == "test_db"
}
table_allow if {
input.identity.username == "stackable"
input.resources.table.dbName == "test_db"
input.resources.table.tableName == "test_table"
input.privileges.readRequiredPriv[0].priv == "SELECT"
}
table_allow if {
input.identity.username == "stackable"
input.resources.table.dbName == "test_db"
input.privileges.writeRequiredPriv[0].priv == "CREATE"
}
- database_allow grants access if the user is stackable and the database is test_db.
- table_allow grants access if the user is stackable, the database is test_db, and either:
  - the table is test_table and the required read privilege is SELECT, or
  - the required write privilege is CREATE, without any table restriction.
Configuring policy URLs
The database_allow, table_allow, column_allow, partition_allow, and user_allow policy URLs can be overridden using config overrides for the following properties in hive-site.xml:
- com.bosch.bdps.opa.authorization.policy.url.database
- com.bosch.bdps.opa.authorization.policy.url.table
- com.bosch.bdps.opa.authorization.policy.url.column
- com.bosch.bdps.opa.authorization.policy.url.partition
- com.bosch.bdps.opa.authorization.policy.url.user
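A hedged sketch of how such an override could look on the HiveCluster; the OPA service URL and the data path are placeholders for your setup:
spec:
  metastore:
    configOverrides:
      hive-site.xml:
        com.bosch.bdps.opa.authorization.policy.url.table: http://opa.default.svc.cluster.local:8081/v1/data/hms/table_allow  # placeholder URL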