Security

Authentication

Currently, the only supported authentication mechanism is Kerberos, which is disabled by default. For Kerberos to work, a Kerberos KDC is needed, which users need to provide. The secret-operator documentation states which kinds of Kerberos servers are supported and how they can be configured.

1. Prepare Kerberos server

To configure HDFS to use Kerberos, you first need to collect information about your Kerberos server, e.g. its hostname and port. Additionally, you need a service user that the secret-operator uses to create principals for the HDFS services.

2. Create Kerberos SecretClass

The next step is to enter all the necessary information into a SecretClass, as described in the secret-operator documentation. The following guide assumes you have named your SecretClass kerberos.
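As an illustrative sketch only, a SecretClass for an MIT Kerberos setup could look roughly like the following; the realm, KDC hostname, admin principal, and Secret name are all placeholders you need to adapt to your environment (see the secret-operator documentation for the authoritative field reference):

```yaml
# Illustrative sketch - all hostnames, the realm, and the Secret name are
# placeholders for your own Kerberos environment.
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: kerberos
spec:
  backend:
    kerberosKeytab:
      # Kerberos realm and KDC the secret-operator should talk to
      realmName: CLUSTER.LOCAL
      kdc: krb5-kdc.default.svc.cluster.local
      admin:
        mit:
          kadminServer: krb5-kdc.default.svc.cluster.local
      # service user used to create principals for the HDFS/Hive services
      adminPrincipal: stackable-secret-operator
      adminKeytabSecret:
        namespace: default
        name: secret-operator-keytab
```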

3. Configure HDFS to use SecretClass

The next step is to configure your HdfsCluster to use the newly created SecretClass. Follow the HDFS security guide to set up and test this. Make sure to use the SecretClass named kerberos. Two additional things also need to be configured in HDFS:

  • Define group mappings for users with hadoop.user.group.static.mapping.overrides

  • Tell HDFS that Hive is allowed to impersonate other users, i.e. Hive does not need any direct access permissions itself, but should be able to impersonate Hive users when accessing HDFS. This can be done by e.g. setting hadoop.proxyuser.hive.users=* and hadoop.proxyuser.hive.hosts=* to allow the user hive to impersonate all other users.

An example of the above can be found in this integration test.

This is only relevant if HDFS is used with the Hive metastore (many installations use the metastore with an S3 backend instead of HDFS).
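Expressed as a hypothetical snippet of an HdfsCluster definition, both settings belong in core-site.xml and could be applied via configOverrides; the group mapping value and the role placement shown here are illustrative only:

```yaml
# Illustrative sketch: set both properties in core-site.xml via
# configOverrides on the HDFS roles (shown here for nameNodes).
nameNodes:
  configOverrides:
    core-site.xml:
      # hypothetical static group mapping for the hive user
      hadoop.user.group.static.mapping.overrides: "hive=hadoop;"
      # let the user hive impersonate any user from any host
      hadoop.proxyuser.hive.users: "*"
      hadoop.proxyuser.hive.hosts: "*"
```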

4. Configure Hive to use SecretClass

The last step is to configure the same SecretClass for Hive, which is done similarly to HDFS.

HDFS and Hive need to use the same SecretClass (or at least use the same underlying Kerberos server).

spec:
  clusterConfig:
    authentication:
      kerberos:
        secretClass: kerberos # Put your SecretClass name in here

The kerberos.secretClass allows Hive to request keytabs from the secret-operator.

5. Access Hive

If you want to access Hive, it is recommended to start a client Pod that connects to Hive, rather than shelling into the master. We have an integration test for this exact purpose, where you can see how to connect and obtain a valid keytab.

Authorization

The Stackable Operator for Apache Hive supports the following authorization methods.

Open Policy Agent (OPA)

The Apache Hive metastore can be configured to delegate authorization decisions to an Open Policy Agent (OPA) instance. More information on the setup and configuration of OPA can be found in the OPA Operator documentation. A Hive cluster can be configured to use OPA authorization by adding this section to its configuration:

spec:
  clusterConfig:
    authorization:
      opa:
        configMapName: opa (1)
        package: hms (2)
1 The name of your OPA Stacklet (opa in this case)
2 The rego rule package to use for policy decisions. This is optional and defaults to the name of the Hive Stacklet.

Defining rego rules

For a general explanation of how rules are written, please refer to the OPA documentation. Authorization with OPA is done using the hive-metastore-opa-authorizer plugin.

OPA Inputs

The payload sent by Hive to OPA with each request, which is accessible within the rego rules, has the following structure:

{
  "identity": {
    "username": "<user>",
    "groups": ["<group1>", "<group2>"]
  },
  "resources": {
    "database": null,
    "table": null,
    "partition": null,
    "columns": ["col1", "col2"]
  },
  "privileges": {
    "readRequiredPriv": [],
    "writeRequiredPriv": [],
    "inputs": null,
    "outputs": null
  }
}
  • identity: Contains user information.

    • username: The name of the user.

    • groups: A list of groups the user belongs to.

  • resources: Specifies the resources involved in the request.

    • database: The database object.

    • table: The table object.

    • partition: The partition object.

    • columns: A list of column names involved in the request.

  • privileges: Details the privileges required for the request.

    • readRequiredPriv: A list of required read privileges.

    • writeRequiredPriv: A list of required write privileges.

    • inputs: Input tables for the request.

    • outputs: Output tables for the request.

Example OPA Rego Rule

Below is a basic set of rego rules that demonstrates how to handle the input dictionary sent from the Hive authorizer to OPA:

package hms

default database_allow = false
default table_allow = false
default column_allow = false
default partition_allow = false
default user_allow = false

database_allow if {
  input.identity.username == "stackable"
  input.resources.database.name == "test_db"
}

table_allow if {
  input.identity.username == "stackable"
  input.resources.table.dbName == "test_db"
  input.resources.table.tableName == "test_table"
  input.privileges.readRequiredPriv[0].priv == "SELECT"
}

table_allow if {
  input.identity.username == "stackable"
  input.resources.table.dbName == "test_db"
  input.privileges.writeRequiredPriv[0].priv == "CREATE"
}
  • database_allow grants access if the user is stackable and the database is test_db.

  • table_allow grants access if the user is stackable, the database is test_db and either:

    • the table is test_table and the required read privilege is SELECT, or

    • the required write privilege is CREATE, without any table restriction.
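To illustrate, an input document with the structure shown earlier that would satisfy the first table_allow rule might look like this (all values are examples only):

```json
{
  "identity": {
    "username": "stackable",
    "groups": ["admins"]
  },
  "resources": {
    "database": null,
    "table": { "dbName": "test_db", "tableName": "test_table" },
    "partition": null,
    "columns": ["col1"]
  },
  "privileges": {
    "readRequiredPriv": [{ "priv": "SELECT" }],
    "writeRequiredPriv": [],
    "inputs": null,
    "outputs": null
  }
}
```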

Configuring policy URLs

The database_allow, table_allow, column_allow, partition_allow, and user_allow policy URLs can be overridden via config overrides using the following properties in hive-site.xml:

  • com.bosch.bdps.opa.authorization.policy.url.database

  • com.bosch.bdps.opa.authorization.policy.url.table

  • com.bosch.bdps.opa.authorization.policy.url.column

  • com.bosch.bdps.opa.authorization.policy.url.partition

  • com.bosch.bdps.opa.authorization.policy.url.user
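As a hypothetical example, the table policy URL could be overridden in the HiveCluster definition via configOverrides; the OPA service hostname and port shown here are placeholders for your own cluster:

```yaml
# Illustrative sketch: point the table policy at a different rego rule
# by overriding the URL in hive-site.xml.
metastore:
  configOverrides:
    hive-site.xml:
      com.bosch.bdps.opa.authorization.policy.url.table: "http://opa.default.svc.cluster.local:8081/v1/data/hms/table_allow"
```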

TLS secured OPA cluster

Stackable OPA clusters secured via TLS are supported and no further configuration is required. The Stackable Hive operator automatically adds the certificate from the SecretClass used to secure the OPA cluster to its trust store.