Usage

Authentication

Every user has to authenticate before using Superset. Authentication can be set up in several ways.

Web interface

By default, users are set up manually in the Superset web interface.

LDAP

Superset supports authentication of users against an LDAP server. This requires setting up an AuthenticationClass for the LDAP server. The AuthenticationClass is then referenced in the SupersetCluster resource as follows:

apiVersion: superset.stackable.tech/v1alpha1
kind: SupersetCluster
metadata:
  name: superset-with-ldap-server
spec:
  image:
    productVersion: 1.5.1
    stackableVersion: 3.0.0
  # [...]
  authenticationConfig:
    authenticationClass: ldap    # (1)
    userRegistrationRole: Admin  # (2)

(1) The reference to an AuthenticationClass called `ldap`
(2) The default role that all users are assigned to

Users that log in with LDAP are assigned to a default Role which is specified with the userRegistrationRole property.

To learn how to set up an AuthenticationClass for an LDAP server, follow the Authentication with OpenLDAP tutorial or consult the AuthenticationClass reference.
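
As a rough sketch of what such an AuthenticationClass might look like, the following resource defines a class named ldap to match the reference above. The hostname, port and search base are placeholder assumptions, bind credentials and TLS settings are left out, and the field names should be checked against the AuthenticationClass reference before use:

apiVersion: authentication.stackable.tech/v1alpha1
kind: AuthenticationClass
metadata:
  name: ldap  # must match authenticationConfig.authenticationClass
spec:
  provider:
    ldap:
      hostname: openldap.default.svc.cluster.local  # assumed LDAP host
      port: 389                                     # assumed LDAP port
      searchBase: ou=users,dc=example,dc=org        # assumed search base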

Authorization

Superset uses Roles to grant permissions to users. See the Superset documentation on Security for details.

Web interface

You can view all available roles in the Superset web interface and assign users to them.

LDAP

Superset supports assigning Roles to users based on their LDAP group membership, though this is not yet supported by the Stackable operator. All the users logging in via LDAP get assigned to the same role which you can configure via the attribute authenticationConfig.userRegistrationRole on the SupersetCluster object:

apiVersion: superset.stackable.tech/v1alpha1
kind: SupersetCluster
metadata:
  name: superset-with-ldap-server
spec:
  # [...]
  authenticationConfig:
    authenticationClass: ldap
    userRegistrationRole: Admin  # (1)

(1) All users are assigned to the `Admin` role
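
Because every LDAP user receives this role, a less privileged default is often the safer choice. A minimal sketch, assuming that the built-in Superset role Gamma is acceptable as a default in your environment:

apiVersion: superset.stackable.tech/v1alpha1
kind: SupersetCluster
metadata:
  name: superset-with-ldap-server
spec:
  # [...]
  authenticationConfig:
    authenticationClass: ldap
    userRegistrationRole: Gamma  # limited default; assign further roles in the web interface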

Connecting Apache Druid Clusters

The operator can automatically connect Superset to Apache Druid clusters managed by the Stackable operator for Apache Druid.

To do so, create a DruidConnection resource:

apiVersion: superset.stackable.tech/v1alpha1
kind: DruidConnection
metadata:
  name: superset-druid-connection
spec:
  superset:
    name: superset
    namespace: default
  druid:
    name: my-druid-cluster
    namespace: default

The name and namespace in spec.superset refer to the Superset cluster that you want to connect. Following our example above, the name is superset.

In spec.druid you specify the name and namespace of your Druid cluster.

The namespace is optional in both spec.superset and spec.druid. If it is omitted, the operator assumes that the cluster is in the same namespace as the DruidConnection.
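
For instance, when the DruidConnection, the Superset cluster and the Druid cluster all live in the same namespace, the resource from above can presumably be shortened as follows. This is a sketch based on the defaulting rule just described; the resource names are the illustrative ones used earlier:

apiVersion: superset.stackable.tech/v1alpha1
kind: DruidConnection
metadata:
  name: superset-druid-connection
  namespace: default  # both clusters are looked up in this namespace
spec:
  superset:
    name: superset          # namespace omitted
  druid:
    name: my-druid-cluster  # namespace omitted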

Once the database is initialized, the connection will be added to the cluster by the operator. You can see it in the user interface under Data > Databases:

Figure: Superset databases showing the connected Druid cluster

Monitoring

The managed Superset instances are automatically configured to export Prometheus metrics. See Monitoring for more details.

Configuration & Environment Overrides

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

Overriding properties that the operator sets itself (such as STATS_LOGGER) can interfere with the operator and lead to problems.

Configuration Properties

For a role or role group, you can specify configOverrides for superset_config.py at the same level as config. For example, to set the CSV export encoding and the preferred databases, adapt the nodes section of the cluster resource as follows:

nodes:
  roleGroups:
    default:
      config: {}
      configOverrides:
        superset_config.py:
          CSV_EXPORT: "{'encoding': 'utf-8'}"
          PREFERRED_DATABASES: |-
            [
                'PostgreSQL',
                'Presto',
                'MySQL',
                'SQLite',
                # etc.
            ]

Just as for the config, it is possible to specify this at the role level as well:

nodes:
  configOverrides:
    superset_config.py:
      CSV_EXPORT: "{'encoding': 'utf-8'}"
      PREFERRED_DATABASES: |-
        [
            'PostgreSQL',
            'Presto',
            'MySQL',
            'SQLite',
            # etc.
        ]
  roleGroups:
    default:
      config: {}

All override property values must be strings. They are treated as Python expressions, so take care not to produce an invalid configuration.
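
The following sketch illustrates what this means for quoting. It assumes that each value is written into superset_config.py verbatim as the right-hand side of an assignment; ROW_LIMIT and APP_NAME are regular Superset settings chosen purely for illustration:

nodes:
  configOverrides:
    superset_config.py:
      ROW_LIMIT: "10000"         # YAML string holding the Python integer 10000
      APP_NAME: "'My Superset'"  # inner quotes make this a Python string
      # APP_NAME: "My Superset"  # without inner quotes this would not be valid Python
  roleGroups:
    default:
      config: {}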

For a full list of configuration options, refer to the main config file for Superset.

Environment Variables

Environment variables can be set or overwritten in a similar fashion. For example, per role group:

nodes:
  roleGroups:
    default:
      config: {}
      envOverrides:
        FLASK_ENV: development

or per role:

nodes:
  envOverrides:
    FLASK_ENV: development
  roleGroups:
    default:
      config: {}
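
Both levels can also be combined. In the sketch below, which uses a hypothetical second role group named debug, the role-level value applies to default, while debug replaces it with its own value, following the precedence rule stated above:

nodes:
  envOverrides:
    FLASK_ENV: production       # role level: applies to all role groups
  roleGroups:
    default:
      config: {}                # inherits FLASK_ENV=production
    debug:                      # hypothetical role group
      config: {}
      envOverrides:
        FLASK_ENV: development  # role group level: takes precedence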

Storage for data volumes

The Superset operator currently does not support using PersistentVolumeClaims for internal storage.

Resource Requests

Stackable operators handle resource requests in a slightly different manner than Kubernetes. Resource requests are defined on the role or role group level. See Roles and role groups for details on these concepts. On the role level, this means that, for example, all workers use the same resource requests and limits. This can be refined on the role group level, which takes priority over the role level, to apply different resources.

The following example shows how to specify CPU and memory resources using the Stackable Custom Resources:

---
apiVersion: example.stackable.tech/v1alpha1
kind: ExampleCluster
metadata:
  name: example
spec:
  workers: # role-level
    config:
      resources:
        cpu:
          min: 300m
          max: 600m
        memory:
          limit: 3Gi
    roleGroups: # role-group-level
      resources-from-role: # role-group 1
        replicas: 1
      resources-from-role-group: # role-group 2
        replicas: 1
        config:
          resources:
            cpu:
              min: 400m
              max: 800m
            memory:
              limit: 4Gi

In this case, the role group resources-from-role inherits the resources specified on the role level, resulting in a maximum of 3Gi memory and 600m CPU.

The role group resources-from-role-group has a maximum of 4Gi memory and 800m CPU, overriding the role-level resources.

For Java products, the heap memory actually used is lower than the specified memory limit, because other processes in the Container need memory as well. Currently, 80% of the specified memory limit is passed to the JVM, for example about 3.2Gi for a limit of 4Gi.

For memory, only a limit can be specified, which is set as both the memory request and the memory limit in the Container. This guarantees a Container the full amount of memory during Kubernetes scheduling.
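
To make the mapping concrete, here is a sketch of the container resources that would be expected for the role group resources-from-role in the example above. It assumes that cpu.min becomes the CPU request and cpu.max the CPU limit; the memory values follow from the rule just described:

resources:
  requests:
    cpu: 300m    # from cpu.min (assumed mapping)
    memory: 3Gi  # same value as the limit
  limits:
    cpu: 600m    # from cpu.max (assumed mapping)
    memory: 3Gi  # from memory.limit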

If no resource requests are configured explicitly, the Superset operator uses the following defaults:

nodes:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            min: '200m'
            max: "4"
          memory:
            limit: '2Gi'

The default values are most likely not sufficient to run a production cluster. Adapt them to your requirements.