Usage

Requirements

A distributed Apache HBase installation depends on a running Apache ZooKeeper and HDFS cluster. See the documentation for the Stackable Operator for Apache HDFS to learn how to set up these clusters.

Deployment of an Apache HBase cluster

An Apache HBase cluster can be created with the following cluster specification:

apiVersion: hbase.stackable.tech/v1alpha1
kind: HbaseCluster
metadata:
  name: simple-hbase
spec:
  image:
    productVersion: 2.4.12
    stackableVersion: 0.4.0
  hdfsConfigMapName: simple-hdfs-namenode-default
  zookeeperConfigMapName: simple-hbase-znode
  config:
    hbaseOpts:
    hbaseRootdir: /hbase
  masters:
    roleGroups:
      default:
        replicas: 1
  regionServers:
    roleGroups:
      default:
        replicas: 1
  restServers:
    roleGroups:
      default:
        replicas: 1
---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-hbase-znode
spec:
  clusterRef:
    name: simple-zk

  • hdfsConfigMapName references the config map created by the Stackable HDFS operator.

  • zookeeperConfigMapName references the config map created by the Stackable ZooKeeper operator.

  • hbaseOpts is mapped to the environment variable HBASE_OPTS in hbase-env.sh.

  • hbaseRootdir is mapped to hbase.rootdir in hbase-site.xml.

Please note that the version you need to specify consists not only of the HBase version you want to roll out, but also of a Stackable version, as shown above. The Stackable version is the version of the underlying container image which is used to execute the processes. For a list of available versions, please check our image registry. It should generally be safe to simply use the latest image version that is available.
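
Assuming the cluster specification above has been saved to a file (the file name hbase.yaml is just an example), it can be applied and the resulting Pods inspected with kubectl:

kubectl apply -f hbase.yaml
kubectl get pods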

Monitoring

The managed HBase instances are automatically configured to export Prometheus metrics. See Monitoring for more details.
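
If the Prometheus Operator is used for scraping, a ServiceMonitor along the following lines could pick up these metrics. This is only a sketch: the prometheus.io/scrape label and the port name metrics are assumptions here; see the Monitoring documentation for the exact labels and ports.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: stackable-metrics        # example name
  labels:
    release: prometheus          # assumption: must match the serviceMonitorSelector of your Prometheus instance
spec:
  endpoints:
    - port: metrics              # assumption: name of the metrics port on the generated Services
  selector:
    matchLabels:
      prometheus.io/scrape: "true"  # assumption: label carried by the Services exposing metrics
  namespaceSelector:
    any: true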

Log aggregation

The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent:

spec:
  vectorAggregatorConfigMapName: vector-aggregator-discovery
  masters:
    config:
      logging:
        enableVectorAgent: true
  regionServers:
    config:
      logging:
        enableVectorAgent: true
  restServers:
    config:
      logging:
        enableVectorAgent: true
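
The ConfigMap referenced by vectorAggregatorConfigMapName points the agents at the aggregator. A minimal sketch (the ADDRESS key and the host/port shown are assumptions; see Logging for the exact format):

apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-aggregator-discovery
data:
  ADDRESS: vector-aggregator:6000  # assumption: host and port of the Vector aggregator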

Further information on how to configure logging can be found in Logging.

Configuration Overrides

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

Overriding certain properties which are set by the operator can interfere with the operator and lead to problems.

Configuration Properties

For a role or role group, at the same level as config, you can specify configOverrides for the following files:

  • hbase-site.xml

  • hbase-env.sh

For example, if you want to set hbase.rest.threads.min to 4 and HBASE_HEAPSIZE to two GB, adapt the restServers section of the cluster resource like so:

restServers:
  roleGroups:
    default:
      config: {}
      configOverrides:
        hbase-site.xml:
          hbase.rest.threads.min: "4"
        hbase-env.sh:
          HBASE_HEAPSIZE: "2G"
      replicas: 1

Just as for the config, it is possible to specify this at role level as well:

restServers:
  configOverrides:
    hbase-site.xml:
      hbase.rest.threads.min: "4"
    hbase-env.sh:
      HBASE_HEAPSIZE: "2G"
  roleGroups:
    default:
      config: {}
      replicas: 1

All override property values must be strings. The properties will be formatted and escaped correctly into the XML file, or inserted as-is into the env.sh file.
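
As an illustration (a sketch of the standard hbase-site.xml property format, not necessarily the operator's exact rendering), the hbase.rest.threads.min override above would end up in the XML file roughly as:

<property>
  <name>hbase.rest.threads.min</name>
  <value>4</value>
</property>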

For a full list of configuration options we refer to the HBase Configuration Documentation.

Storage for data volumes

The HBase Operator currently does not support any PersistentVolumeClaims.

Resource Requests

Stackable operators handle resource requests in a slightly different manner than Kubernetes. Resource requests are defined on role or role group level. See Roles and role groups for details on these concepts. On a role level this means that e.g. all workers will use the same resource requests and limits. This can be further specified on role group level (which takes priority over the role level) to apply different resources.

This is an example on how to specify CPU and memory resources using the Stackable Custom Resources:

---
apiVersion: example.stackable.tech/v1alpha1
kind: ExampleCluster
metadata:
  name: example
spec:
  workers: # role-level
    config:
      resources:
        cpu:
          min: 300m
          max: 600m
        memory:
          limit: 3Gi
    roleGroups: # role-group-level
      resources-from-role: # role-group 1
        replicas: 1
      resources-from-role-group: # role-group 2
        replicas: 1
        config:
          resources:
            cpu:
              min: 400m
              max: 800m
            memory:
              limit: 4Gi

In this case, the role group resources-from-role will inherit the resources specified on the role level, resulting in a maximum of 3Gi memory and 600m CPU resources.

The role group resources-from-role-group has a maximum of 4Gi memory and 800m CPU resources (which overrides the role CPU resources).

For Java products the actual heap memory used is lower than the specified memory limit, because other processes in the Container require memory to run as well. Currently, 80% of the specified memory limit is passed to the JVM. For example, with a memory limit of 4Gi, roughly 3.2Gi would be available as JVM heap.

For memory, only a limit can be specified, which will be set as both memory request and limit in the Container. This is to always guarantee a Container the full amount of memory during Kubernetes scheduling.
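
For example, the resources-from-role-group settings above would translate into container resources roughly like this (a sketch; the mapping of cpu min/max to request/limit is an assumption here, while memory request and limit are equal as described):

resources:
  requests:
    cpu: 400m
    memory: 4Gi
  limits:
    cpu: 800m
    memory: 4Gi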

If no resources are configured explicitly, the HBase operator uses the following defaults:

regionServers:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            min: "200m"
            max: "4"
          memory:
            limit: "2Gi"

The default values are most likely not sufficient to run a proper cluster in production. Please adapt according to your requirements.

For more details regarding Kubernetes CPU limits see: Assign CPU Resources to Containers and Pods.

Phoenix

The Apache Phoenix project provides the ability to interact with HBase over JDBC using familiar SQL syntax. The Phoenix dependencies are bundled with the Stackable HBase image and do not need to be installed separately (client components will need to ensure that they have the correct client-side libraries available). Information about client-side installation can be found here.

Phoenix comes bundled with a few simple scripts to verify a correct server-side installation. For example, assuming that the Phoenix dependencies have been installed to their default location of /stackable/phoenix/bin, we can issue the following using the supplied psql.py script:

/stackable/phoenix/bin/psql.py \
   /stackable/phoenix/examples/WEB_STAT.sql \
   /stackable/phoenix/examples/WEB_STAT.csv \
   /stackable/phoenix/examples/WEB_STAT_QUERIES.sql

This script creates a java command that creates, populates and queries a Phoenix table called WEB_STAT. Alternatively, one can use the sqlline.py script (which wraps the sqlline utility):

/stackable/phoenix/bin/sqlline.py [zookeeper] [sql file]

The script opens an SQL prompt from where one can list, query, create and generally interact with Phoenix tables. So, to query the table that was created in the previous step, start the script and enter some SQL at the prompt:

Phoenix Sqlline
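
A textual sketch of such a session (the commands are illustrative; !tables lists the existing tables and the query reads the WEB_STAT table created by psql.py above):

/stackable/phoenix/bin/sqlline.py
!tables
SELECT COUNT(*) FROM WEB_STAT;
!quit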

The Phoenix table WEB_STAT is created as an HBase table, and can be viewed normally from within the HBase UI:

Phoenix Tables

The SYSTEM* tables are those required by Phoenix and are created the first time that Phoenix is invoked.

Both psql.py and sqlline.py generate a java command that calls classes from the Phoenix client library .jar. The ZooKeeper quorum does not need to be supplied as part of the URL used by the JDBC connection string, as long as the environment variable HBASE_CONF_DIR is set and included as an element of the -cp classpath: the cluster information is then extracted from $HBASE_CONF_DIR/hbase-site.xml.
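
The two forms of the JDBC connection string therefore look roughly like this (host and port are illustrative values):

# With HBASE_CONF_DIR on the classpath, the quorum is read from hbase-site.xml:
jdbc:phoenix:
# Otherwise the ZooKeeper quorum (and optionally the port) is given explicitly:
jdbc:phoenix:simple-zk-server-default-0:2181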