First steps

Once you have followed the steps in the Installation section to install the operator and its dependencies, you deploy an HBase cluster and its dependencies. Afterward you can verify that it works by creating tables and data in HBase using the REST API and Apache Phoenix (an SQL layer used to interact with HBase).

Setup

ZooKeeper

To deploy a ZooKeeper cluster create one file called zk.yaml:

---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  image:
    productVersion: 3.9.4
  servers:
    roleGroups:
      default:
        replicas: 1

We also need to define a ZNode that is used by the HDFS and HBase clusters to reference ZooKeeper. Create another file called znode.yaml and define a separate ZNode for each service:

---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-hdfs-znode
spec:
  clusterRef:
    name: simple-zk
---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-hbase-znode
spec:
  clusterRef:
    name: simple-zk

Apply both of these files:

kubectl apply -f zk.yaml
kubectl apply -f znode.yaml

The state of the ZooKeeper cluster can be tracked with kubectl:

kubectl rollout status --watch statefulset/simple-zk-server-default --timeout=300s

HDFS

An HDFS cluster has three components: the namenode, the datanode and the journalnode. Create a file named hdfs.yaml defining 2 namenodes and one datanode and journalnode each:

---
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs
spec:
  image:
    productVersion: 3.4.2
  clusterConfig:
    dfsReplication: 1
    zookeeperConfigMapName: simple-hdfs-znode
  nameNodes:
    roleGroups:
      default:
        replicas: 2
  dataNodes:
    roleGroups:
      default:
        replicas: 1
  journalNodes:
    roleGroups:
      default:
        replicas: 1

Where:

metadata.name contains the name of the HDFS cluster
the HBase version in the Docker image provided by Stackable must be set in spec.image.productVersion

Please note that the version you need to specify for spec.image.productVersion is the desired version of Apache HBase. You can optionally specify the spec.image.stackableVersion to a certain release like 24.7.0 but it is recommended to leave it out and use the default provided by the operator. Available official images are stored in the Stackable image registry. Information on how to browse the registry can be found here. It should generally be safe to simply use the latest image version that is available.

Create the actual HDFS cluster by applying the file:

kubectl apply -f hdfs.yaml

Track the progress with kubectl as this step may take a few minutes:

kubectl rollout status --watch statefulset/simple-hdfs-datanode-default --timeout=300s
kubectl rollout status --watch statefulset/simple-hdfs-namenode-default --timeout=300s
kubectl rollout status --watch statefulset/simple-hdfs-journalnode-default --timeout=300s

HBase

You can now create the HBase cluster. Create a file called hbase.yaml containing the following:

---
apiVersion: hbase.stackable.tech/v1alpha1
kind: HbaseCluster
metadata:
  name: simple-hbase
spec:
  image:
    productVersion: 2.6.3
  clusterConfig:
    hdfsConfigMapName: simple-hdfs
    zookeeperConfigMapName: simple-hbase-znode
  masters:
    roleGroups:
      default:
        replicas: 1
  regionServers:
    roleGroups:
      default:
        config:
          resources:
            cpu:
              min: 300m
              max: "3"
            memory:
              limit: 3Gi
        replicas: 1
  restServers:
    roleGroups:
      default:
        replicas: 1

Verify that it works

To test the cluster, use the REST API to check its version and status, and to create and inspect a new table. Use Phoenix to create, populate and query a second new table, before listing all non-system tables in HBase. These actions wil be carried out from one of the HBase components, the REST server.

First, check the cluster version with this callout:

  kubectl exec -n default simple-hbase-restserver-default-0 -- \
  curl -s -XGET -H "Accept: application/json" "http://simple-hbase-restserver-default-headless:8080/version/cluster"

This returns the version that was specified in the HBase cluster definition:

{"Version":"2.6.3"}

The cluster status can be checked and formatted like this:

kubectl exec -n default simple-hbase-restserver-default-0 \
-- curl -s -XGET -H "Accept: application/json" "http://simple-hbase-restserver-default-headless:8080/status/cluster" | json_pp

which displays cluster metadata that looks like this (only the first region is included for the sake of readability):

{
   "DeadNodes" : [],
   "LiveNodes" : [
      {
         "Region" : [
            {
               "currentCompactedKVs" : 0,
               "memStoreSizeMB" : 1,
               "name" : "aGJhc2U6bWV0YSwsMQ==",
               "readRequestsCount" : 4,
               "rootIndexSizeKB" : 0,
               "storefileIndexSizeKB" : 0,
               "storefileSizeMB" : 0,
               "storefiles" : 0,
               "stores" : 3,
               "totalCompactingKVs" : 0,
               "totalStaticBloomSizeKB" : 0,
               "totalStaticIndexSizeKB" : 0,
               "writeRequestsCount" : 5
            },
            {
               "currentCompactedKVs" : 0,
               "memStoreSizeMB" : 1,
               "name" : "aGJhc2U6bmFtZXNwYWNlLCwxNzUyNDk0MTQzMDQ0LjA1MTA1NWM1NzhhMDQyOWJmZTIwZTFkYTBiY2M4MWE3Lg==",
               "readRequestsCount" : 6,
               "rootIndexSizeKB" : 0,
               "storefileIndexSizeKB" : 0,
               "storefileSizeMB" : 0,
               "storefiles" : 0,
               "stores" : 1,
               "totalCompactingKVs" : 0,
               "totalStaticBloomSizeKB" : 0,
               "totalStaticIndexSizeKB" : 0,
               "writeRequestsCount" : 2
            }
         ],
         "heapSizeMB" : 108,
         "maxHeapSizeMB" : 2458,
         "name" : "simple-hbase-regionserver-default-0-listener.default.svc.cluster.local:16020",
         "requests" : 16,
         "startCode" : 1752494125463
      }
   ],
   "averageLoad" : 2,
   "regions" : 2,
   "requests" : 17
}

You can now create a table like this:

kubectl exec -n default simple-hbase-restserver-default-0 \
-- curl -s -XPUT -H "Accept: text/xml" -H "Content-Type: text/xml" \
"http://simple-hbase-restserver-default-headless:8080/users/schema" \
-d '<TableSchema name="users"><ColumnSchema name="cf" /></TableSchema>'

This creates a table users with a single column family cf. Its creation can be verified by listing it:

kubectl exec -n default simple-hbase-restserver-default-0 \
-- curl -s -XGET -H "Accept: application/json" "http://simple-hbase-restserver-default-headless:8080/users/schema" | json_pp

{
   "table" : [
      {
         "name" : "users"
      }
   ]
}

An alternative way to interact with HBase is to use the Phoenix library that is pre-installed on the Stackable HBase image (in the /stackable/phoenix directory). Use the Python utility psql.py (found in /stackable/phoenix/bin) to create, populate and query a table called WEB_STAT:

kubectl exec -n default simple-hbase-restserver-default-0 -- \
/stackable/phoenix/bin/psql.py \
/stackable/phoenix/examples/WEB_STAT.sql \
/stackable/phoenix/examples/WEB_STAT.csv \
/stackable/phoenix/examples/WEB_STAT_QUERIES.sql

The final command displays some grouped data like this:

HO                    TOTAL_ACTIVE_VISITORS
-- ----------------------------------------
EU                                      150
NA                                        1
Time: 0.017 sec(s)

Check the tables again with:

kubectl exec -n default simple-hbase-restserver-default-0 \
-- curl -s -XGET -H "Accept: application/json" "http://simple-hbase-restserver-default-headless:8080/users/schema" | json_pp

This time the list includes not just users (created above with the REST API) and WEB_STAT, but several other tables too:

{
   "table" : [
      {
         "name" : "SYSTEM.CATALOG"
      },
      {
         "name" : "SYSTEM.CHILD_LINK"
      },
      {
         "name" : "SYSTEM.FUNCTION"
      },
      {
         "name" : "SYSTEM.LOG"
      },
      {
         "name" : "SYSTEM.MUTEX"
      },
      {
         "name" : "SYSTEM.SEQUENCE"
      },
      {
         "name" : "SYSTEM.STATS"
      },
      {
         "name" : "SYSTEM.TASK"
      },
      {
         "name" : "WEB_STAT"
      },
      {
         "name" : "users"
      }
   ]
}

This is because Phoenix requires these SYSTEM. tables for its own internal mapping mechanism, and they are created the first time that Phoenix is used on the cluster.

What’s next

Look at the Usage guide to find out more about configuring your HBase cluster.