Exporting a snapshot to S3

HBase snapshots can be exported with the command hbase snapshot export. To export to S3, the AWS libraries shipped with Hadoop must be on the classpath; in the HBase image they are located at /stackable/hadoop/share/hadoop/tools/lib/. The script export-snapshot-to-s3 takes care of this: it puts the required libraries on the classpath, extends the Hadoop configuration with the S3 settings read from environment variables, and then calls hbase snapshot export. The script can be called directly in the HBase master container:

$ export-snapshot-to-s3 --help
Options:
    --snapshot <arg>       Snapshot to restore.
    --copy-to <arg>        Remote destination hdfs://
    --copy-from <arg>      Input folder hdfs:// (default hbase.rootdir)
    --target <arg>         Target name for the snapshot.
    --no-checksum-verify   Do not verify checksum, use name+length only.
    --no-target-verify     Do not verify the integrity of the exported snapshot.
    --no-source-verify     Do not verify the source of the snapshot.
    --overwrite            Rewrite the snapshot manifest if already exists.
    --chuser <arg>         Change the owner of the files to the specified one.
    --chgroup <arg>        Change the group of the files to the specified one.
    --chmod <arg>          Change the permission of the files to the specified one.
    --mappers <arg>        Number of mappers to use during the copy (mapreduce.job.maps).
    --bandwidth <arg>      Limit bandwidth to this value in MB/second.
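
For example, assuming an HBase cluster named simple-hbase, the script can also be invoked from outside the container with kubectl exec (the pod and container names below are illustrative):

$ kubectl exec simple-hbase-master-default-0 -c hbase -- export-snapshot-to-s3 --help

To export a snapshot, set the S3 connection settings as environment variables and call the script: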

$ export \
    AWS_ACCESS_KEY_ID=myS3AccessKeyId \
    AWS_SECRET_KEY=myS3SecretKey \
    AWS_ENDPOINT=https://s3endpoint:9000/ \
    AWS_SSL_ENABLED=true \
    AWS_PATH_STYLE_ACCESS=true
$ export-snapshot-to-s3 \
    --snapshot my-snapshot \
    --copy-to s3a://my-bucket/my-snapshot
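
For reference, this is roughly what the wrapper does: it puts the AWS libraries on the classpath and forwards the S3 settings to hbase snapshot export as Hadoop properties. A minimal sketch, assuming the standard Hadoop fs.s3a.* property names (the script's exact mechanics may differ):

$ export HBASE_CLASSPATH="/stackable/hadoop/share/hadoop/tools/lib/*"
$ hbase snapshot export \
    -Dfs.s3a.access.key="$AWS_ACCESS_KEY_ID" \
    -Dfs.s3a.secret.key="$AWS_SECRET_KEY" \
    -Dfs.s3a.endpoint="$AWS_ENDPOINT" \
    -Dfs.s3a.connection.ssl.enabled="$AWS_SSL_ENABLED" \
    -Dfs.s3a.path.style.access="$AWS_PATH_STYLE_ACCESS" \
    --snapshot my-snapshot \
    --copy-to s3a://my-bucket/my-snapshot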

Snapshots can also be imported from S3 into HDFS:

$ export-snapshot-to-s3 \
    --snapshot snap \
    --copy-from s3a://my-bucket/my-snapshot \
    --copy-to hdfs://simple-hdfs/hbase
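
Once the snapshot has been copied back under the cluster's hbase.rootdir (here assumed to be hdfs://simple-hdfs/hbase), HBase sees it again and it can be materialized into a table from the HBase shell (the table name is illustrative):

$ hbase shell
hbase> list_snapshots
hbase> clone_snapshot 'snap', 'restored_table'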

However, a better approach is to create a Kubernetes Job:

---
apiVersion: batch/v1
kind: Job
metadata:
  name: export-hbase-snapshot
spec:
  template:
    spec:
      containers:
      - name: hbase
        image: docker.stackable.tech/stackable/hbase:2.4.18-stackable24.11.0
        volumeMounts:
        - name: hbase-config
          mountPath: /stackable/conf
        env:
        - name: HBASE_CONF_DIR
          value: /stackable/conf
        - name: HADOOP_CONF_DIR
          value: /stackable/conf
        - name: AWS_ENDPOINT
          value: https://s3endpoint:9000/
        - name: AWS_SSL_ENABLED
          value: "true"
        - name: AWS_PATH_STYLE_ACCESS
          value: "true"
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: s3-credentials
              key: access-key-id
        - name: AWS_SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: s3-credentials
              key: secret-key
        command:
        - export-snapshot-to-s3
        args:
        - --snapshot
        - my-snapshot
        - --copy-to
        - s3a://hbase/my-snapshot
      volumes:
      - name: hbase-config
        projected:
          sources:
          - configMap:
              name: simple-hdfs
          - configMap:
              name: simple-hbase-master-default
      restartPolicy: Never
---
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
stringData: # plain-text values; Kubernetes stores them base64-encoded under data
  access-key-id: myS3AccessKeyId
  secret-key: myS3SecretKey
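
Assuming the manifests above are saved as export-snapshot-job.yaml, the Job can be created and monitored with kubectl:

$ kubectl apply -f export-snapshot-job.yaml
$ kubectl wait --for=condition=complete job/export-hbase-snapshot
$ kubectl logs job/export-hbase-snapshot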