Data storage backends
You can operate the Hive metastore service (HMS) without S3 or HDFS. Its whole purpose is to store metadata such as "Table foo has columns a, b and c and is stored as parquet in local://tmp/hive/foo".
However, as soon as you start storing metadata in the HMS that refers to a s3a://
or hdfs://
locations, HMS will actually do some operations on the filesystem. This can be e.g. checking if the table location exists, creating it in case it is missing.
So if you are storing tables in S3 (or HDFS for that matter), you need to give the HMS access to that filesystem as well. The Stackable Operator currently supports S3 and HFS.
S3 support
HMS supports creating tables in S3 compatible object stores.
To use this feature you need to provide connection details for the object store using the S3Connection in the top level clusterConfig
.
An example usage can look like this:
clusterConfig:
s3:
inline:
host: minio
port: 9000
accessStyle: Path
credentials:
secretClass: simple-hive-s3-secret-class
Apache HDFS support
As well as S3, HMS also supports creating tables in HDFS.
You can add the HDFS connection in the top level clusterConfig
as follows:
clusterConfig:
hdfs:
configMap: my-hdfs-cluster # Name of the HdfsCluster
Read about the Stackable Operator for Apache HDFS to learn more about setting up HDFS.