Stackable Operator for Apache Hive

This is an operator for Kubernetes that can manage Apache Hive metastores.

Currently it only supports running the Hive metastore service (HMS), which stores metadata about schemas and tables.

For several reasons, we do not think that running Hive itself on Kubernetes makes much sense these days. The most obvious reason is that Hive requires YARN as its execution framework, and YARN essentially tries to do the same job as Kubernetes, i.e. assign resources. We therefore decided to provide [Trino](https://github.com/stackabletech/trino-operator) as the query engine in the Stackable Data Platform instead of Hive. Trino still uses the Hive metastore, which is why this operator exists. Multiple tools can use the HMS:

  • HiveServer2

    • This is the "original" tool using the HMS.

    • It offers an endpoint to which you can submit HiveQL (similar to SQL) queries.

    • It needs an execution engine, e.g. YARN or Spark.

      • This operator does not support running the Hive server because of the complexity of operating YARN on Kubernetes. YARN is itself a resource manager and is not meant to run on Kubernetes, which already manages resources on its own.

      • We offer Trino as an (often drop-in) replacement (see below).

  • Trino

    • Takes SQL queries and executes them against tables whose metadata is stored in the HMS.

    • It should offer all the capabilities Hive offers, plus a lot of additional functionality, such as connections to other data sources.

  • Spark

    • Takes SQL or programmatic jobs and executes them against tables whose metadata is stored in the HMS.

  • And others
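
To make the relationship between a query engine and the HMS more concrete, the following is a minimal sketch of how a Trino Hive catalog could be pointed at a metastore. It only renders the catalog properties as text. The function name `render_hive_catalog` and the host `hive-metastore.example.svc` are hypothetical and used purely for illustration. The port 9083 is the default Thrift port of the Hive metastore, but the actual address depends on how the metastore is exposed in your cluster.

```shell
#!/bin/sh
# Sketch only: prints a minimal Trino catalog definition that uses the HMS.
# render_hive_catalog is a hypothetical helper, not part of this operator.
# $1 = metastore host (assumption), $2 = metastore Thrift port
render_hive_catalog() {
  printf 'connector.name=hive\n'
  printf 'hive.metastore.uri=thrift://%s:%s\n' "$1" "$2"
}

# Hypothetical host name; 9083 is the default HMS Thrift port.
render_hive_catalog "hive-metastore.example.svc" 9083
```

In a plain Trino installation, the printed lines would typically be placed in a catalog properties file. How the catalog is configured on the Stackable Data Platform is not described here.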

This operator only works with images from the Stackable repository.

Supported Versions

The Stackable Operator for Apache Hive currently supports the following versions of Hive:

  • 2.3.9

  • 3.1.3

Docker

docker pull docker.stackable.tech/stackable/hive:<version>
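
As an illustration of how the `<version>` placeholder might be filled in, the sketch below builds the image reference for each of the supported Hive versions listed above. The helper name `hive_image_ref` is hypothetical. It is also an assumption that the bare Hive version is a valid image tag, since the registry may use longer tags, so check the available tags before pulling.

```shell
#!/bin/sh
# Sketch only: builds image references from the pattern shown above.
# hive_image_ref is a hypothetical helper; the tag format is an assumption.
hive_image_ref() {
  printf 'docker.stackable.tech/stackable/hive:%s\n' "$1"
}

# Print one reference per supported Hive version.
for version in 2.3.9 3.1.3; do
  hive_image_ref "$version"
done

# To actually pull an image (requires Docker and a valid tag):
#   docker pull "$(hive_image_ref 3.1.3)"
```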