Configuration & Environment Overrides

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

Overriding certain properties can lead to faulty clusters. As a general rule, do not change ports, hostnames, or properties related to data directories, high availability, or security.

Configuration Properties

For a role or role group, at the same level of config, you can specify configOverrides for the following files:

  • hdfs-site.xml

  • core-site.xml

  • hadoop-policy.xml

  • ssl-server.xml

  • ssl-client.xml

  • security.properties

For example, if you want to set additional properties on the namenode servers, adapt the nameNodes section of the cluster resource like so:

nameNodes:
  roleGroups:
    default:
      config: [...]
      configOverrides:
        core-site.xml:
          fs.trash.interval: "5"
        hdfs-site.xml:
          dfs.namenode.num.checkpoints.retained: "3"
      replicas: 2

Just as for the config, it is possible to specify this at role level as well:

nameNodes:
  configOverrides:
    core-site.xml:
      fs.trash.interval: "5"
    hdfs-site.xml:
      dfs.namenode.num.checkpoints.retained: "3"
  roleGroups:
    default:
      config: [...]
      replicas: 2
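
Both levels can also be combined. The following is a minimal sketch of the precedence rule described at the top of this page; the second role group name (custom) and the value "5" are hypothetical and serve only as illustration. Where the same property is set at both levels, the role group value is the one that applies to that role group:

nameNodes:
  configOverrides:
    hdfs-site.xml:
      # role-level value, used by role groups that do not override it
      dfs.namenode.num.checkpoints.retained: "3"
  roleGroups:
    default:
      config: {}
      replicas: 2
    custom: # hypothetical role group name
      config: {}
      configOverrides:
        hdfs-site.xml:
          # role group value, takes precedence over the role-level "3"
          dfs.namenode.num.checkpoints.retained: "5"
      replicas: 1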

All override property values must be strings. The properties will be formatted and escaped correctly into the XML file.
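
As a minimal sketch of the string requirement, numeric and boolean values are quoted so that YAML does not parse them as numbers or booleans. The property names below are standard HDFS settings chosen purely for illustration, and the values are assumptions rather than recommendations:

nameNodes:
  configOverrides:
    hdfs-site.xml:
      dfs.namenode.handler.count: "20" # number, written as a string
      dfs.namenode.avoid.read.stale.datanode: "true" # boolean, written as a string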

For a full list of configuration options, refer to the Apache HDFS documentation for hdfs-site.xml and core-site.xml.

The security.properties file

The security.properties file is used to configure JVM security properties. Users seldom need to change any of these, but one use case stands out that users need to be aware of: the JVM DNS cache.

The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensitive to the contents of these caches, and their performance is heavily affected by them. As of version 3.3.4, HDFS performs poorly if the positive cache is disabled. To cache resolved host names, and thus speed up HBase queries, you can configure the TTL (in seconds) of entries in the positive and negative caches like this:

  nameNodes:
    configOverrides:
      security.properties:
        networkaddress.cache.ttl: "30"
        networkaddress.cache.negative.ttl: "0"
  dataNodes:
    configOverrides:
      security.properties:
        networkaddress.cache.ttl: "30"
        networkaddress.cache.negative.ttl: "0"
  journalNodes:
    configOverrides:
      security.properties:
        networkaddress.cache.ttl: "30"
        networkaddress.cache.negative.ttl: "0"

The operator configures DNS caching by default as shown in the example above.

Environment Variables

In a similar fashion, environment variables can be (over)written. For example, per role group:

nameNodes:
  roleGroups:
    default:
      config: {}
      envOverrides:
        MY_ENV_VAR: "MY_VALUE"
      replicas: 1

or per role:

nameNodes:
  envOverrides:
    MY_ENV_VAR: "MY_VALUE"
  roleGroups:
    default:
      config: {}
      replicas: 1

Some environment variables will be overridden by the operator and cannot be set manually by the user. These are HADOOP_HOME, HADOOP_CONF_DIR, POD_NAME and ZOOKEEPER.