Writing to Iceberg tables

Apache Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time.

Starting with version 1.19.0, NiFi supports a PutIceberg processor to add rows to an existing Iceberg table. As of NiFi version 2.7.2, only PutIceberg is supported, so you need to create and compact your tables with other tools such as Trino or Spark (both included in the Stackable Data Platform).
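
Because PutIceberg only appends rows, the target table has to exist before the flow runs, and it benefits from occasional compaction. The following is a minimal sketch in Python that only assembles the two Trino statements involved; it does not connect to Trino. The catalog, schema, table, and column names are hypothetical, and the statements assume a Trino catalog backed by the Iceberg connector.

```python
# Minimal sketch: build Trino SQL for creating and compacting an Iceberg table.
# All identifiers passed in (catalog, schema, table, columns) are hypothetical
# examples. Identifiers are not escaped, so pass trusted values only.


def create_table_sql(catalog: str, schema: str, table: str, columns: dict[str, str]) -> str:
    """Return a CREATE TABLE statement for a Trino catalog using the Iceberg connector."""
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {catalog}.{schema}.{table} "
        f"({column_list}) WITH (format = 'PARQUET')"
    )


def compact_table_sql(catalog: str, schema: str, table: str) -> str:
    """Return the statement that asks Trino to rewrite small files into larger ones."""
    return f"ALTER TABLE {catalog}.{schema}.{table} EXECUTE optimize"


if __name__ == "__main__":
    # Hypothetical table that a NiFi flow could write into with PutIceberg.
    print(create_table_sql("iceberg", "demo", "events", {"id": "BIGINT", "payload": "VARCHAR"}))
    print(compact_table_sql("iceberg", "demo", "events"))
```

The generated statements can be run with any Trino client. Running the compaction statement on a schedule is one way to deal with the many small files that frequent appends tend to produce.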

NiFi 2.7 and above

In NiFi 2.7.0, Iceberg support was re-added after its removal in 2.0.0.

The 2.7.x versions differ from versions 2.0.x through 2.6.x in the following ways, so you need to adapt your setup accordingly:

  • HDFS and Kerberos support was dropped

  • Hive metastore support was dropped

  • Iceberg REST catalog support was added

  • It now uses the Iceberg S3 IO instead of the Hadoop S3 client libraries

  • It uses far fewer dependencies, which reduces the number of CVEs

There have been efforts from Stackable to re-add at least Hive metastore support, but we ran into NiFi classpath loader issues, which we haven’t been able to solve so far.

NiFi 2.0 - 2.6

In NiFi 2.0.0, Iceberg support was removed from upstream NiFi.

We forked the nifi-iceberg-bundle and made it available at https://github.com/stackabletech/nifi-iceberg-bundle. Starting with SDP 25.7, we add the necessary bundle to NiFi by default, so you don’t need to explicitly add Iceberg support to the Stackable NiFi.

Please read its documentation to learn how to ingest data into Iceberg tables. If you are using S3 without Kerberos, you don’t need any special configuration on the NiFiCluster.

HDFS and Kerberos are also supported; please have a look at the Iceberg integration test for an example.