[NIFI-12130] PutIceberg: Ability to configure snapshot properties via dynamic attributes - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.0.0-M1, 1.24.0
Component/s: Extensions
Labels:
- iceberg

Description

Motivation

Iceberg's Spark integration lets users add snapshot properties when writing data to an Iceberg table, using write options prefixed with "snapshot-property.", like so:

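// Options prefixed with "snapshot-property." are added to the summary
// of the snapshot this write produces (here: key -> value).
df.write
  .option("write-format", "avro")
  .option("snapshot-property.key", "value")
  .insertInto("catalog.db.table")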

https://iceberg.apache.org/docs/latest/spark-configuration/#write-options

These properties can be used to add context to Iceberg snapshots and help users locate snapshots in recovery scenarios.

In fact, Spark automatically adds the application ID to each snapshot under the key spark.app.id.

Examples of when these properties might be useful include:

- Recording the data source used to produce the new records
- Recording the UUID of the flow file used to update the table, so the snapshot can be matched to NiFi provenance

These properties can be queried from the table's snapshots metadata table (an Iceberg feature).

Feature request

It would be great if PutIceberg could be configured to add these properties in a similar fashion (e.g. using dynamic properties of the form snapshot-property.*). Continuing the comparison with Spark, it may also be worth automatically adding the flow file UUID under a key such as nifi.flowfile.id.
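
As a rough illustration of the request, the prefix handling could look something like the sketch below. The class SnapshotPropertyExtractor and its extract method are hypothetical and not part of NiFi or Iceberg; the sketch assumes the processor's dynamic properties are already available as a plain map of names to evaluated values, and that the flow file UUID is passed in as a string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not an existing NiFi or Iceberg class.
final class SnapshotPropertyExtractor {
    // Prefix borrowed from the Spark write option; assumed here for PutIceberg.
    static final String PREFIX = "snapshot-property.";
    // Key suggested in this issue for the flow file UUID.
    static final String FLOWFILE_ID_KEY = "nifi.flowfile.id";

    // Returns the snapshot summary entries derived from the dynamic properties.
    static Map<String, String> extract(Map<String, String> dynamicProperties, String flowFileUuid) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : dynamicProperties.entrySet()) {
            String name = entry.getKey();
            // Keep only prefixed properties that have a non-empty key after the prefix.
            if (name.startsWith(PREFIX) && name.length() > PREFIX.length()) {
                result.put(name.substring(PREFIX.length()), entry.getValue());
            }
        }
        // Add the flow file UUID unless the user already set that key explicitly.
        if (flowFileUuid != null) {
            result.putIfAbsent(FLOWFILE_ID_KEY, flowFileUuid);
        }
        return result;
    }
}
```

With this sketch, a dynamic property named snapshot-property.source would yield a summary entry named source, while unprefixed properties such as write-format would be ignored. Letting a user-supplied value win over the automatic nifi.flowfile.id is a design choice of the sketch, not something the issue specifies.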

Further details

I'm not entirely clued up on the Iceberg API, but it looks like these are set via the set(property, value) method on SnapshotUpdate (AppendFiles extends this interface):

https://iceberg.apache.org/javadoc/master/org/apache/iceberg/SnapshotUpdate.html
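
To make the idea concrete, here is a minimal sketch of applying a map of properties through a set(property, value) style method. To stay self-contained it does not depend on Iceberg: SummarySettable is a stand-in that mirrors only that one method, and RecordingUpdate and SnapshotSummaryApplier are hypothetical names used for illustration. With the real library, the equivalent would presumably be calling set on the AppendFiles returned by table.newAppend() before commit().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for the Iceberg interface, reduced to the single method used here (assumption).
interface SummarySettable<T> {
    T set(String property, String value);
}

// Toy implementation that just records the summary entries it receives.
final class RecordingUpdate implements SummarySettable<RecordingUpdate> {
    final Map<String, String> summary = new LinkedHashMap<>();

    @Override
    public RecordingUpdate set(String property, String value) {
        summary.put(property, value);
        return this;
    }
}

// Hypothetical helper: copies every entry of a property map onto the pending update.
final class SnapshotSummaryApplier {
    static <T extends SummarySettable<T>> T apply(T update, Map<String, String> properties) {
        T current = update;
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            // set returns the update itself, so calls can be chained fluently.
            current = current.set(entry.getKey(), entry.getValue());
        }
        return current;
    }
}
```

The properties would need to be set before the update is committed, since the summary belongs to the snapshot that the commit produces.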

Issue Links

links to

GitHub Pull Request #7849

People

Assignee: Mark Bathori
Reporter: William Dyson
Votes: 0
Watchers: 2

Dates

Created: 26/Sep/23 10:47
Updated: 26/Oct/23 23:58
Resolved: 26/Oct/23 23:58

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 40m