Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The UPDATE statement is now supported for Iceberg tables in Impala.
The implementation writes delete file(s) for the updated row(s) and new data file(s) containing their new values. These files are committed in a single Iceberg transaction, but that transaction adds two snapshots to the table: the first contains the delete file(s), and the second adds the new data file(s) of the updated row(s).
This results in an unusual table history. The first, temporary snapshot of the transaction has no time information associated with it (the table spends zero time in that state), and it does not appear as a separate entry when querying the table history, so it cannot be reached with time travel based on system time. However, it does appear as the parent of the current snapshot, and it can be queried by snapshot ID. Such a query returns an invalid table state: the updated rows have already been deleted, but their new versions are not yet present.
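The behaviour above can be illustrated with a deliberately simplified, hypothetical Python model; it is not Impala or Iceberg code. Assumptions: each snapshot stores its full set of live rows instead of data and delete files, rows are keyed by an illustrative integer row id, and the history only records the snapshot that became current at commit time. The names `Snapshot`, `Table`, `commit`, `snapshot_as_of_time` and `update_two_snapshots` are all made up for this sketch.

```python
# Hypothetical model, NOT Impala/Iceberg code: snapshots hold full row sets
# (row id -> value) instead of data/delete files.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    rows: Dict[int, str]  # live rows visible in this snapshot


@dataclass
class Table:
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    history: List[Tuple[int, int]] = field(default_factory=list)  # (commit time, snapshot id)
    current_id: Optional[int] = None
    next_id: int = 1

    def commit(self, states: List[Dict[int, str]], commit_time: int) -> None:
        """Commit one transaction producing one chained snapshot per entry in `states`.

        Only the last snapshot becomes current and gets a history entry.
        """
        parent = self.current_id
        for rows in states:
            snap = Snapshot(self.next_id, parent, dict(rows))
            self.snapshots[snap.snapshot_id] = snap
            self.next_id += 1
            parent = snap.snapshot_id
        self.current_id = parent
        self.history.append((commit_time, parent))

    def snapshot_as_of_time(self, ts: int) -> Snapshot:
        """Time travel by system time: the last snapshot made current at or before ts."""
        ids = [sid for t, sid in self.history if t <= ts]
        return self.snapshots[ids[-1]]  # assumes ts is not earlier than the first commit


def update_two_snapshots(table: Table, row_id: int, new_value: str, commit_time: int) -> None:
    """Model of the reported behaviour: one transaction, two snapshots."""
    before = table.snapshots[table.current_id].rows
    after_delete = {k: v for k, v in before.items() if k != row_id}  # 1st snapshot: delete file(s)
    after_insert = {**after_delete, row_id: new_value}               # 2nd snapshot: new data file(s)
    table.commit([after_delete, after_insert], commit_time)


table = Table()
table.commit([{1: "a", 2: "b"}], commit_time=100)  # initial INSERT
update_two_snapshots(table, 1, "A", commit_time=200)
current = table.snapshots[table.current_id]
parent = table.snapshots[current.parent_id]
print([sid for _, sid in table.history])  # [1, 3]: snapshot 2 has no history entry
print(parent.rows)                        # {2: 'b'}: row 1 is missing entirely
```

In this model, time travel by system time can only reach snapshots 1 and 3, yet the current snapshot's `parent_id` is 2, and looking up snapshot 2 by ID shows the intermediate state in which the updated row is absent.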
Impala should create only one new snapshot per UPDATE statement, so that the parent of the current snapshot points to the previous valid table state.
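The desired behaviour can be sketched with the same kind of hypothetical, simplified Python model (not Impala or Iceberg code; all names are illustrative, and snapshots hold full row sets keyed by a made-up row id). For reference, Iceberg's Java API offers a `RowDelta` operation (obtained from `Table.newRowDelta()`) that can add both data files and delete files in a single snapshot; this report does not say whether the fix uses it, so the model below only captures the intended outcome: the deletes and the new rows land in one snapshot whose parent is the previous valid state.

```python
# Hypothetical model, NOT Impala/Iceberg code: one snapshot per UPDATE.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    rows: Dict[int, str]  # live rows visible in this snapshot (row id -> value)


@dataclass
class Table:
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    history: List[Tuple[int, int]] = field(default_factory=list)  # (commit time, snapshot id)
    current_id: Optional[int] = None
    next_id: int = 1

    def commit_one(self, rows: Dict[int, str], commit_time: int) -> None:
        """Commit a transaction that produces exactly one snapshot."""
        snap = Snapshot(self.next_id, self.current_id, dict(rows))
        self.snapshots[snap.snapshot_id] = snap
        self.next_id += 1
        self.current_id = snap.snapshot_id
        self.history.append((commit_time, snap.snapshot_id))


def update_one_snapshot(table: Table, row_id: int, new_value: str, commit_time: int) -> None:
    """UPDATE as a single change: deletes and new rows go into the same snapshot."""
    rows = dict(table.snapshots[table.current_id].rows)  # copy; older snapshots stay untouched
    rows.pop(row_id)          # effect of the delete file(s)
    rows[row_id] = new_value  # effect of the new data file(s)
    table.commit_one(rows, commit_time)


table = Table()
table.commit_one({1: "a", 2: "b"}, commit_time=100)  # initial INSERT
update_one_snapshot(table, 1, "A", commit_time=200)
current = table.snapshots[table.current_id]
parent = table.snapshots[current.parent_id]
print([sid for _, sid in table.history])  # [1, 2]
print(parent.rows)                        # {1: 'a', 2: 'b'}: the previous valid state
```

Every snapshot in this model is both reachable by system time and a valid table state, so navigating from the current snapshot to its parent never exposes a half-applied UPDATE.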