Details
Type: Improvement
Status: Closed
Priority: Blocker
Resolution: Done
Description
Improve schema reconciliation so that it is more flexible when full schema evolution is enabled.
Desired behavior:
- incoming data is missing columns that are already defined in the table -> null values will be injected into the missing columns
- incoming data contains new columns not yet defined in the table -> the new columns will be added to the table schema (incoming dataframe?)
- incoming data is missing columns that are already defined in the table and also contains new columns not yet defined in the table -> the new columns will be added to the table schema, and the missing columns will be filled with null values
When schema reconciliation is enabled, no column should be dropped when using the Hive sync utility.
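The three cases above can be illustrated with a minimal sketch in plain Python. This is not Hudi's implementation: the `reconcile` function, the dict-based schema representation, and the sample column names are all hypothetical, and the sketch assumes that a column present on both sides keeps the table's type (type conflicts are out of scope here).

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema model: column name -> type name.
# Dict insertion order stands in for column order.
Schema = Dict[str, str]


def reconcile(
    table_schema: Schema,
    incoming_schema: Schema,
    rows: List[Dict[str, Any]],
) -> Tuple[Schema, List[Dict[str, Any]]]:
    """Merge the incoming schema into the table schema and pad the rows."""
    # Start from the table schema so that no existing column is dropped.
    merged: Schema = dict(table_schema)
    # Append columns that exist only in the incoming data.
    # Assumption: for shared columns the table's type wins.
    for name, type_name in incoming_schema.items():
        merged.setdefault(name, type_name)
    # Rewrite each row against the merged schema; absent columns become None.
    padded = [{name: row.get(name) for name in merged} for row in rows]
    return merged, padded


# Third case: "name" is missing from the incoming data, "city" is new.
table = {"id": "long", "name": "string", "ts": "long"}
incoming = {"id": "long", "ts": "long", "city": "string"}
schema, out = reconcile(table, incoming, [{"id": 1, "ts": 100, "city": "Oslo"}])
print(list(schema))  # ['id', 'name', 'ts', 'city']
print(out[0])        # {'id': 1, 'name': None, 'ts': 100, 'city': 'Oslo'}
```

Because the merged schema is always a superset of the table schema, the result never loses a column, which is the property the Hive sync requirement asks for.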
Related GH issue:
https://github.com/apache/hudi/issues/5873