Details
Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
None
-
1
Description
As of now, Hudi interprets a special column named "_hoodie_is_deleted": if it is set to true, the record is treated as a delete; otherwise it is treated as an update or an insert. However, this is not a reserved column as such. For example, a user DataFrame can have a column named "_hoodie_is_deleted" whose data type is string.
Add validations to Hudi to ensure that this column's data type is boolean if it is present in the DataFrame.
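A minimal sketch of the requested check, not Hudi's actual implementation. It assumes the schema is available as a plain mapping from column name to type name; `validate_delete_marker` and `InvalidDeleteMarkerError` are hypothetical names introduced only for illustration.

```python
HOODIE_IS_DELETED = "_hoodie_is_deleted"  # column name taken from the description


class InvalidDeleteMarkerError(ValueError):
    """Hypothetical error type: the marker column exists but is not boolean."""


def validate_delete_marker(schema):
    """Validate a schema given as {column name: type name}.

    The dict representation is an assumption of this sketch; a real check
    would inspect the DataFrame (or Avro) schema instead.
    """
    if HOODIE_IS_DELETED not in schema:
        return  # the column is optional, so there is nothing to validate
    actual = schema[HOODIE_IS_DELETED]
    if actual != "boolean":
        raise InvalidDeleteMarkerError(
            f"column {HOODIE_IS_DELETED!r} must be boolean, found {actual!r}"
        )
```

Running the check once per write, before any record is processed, would surface a wrongly typed column as a clear error rather than as silently misinterpreted records.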
Excerpt from the user:
I'd suggest:
- Possibly dropping the column (as you say if it has little benefits sure). If not, documenting the behaviour somewhere. Alternatively, always include the column, along with the other Hudi metadata fields which are prepended to written schema already.
- If the column is not a boolean:
  - Failing hard, as this column is essentially "reserved" for Hudi
  - Taking IS NOT NULL as truthy
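The two options for a non-boolean column can be compared in a short sketch. It models a record as a plain dict, which is an assumption of this example, and `is_delete` with its `strict` flag is a hypothetical helper rather than a Hudi API.

```python
def is_delete(record, strict=True):
    """Decide whether a record (a dict here, by assumption) marks a delete.

    strict=True  -> "fail hard" on a non-boolean marker value
    strict=False -> treat any non-null marker value as truthy (IS NOT NULL)
    """
    value = record.get("_hoodie_is_deleted")
    if isinstance(value, bool):
        return value  # well-typed marker: use it directly
    if value is None:
        return False  # missing or null marker: not a delete
    if strict:
        raise TypeError(
            f"_hoodie_is_deleted must be boolean, got {type(value).__name__}"
        )
    return True  # lenient mode: any non-null value counts as a delete
```

Under the lenient option, a string value such as "false" would still be treated as a delete, which is one argument for failing hard instead.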