Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Currently the STRING and BINARY types are not distinguished in most of the backend. In contrast to the frontend, PrimitiveType::TYPE_BINARY is not used there at all, TYPE_STRING being used instead. This is to ensure that everything that works for STRING also works for BINARY. So far only file readers and writers have had to handle them differently, and they have access to ColumnDescriptors which contain AuxColumnType fields that differentiate these two types.
However, only top-level columns have ColumnDescriptors. Adding support for BINARYs within complex types (see IMPALA-11491 and IMPALA-12651) necessitates adding type information about STRING vs BINARY to embedded fields as well.
Using PrimitiveType::TYPE_BINARY would probably be the cleanest solution, but it would affect huge parts of the code: TYPE_BINARY would have to be added to hundreds of switch statements, which would be error-prone.
Instead, we should introduce a new field in ColumnType: 'is_binary', which is true if the type is a BINARY and false otherwise. We keep using TYPE_STRING as the PrimitiveType of the ColumnType for BINARYs. This way full type information is present in ColumnType but code that does not differentiate between STRING and BINARY will continue to work for BINARY.
With this change, AuxColumnType is no longer needed and should be removed.
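The proposal can be illustrated with a minimal sketch. The types below are simplified stand-ins written for this example, not Impala's actual definitions: the reduced PrimitiveType enum, the String()/Binary() factory helpers, and the IsVarLen() and TypeName() functions are all hypothetical. Only the idea of an 'is_binary' flag on ColumnType, with TYPE_STRING kept as the PrimitiveType, comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for the backend enum (hypothetical, reduced to 3 values).
// Note that there is deliberately no TYPE_BINARY here.
enum PrimitiveType { TYPE_INT, TYPE_STRING, TYPE_STRUCT };

// Simplified stand-in for ColumnType (assumption: not the real class layout).
struct ColumnType {
  PrimitiveType type = TYPE_INT;
  // True only for BINARY; the PrimitiveType stays TYPE_STRING in that case.
  bool is_binary = false;
  // Child types of a complex type, so embedded fields carry the flag as well.
  std::vector<ColumnType> children;

  // Hypothetical helpers for building the two string-like types.
  static ColumnType String() {
    ColumnType t;
    t.type = TYPE_STRING;
    return t;
  }
  static ColumnType Binary() {
    ColumnType t = String();
    t.is_binary = true;
    return t;
  }
};

// Code that does not care about STRING vs BINARY switches on PrimitiveType
// only, so BINARY automatically takes the STRING path.
bool IsVarLen(const ColumnType& t) {
  // Invariant of this sketch: the flag is only meaningful on TYPE_STRING.
  assert(!t.is_binary || t.type == TYPE_STRING);
  switch (t.type) {
    case TYPE_STRING: return true;
    default: return false;
  }
}

// Code that must tell the two apart (for example a file writer) reads the flag.
std::string TypeName(const ColumnType& t) {
  switch (t.type) {
    case TYPE_INT: return "INT";
    case TYPE_STRING: return t.is_binary ? "BINARY" : "STRING";
    case TYPE_STRUCT: {
      std::string s = "STRUCT<";
      for (std::size_t i = 0; i < t.children.size(); ++i) {
        if (i > 0) s += ",";
        s += TypeName(t.children[i]);
      }
      return s + ">";
    }
  }
  return "UNKNOWN";
}
```

In this sketch, a struct whose children are ColumnType::String() and ColumnType::Binary() keeps the distinction for its embedded fields without any ColumnDescriptor, while IsVarLen() needs no BINARY-specific case.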
Issue Links
- is depended upon by
  - IMPALA-11491 Support BINARY nested in complex types in select list (Resolved)
  - IMPALA-12899 Temporary workaround for BINARY in complex types (Resolved)
  - IMPALA-5323 Support Kudu BINARY (Resolved)