Details
- Type: New Feature
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.0.0
Description
Background
Default constraints on columns are part of the ANSI SQL standard.
Hive 3.0+ supports default constraints (https://issues.apache.org/jira/browse/HIVE-18726).
But Spark SQL does not implement this feature yet.
Design
Hive is widely used in production environments and is the de facto standard in the big data field.
However, many different Hive versions are used in production, and features differ between versions.
Spark SQL needs to implement default constraints, and there are three points to pay attention to in the design:
First, Spark SQL should reduce coupling with Hive.
Second, default constraints should be compatible with different versions of Hive.
Third, which default-value expressions should Spark SQL support? I think we should support `literal`, `current_date()`, and `current_timestamp()`. Maybe other expressions should also be supported, such as `Cast(1 as float)`, `1 + 2`, and so on (see the DDL sketch after this list).
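For illustration, below is a minimal Scala sketch of the DDL this feature would enable. The exact syntax is an assumption modeled on the ANSI/Hive 3.0 `DEFAULT` clause; Spark SQL's parser does not accept it yet, so the statements are only held as strings here, and the table name and columns are made up.

```scala
// Sketch of the proposed DDL (not yet accepted by Spark SQL's parser).
object ProposedDefaultConstraintDdl {
  // Literal, current_date()/current_timestamp(), cast and arithmetic defaults.
  val createTable: String =
    """CREATE TABLE events (
      |  id BIGINT,
      |  status STRING DEFAULT 'open',
      |  created_on DATE DEFAULT current_date(),
      |  created_at TIMESTAMP DEFAULT current_timestamp(),
      |  weight FLOAT DEFAULT CAST(1 AS FLOAT),
      |  retries INT DEFAULT 1 + 2
      |)""".stripMargin

  // Columns omitted from an INSERT would be filled with their declared defaults.
  val insertUsingDefaults: String = "INSERT INTO events (id) VALUES (42)"

  def main(args: Array[String]): Unit = {
    println(createTable)
    println(insertUsingDefaults)
  }
}
```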
We want to save the default constraint metadata into the Hive table properties, and then restore it from those properties after the client fetches the latest metadata. The implementation is the same as for other metadata (e.g. partitioning, bucketing, statistics).
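As a rough illustration of that round trip, here is a sketch that serializes one property entry per column with a default and reads it back. The key scheme `spark.sql.defaultConstraint.<column>` is a hypothetical placeholder, not an existing Spark property.

```scala
// Sketch of saving/restoring default-constraint metadata via table properties.
object DefaultConstraintTableProperties {
  private val KeyPrefix = "spark.sql.defaultConstraint."  // hypothetical key scheme

  // Save: one property entry per column that declares a default expression.
  def save(defaults: Map[String, String]): Map[String, String] =
    defaults.map { case (column, expr) => (KeyPrefix + column) -> expr }

  // Restore: pick the default expressions back out of the table properties
  // returned by the Hive client, keyed by column name.
  def restore(tableProperties: Map[String, String]): Map[String, String] =
    tableProperties.collect {
      case (key, expr) if key.startsWith(KeyPrefix) =>
        key.stripPrefix(KeyPrefix) -> expr
    }
}
```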
Because a default constraint is part of a column, I think we can reuse the metadata of `StructField`. The default constraint would be cached in the `StructField` metadata.
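A minimal sketch of that caching, using `MetadataBuilder` on the existing `StructField.metadata`; the metadata key name `"default"` is a hypothetical choice, not an established Spark key.

```scala
import org.apache.spark.sql.types.{IntegerType, MetadataBuilder, StringType, StructField, StructType}

// Sketch of caching a default expression inside StructField metadata.
object DefaultConstraintInStructField {
  val DefaultKey = "default"  // hypothetical metadata key

  // Attach a default expression (as text) to a column's metadata.
  def withDefault(field: StructField, expr: String): StructField = {
    val md = new MetadataBuilder()
      .withMetadata(field.metadata)
      .putString(DefaultKey, expr)
      .build()
    field.copy(metadata = md)
  }

  // Read the cached default back, if any.
  def defaultOf(field: StructField): Option[String] =
    if (field.metadata.contains(DefaultKey)) Some(field.metadata.getString(DefaultKey)) else None

  def main(args: Array[String]): Unit = {
    val schema = StructType(Seq(
      StructField("id", IntegerType),
      withDefault(StructField("status", StringType), "'open'")
    ))
    schema.foreach(f => println(s"${f.name} -> ${defaultOf(f)}"))
  }
}
```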
Tasks
This is a big piece of work, so I want to split it into several sub-tasks, as follows: