Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Incomplete
- 2.3.0
- None
Description
While investigating cost-based optimization (CBO) with DataSourceV2, we found that org.apache.spark.sql.sources.v2.reader.Statistics lacks attribute/column statistics, and that DataSourceV2Relation#computeStats wraps org.apache.spark.sql.sources.v2.reader.Statistics into org.apache.spark.sql.catalyst.plans.logical.Statistics without forwarding the optional rowCount, even when it is present.
However, both rowCount and attributeStats are used during CBO, e.g. in JoinEstimation and AggregateEstimation.
We propose that:
- org.apache.spark.sql.sources.v2.reader.Statistics mirrors org.apache.spark.sql.catalyst.plans.logical.Statistics
- DataSourceV2Relation forwards all of this information so that it is available during CBO
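
To illustrate the forwarding part of the proposal, here is a minimal, self-contained sketch that does not depend on Spark. `SourceStats` and `LogicalStats` are hypothetical stand-ins for the reader-side and logical-plan `Statistics` types, and `toLogicalStats`, `sourceStats` and the `defaultSizeInBytes` parameter are names assumed for this example only. Attribute/column statistics are left out, because the reader-side type does not carry them yet.

```java
import java.math.BigInteger;
import java.util.Optional;
import java.util.OptionalLong;

public class StatsForwardingSketch {

    // Hypothetical stand-in for the reader-side Statistics type:
    // both values are optional because a source may not know them.
    interface SourceStats {
        OptionalLong sizeInBytes();
        OptionalLong numRows();
    }

    // Hypothetical, simplified stand-in for the logical-plan Statistics type.
    static final class LogicalStats {
        final BigInteger sizeInBytes;
        final Optional<BigInteger> rowCount;

        LogicalStats(BigInteger sizeInBytes, Optional<BigInteger> rowCount) {
            this.sizeInBytes = sizeInBytes;
            this.rowCount = rowCount;
        }
    }

    // Wraps source statistics and forwards the row count when it is present,
    // instead of keeping only the size. Falls back to a default size when the
    // source does not report one.
    static LogicalStats toLogicalStats(SourceStats source, long defaultSizeInBytes) {
        BigInteger size = BigInteger.valueOf(source.sizeInBytes().orElse(defaultSizeInBytes));
        Optional<BigInteger> rows = Optional.empty();
        if (source.numRows().isPresent()) {
            rows = Optional.of(BigInteger.valueOf(source.numRows().getAsLong()));
        }
        return new LogicalStats(size, rows);
    }

    // Small helper (assumed for this example) to build SourceStats values.
    static SourceStats sourceStats(final OptionalLong size, final OptionalLong rows) {
        return new SourceStats() {
            @Override public OptionalLong sizeInBytes() { return size; }
            @Override public OptionalLong numRows() { return rows; }
        };
    }

    public static void main(String[] args) {
        LogicalStats known = toLogicalStats(
            sourceStats(OptionalLong.of(1024L), OptionalLong.of(10L)), 8L);
        LogicalStats unknown = toLogicalStats(
            sourceStats(OptionalLong.empty(), OptionalLong.empty()), 8L);
        System.out.println(known.sizeInBytes + " bytes, rows present: " + known.rowCount.isPresent());
        System.out.println(unknown.sizeInBytes + " bytes, rows present: " + unknown.rowCount.isPresent());
    }
}
```

With the row count forwarded, estimators such as JoinEstimation and AggregateEstimation would have it available; forwarding attributeStats would additionally require the reader-side type to expose column statistics, as proposed in the first point.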
Issue Links
- is related to SPARK-22386 Data Source V2 improvements (In Progress)