Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Fixed
- Impala 3.4.0
- None
Description
Query 1 below uses 'casttobigint()' in the IS NOT NULL predicate, and its selectivity is computed as the default 10% of the input rows, resulting in cardinality = 7.30K. The equivalent predicate in Query 2, written with a 'CAST' expr, computes the correct cardinality of 73.05K.
Query 1:
Query: explain select * from date_dim d1, date_dim d2 where d1.d_week_seq = d2.d_week_seq - 52 and casttobigint(d1.d_week_seq) is not null and casttobigint(d2.d_week_seq) is not null
| 00:SCAN HDFS [tpcds.date_dim d1]                            |
|    HDFS partitions=1/1 files=1 size=9.84MB                  |
|    predicates: casttobigint(d1.d_week_seq) IS NOT NULL      |
|    runtime filters: RF000 -> d1.d_week_seq                  |
|    row-size=255B cardinality=7.30K                          |
+-------------------------------------------------------------+
Query 2:
Query: explain select * from date_dim d1, date_dim d2 where d1.d_week_seq = d2.d_week_seq - 52 and cast(d1.d_week_seq as bigint) is not null and cast(d2.d_week_seq as bigint) is not null
| 00:SCAN HDFS [tpcds.date_dim d1]                            |
|    HDFS partitions=1/1 files=1 size=9.84MB                  |
|    predicates: CAST(d1.d_week_seq AS BIGINT) IS NOT NULL    |
|    runtime filters: RF000 -> d1.d_week_seq                  |
|    row-size=255B cardinality=73.05K                         |
+-------------------------------------------------------------+
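The two estimates above differ by exactly the default selectivity factor. The following is a minimal sketch of that arithmetic only, not Impala's planner code: the class name, the method name, and the rounded input row count of 73,050 (read off the "73.05K" in the plan) are assumptions made for illustration, and the effective selectivity of 1.0 for Query 2 is inferred from its plan output.

```java
public class CardinalitySketch {
    // Default selectivity of 10%, as stated in the description for
    // predicates whose selectivity cannot be estimated.
    public static final double DEFAULT_SELECTIVITY = 0.1;

    // Hypothetical helper: estimated output rows = input rows * selectivity,
    // rounded to the nearest whole row.
    public static long estimate(long inputRows, double selectivity) {
        return Math.round(inputRows * selectivity);
    }

    public static void main(String[] args) {
        // Assumption: approximate row count behind "cardinality=73.05K".
        long inputRows = 73_050L;

        // Query 1: casttobigint(...) IS NOT NULL falls back to the default.
        System.out.println("Query 1 estimate: "
            + estimate(inputRows, DEFAULT_SELECTIVITY));

        // Query 2: CAST(...) IS NOT NULL keeps (almost) every row,
        // i.e. an effective selectivity of about 1.0.
        System.out.println("Query 2 estimate: " + estimate(inputRows, 1.0));
    }
}
```

Under these assumptions, the Query 1 estimate is one tenth of the Query 2 estimate, which matches the 7.30K versus 73.05K shown in the two plans.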
Query 1 should ideally produce the same cardinality as Query 2. Note that I had to comment out the following lines in FunctionCallExpr.java, because a user query is not supposed to call the builtin cast function directly. However, for an external frontend module that calls functions in impala-frontend.jar, this is supported, so we should make the behavior consistent.
+// if (isBuiltinCastFunction()) {
+//   throw new AnalysisException(toSql() +
+//       " is reserved for internal use only. Use 'cast(expr AS type)' instead.");
+// }
Issue Links
- relates to IMPALA-10615 Cardinality estimates for some scalar functions could be improved (Open)