[DRILL-7064] Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files) - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.16.0
Component/s: Metadata
Labels:
- ready-to-commit

Description

This sub-task leverages the Parquet metadata cache's summary stats, totalRowCount (across all files and row groups) and the per-column totalNullCount (across all files and row groups), to answer plain COUNT aggregation queries that have no GROUP BY. Today such queries are converted to a DirectScan by ConvertCountToDirectScanRule, which uses the row group metadata. However, that rule is applied on Drill logical rels and converts the logical plan to a physical plan with DirectScanPrel. That is too late: the DrillScanRel created during logical planning has already read the entire metadata cache file, including its full list of row group entries. The metadata cache file can grow quite large, so this approach does not scale.

The solution is to use the metadata summary file created in DRILL-7063 and to add a new rule that is applied early, so that it operates on the Calcite logical rels instead of the Drill logical rels and prevents eager expansion of the list of files and row groups.

We will not remove the existing rule. It will continue to operate as before, because after some transformations we may still want to apply the COUNT optimization.
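The arithmetic behind answering COUNT from the summary stats can be sketched as follows. This is a minimal illustration, not Drill's implementation: the class `SummaryCountSketch` and its methods are hypothetical, and it assumes that a missing or negative count means the statistic is unavailable, in which case the query must fall back to a regular scan.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical stand-in for the summary stats (totalRowCount and the
// per-column totalNullCount); these are not Drill's actual metadata classes.
final class SummaryCountSketch {
  private final long totalRowCount;
  private final Map<String, Long> totalNullCounts;

  SummaryCountSketch(long totalRowCount, Map<String, Long> totalNullCounts) {
    this.totalRowCount = totalRowCount;
    this.totalNullCounts = totalNullCounts;
  }

  // COUNT(*) counts every row, so it equals totalRowCount.
  // Assumption: a negative value marks the statistic as unknown.
  OptionalLong countStar() {
    return totalRowCount >= 0 ? OptionalLong.of(totalRowCount) : OptionalLong.empty();
  }

  // COUNT(col) counts non-null values: totalRowCount - totalNullCount(col).
  // Returns empty when either statistic is missing or inconsistent, which
  // signals that the summary cannot answer the query.
  OptionalLong countColumn(String column) {
    Long nulls = totalNullCounts.get(column);
    if (totalRowCount < 0 || nulls == null || nulls < 0 || nulls > totalRowCount) {
      return OptionalLong.empty();
    }
    return OptionalLong.of(totalRowCount - nulls);
  }

  public static void main(String[] args) {
    SummaryCountSketch summary =
        new SummaryCountSketch(1000L, Map.of("a", 25L, "b", -1L));
    System.out.println("COUNT(*) answerable: " + summary.countStar().isPresent());
    System.out.println("COUNT(a) answerable: " + summary.countColumn("a").isPresent());
    System.out.println("COUNT(b) answerable: " + summary.countColumn("b").isPresent());
  }
}
```

Because both inputs are totals over all files and row groups, neither computation needs the per-row-group entries, which is what allows a rule to answer the query without expanding the file list.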

Issue Links

depends upon
- DRILL-7063 Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance) (Resolved)

is related to
- DRILL-3846 Metadata Caching : A count(*) query took more time with the cache in place (Resolved)

links to
- GitHub Pull Request #1736

People

Assignee: Aman Sinha
Reporter: Venkata Jyothsna Donapati
Reviewer: Vova Vysotskyi
Votes: 0
Watchers: 3

Dates

Created: 27/Feb/19 23:35
Updated: 10/Apr/19 01:30
Resolved: 10/Apr/19 01:30

Time Tracking

Estimated: 336h
Remaining: 336h
Logged: Not Specified