[DRILL-3846] Metadata Caching : A count(*) query took more time with the cache in place - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.16.0
Component/s: Metadata
Labels: None

Description

git.commit.id.abbrev=3c89b30

I have a folder with 10k complex files. The generated metadata cache file is around 486 MB. The numbers below show a performance regression once the metadata cache is generated: the same count(*) query took 30.835 seconds before the cache existed and 47.614 seconds after.

0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from `complex_sparse_50000files`;
+----------+
|  EXPR$0  |
+----------+
| 1000000  |
+----------+
1 row selected (30.835 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> refresh table metadata `complex_sparse_50000files`;
+-------+---------------------------------------------------------------------+
|  ok   |                               summary                               |
+-------+---------------------------------------------------------------------+
| true  | Successfully updated metadata for table complex_sparse_50000files.  |
+-------+---------------------------------------------------------------------+
1 row selected (10.69 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from `complex_sparse_50000files`;
+----------+
|  EXPR$0  |
+----------+
| 1000000  |
+----------+
1 row selected (47.614 seconds)
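
To put the regression in numbers, here is a minimal sketch that works out the extra time and the slowdown factor from the timings in the output above. The variable names are illustrative only, and it assumes the three timings are directly comparable (same cluster, no warm-up effects), which the report does not state.

```python
# Timings copied from the sqlline output above, in seconds.
count_without_cache = 30.835   # count(*) before the metadata cache existed
refresh_metadata = 10.69       # refresh table metadata
count_with_cache = 47.614      # count(*) after the cache was generated

# Absolute and relative cost of having the cache in place.
extra_seconds = count_with_cache - count_without_cache
slowdown = count_with_cache / count_without_cache

print(f"extra time with cache: {extra_seconds:.3f} s")
print(f"slowdown factor: {slowdown:.2f}x")
```

By this estimate the cached query is roughly 1.5 times slower, and that is before counting the one-time cost of the refresh itself.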

Issue Links

relates to

DRILL-7064 Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files) (Resolved)

People

Assignee: Aman Sinha

Reporter: Rahul Kumar Challapalli

Votes: 1

Watchers: 4

Dates

Created: 28/Sep/15 21:05

Updated: 10/Apr/19 01:31

Resolved: 10/Apr/19 01:31