[HIVE-14423] S3: Fetching partition sizes from FS can be expensive when stats are not available in metastore - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.2.0
Component/s: None
Labels:
- TODOC2.2

Target Version/s: 2.1.0

Description

When partition stats are not available in the metastore, Hive tries to get the file sizes from the FS.

For example:

        at org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:1487)
        at org.apache.hadoop.hive.ql.stats.StatsUtils.getFileSizeForPartitions(StatsUtils.java:598)
        at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:235)
        at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:144)
        at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:132)
        at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:126)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)

This can be quite expensive on some file systems, such as S3. Especially when the table is partitioned (e.g., TPC-DS store_sales, which has 1000s of partitions), a query can spend 1000s of seconds just waiting for this information to be pulled in.

Also, it would be good to stop using FS.getContentSummary to find file sizes.
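
If each partition's size is looked up one after another, the total wait grows with the number of partitions. One possible mitigation, shown here only as a minimal sketch and not necessarily what the attached patches do, is to issue the per-partition lookups concurrently from a bounded thread pool. The class name PartitionSizeFetcher, the method fetchSizes, and the sizeLookup callback are hypothetical names introduced for illustration; in Hive the callback would stand in for whatever FS call returns a partition's size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical helper (not part of Hive): fetches partition sizes in parallel.
public class PartitionSizeFetcher {

    /**
     * Returns one size per partition path, in the same order as the input.
     *
     * @param partitionPaths partition locations (plain strings in this sketch)
     * @param sizeLookup     assumed callback that performs the slow FS size lookup
     * @param threads        maximum number of concurrent lookups; must be > 0
     */
    public static List<Long> fetchSizes(List<String> partitionPaths,
                                        ToLongFunction<String> sizeLookup,
                                        int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per partition so slow lookups overlap.
            List<Future<Long>> futures = new ArrayList<>();
            for (String path : partitionPaths) {
                Callable<Long> task = () -> sizeLookup.applyAsLong(path);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order to preserve input order.
            List<Long> sizes = new ArrayList<>();
            for (Future<Long> future : futures) {
                sizes.add(future.get());
            }
            return sizes;
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, the wall-clock time is bounded roughly by the number of partitions divided by the pool size, times the per-lookup latency, instead of the sum of all lookups. The pool size would need tuning against the request limits of the underlying store.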

Attachments

- HIVE-14423.2.patch, 05/Aug/16 11:09, 4 kB, Rajesh Balamohan
- HIVE-14423.1.patch, 04/Aug/16 11:05, 5 kB, Rajesh Balamohan

Issue Links

- is depended upon by: HADOOP-13525 Optimize uses of FS operations in the ASF analysis frameworks and libraries (Resolved)
- is related to: HIVE-13925 ETL optimizations (Open)
- relates to: HIVE-13901 Hivemetastore add partitions can be slow depending on filesystems (Closed)

People

Assignee: Rajesh Balamohan

Reporter: Rajesh Balamohan

Votes: 0

Watchers: 8

Dates

Created: 04/Aug/16 08:18

Updated: 26/Jul/17 03:30

Resolved: 06/Aug/16 01:00