Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- Impala 2.9.0
Description
Metadata loading for tables with many partitions can be fairly slow, especially on S3 and ADLS. The operation is largely latency-bound, so using multiple threads should help speed it up.
Listing files from multiple partitions in parallel should provide a good speedup, especially for S3 and ADLS, where latencies are usually higher than on HDFS.
HdfsTable.loadPartitionFileMetadata(StorageDescriptor, HdfsPartition) might be a good starting point.
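The idea of listing several partitions concurrently could be sketched roughly as follows. This is only an illustration, not the change that went into Impala: the class `ParallelPartitionLister`, the method `listAll`, and the `lister` function (a stand-in for a latency-bound call such as a Hadoop `FileSystem` listing) are hypothetical names, and the pool size is left to the caller.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of Impala's catalog code.
public class ParallelPartitionLister {
  /**
   * Lists every partition directory using a fixed-size thread pool.
   * 'lister' is an assumed stand-in for a remote, latency-bound listing call.
   */
  public static Map<String, List<String>> listAll(
      List<String> partitionDirs,
      Function<String, List<String>> lister,
      int numThreads) throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      // Submit one listing task per partition so that the waits overlap.
      Map<String, Future<List<String>>> pending = new LinkedHashMap<>();
      for (String dir : partitionDirs) {
        Callable<List<String>> task = () -> lister.apply(dir);
        pending.put(dir, pool.submit(task));
      }
      // Collect results in submission order; get() rethrows a task failure
      // wrapped in an ExecutionException.
      Map<String, List<String>> result = new LinkedHashMap<>();
      for (Map.Entry<String, Future<List<String>>> e : pending.entrySet()) {
        result.put(e.getKey(), e.getValue().get());
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }
}
```

Because each task spends most of its time waiting on the remote store, a pool larger than the number of CPU cores may still help; the right size would need to be measured for each storage backend.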
Stack-Trace | Count | Percentage(%) | Total |
---|---|---|---|
com.amazonaws.services.s3.AmazonS3Client.listObjects(ListObjectsRequest) | 4,340 | 75.649 | 83,489,694,712 |
---org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(ListObjectsRequest) | 4,340 | 75.649 | 83,489,694,712 |
------org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(Path,-String,-Set) | 3,256 | 56.754 | 63,540,096,016 |
---------org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(Path,-boolean) | 3,256 | 56.754 | 63,540,096,016 |
------------org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(Path) | 3,256 | 56.754 | 63,540,096,016 |
---------------org.apache.hadoop.fs.FileSystem.exists(Path) | 2,178 | 37.964 | 45,375,122,798 |
------------------org.apache.hadoop.fs.s3a.S3AFileSystem.exists(Path) | 2,178 | 37.964 | 45,375,122,798 |
---------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(StorageDescriptor,-HdfsPartition) | 1,082 | 18.86 | 23,383,160,065 |
------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(List) | 1,082 | 18.86 | 23,383,160,065 |
---------------------------org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(IMetaStoreClient,-Set,-boolean) | 1,082 | 18.86 | 23,383,160,065 |
------------------------------org.apache.impala.catalog.HdfsTable.load(boolean,-IMetaStoreClient,-Table,-boolean,-boolean,-Set) | 1,082 | 18.86 | 23,383,160,065 |
---------------------------------org.apache.impala.catalog.HdfsTable.load(boolean,-IMetaStoreClient,-Table) | 1,082 | 18.86 | 23,383,160,065 |
---------------------org.apache.impala.catalog.HdfsTable.refreshFileMetadata(HdfsPartition) | 1,096 | 19.104 | 21,991,962,733 |
------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(StorageDescriptor,-HdfsPartition) | 1,096 | 19.104 | 21,991,962,733 |
---------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(List) | 1,096 | 19.104 | 21,991,962,733 |
------------------------------org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(IMetaStoreClient,-Set,-boolean) | 1,096 | 19.104 | 21,991,962,733 |
---------------org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(Path,-boolean,-Listing$FileStatusAcceptor) | 1,078 | 18.79 | 18,164,973,218 |
------------------org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(Path,-boolean) | 1,078 | 18.79 | 18,164,973,218 |
---------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-Path,-HashMap) | 1,078 | 18.79 | 18,164,973,218 |
------------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-HdfsPartition) | 1,078 | 18.79 | 18,164,973,218 |
---------------------------org.apache.impala.catalog.HdfsTable.refreshFileMetadata(HdfsPartition) | 1,078 | 18.79 | 18,164,973,218 |
------------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(StorageDescriptor,-HdfsPartition) | 1,078 | 18.79 | 18,164,973,218 |
---------------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(List) | 1,078 | 18.79 | 18,164,973,218 |
------org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.<init>(Listing,-Path,-ListObjectsRequest) | 1,084 | 18.895 | 19,949,598,696 |
---------org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Path,-ListObjectsRequest,-PathFilter,-Listing$FileStatusAcceptor,-RemoteIterator) | 1,084 | 18.895 | 19,949,598,696 |
------------org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(Path,-boolean,-Listing$FileStatusAcceptor) | 1,084 | 18.895 | 19,949,598,696 |
---------------org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(Path,-boolean) | 1,084 | 18.895 | 19,949,598,696 |
------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-Path,-HashMap) | 1,084 | 18.895 | 19,949,598,696 |
---------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-HdfsPartition) | 1,084 | 18.895 | 19,949,598,696 |
Attachments
Issue Links
- depends upon
  - IMPALA-4847 Simplify the code for file/block metadata loading by manually calling listLocatedStatus() for each partition. (Resolved)
- relates to
  - IMPALA-5431 Calling FileSystem.Exists() twice in a row for the same partition adds unnecessary latency to metadata loading (Resolved)
  - IMPALA-6112 Improve the thread pool size detection logic while loading partitioned table block metadata (Open)
  - IMPALA-6115 Investigate scheduling options for block metadata loading threads (Open)